Sample records for mining gene expression

  1. Microarray data and gene expression statistics for Saccharomyces cerevisiae exposed to simulated asbestos mine drainage.

    PubMed

    Driscoll, Heather E; Murray, Janet M; English, Erika L; Hunter, Timothy C; Pivarski, Kara; Dolci, Elizabeth D

    2017-08-01

    Here we describe microarray expression data (raw and normalized), experimental metadata, and gene-level data with expression statistics from Saccharomyces cerevisiae exposed to simulated asbestos mine drainage from the Vermont Asbestos Group (VAG) Mine on Belvidere Mountain in northern Vermont, USA. For nearly 100 years (between the late 1890s and 1993), chrysotile asbestos fibers were extracted from serpentinized ultramafic rock at the VAG Mine for use in construction and manufacturing industries. Studies have shown that nearby water courses and streambeds have become contaminated with asbestos mine tailings runoff, including elevated levels of magnesium, nickel, chromium, and arsenic, elevated pH, and chrysotile asbestos-laden mine tailings, due to leaching and gradual erosion of massive piles of mine waste covering approximately 9 km². We exposed yeast to simulated VAG Mine tailings leachate to help gain insight into how eukaryotic cells exposed to VAG Mine drainage may respond in the mine environment. Affymetrix GeneChip® Yeast Genome 2.0 Arrays were used to assess gene expression after 24-h exposure to simulated VAG Mine tailings runoff. The chemistry of mine-tailings leachate, mine-tailings leachate plus yeast extract peptone dextrose media, and control yeast extract peptone dextrose media is also reported. To our knowledge this is the first dataset to assess global gene expression patterns in a eukaryotic model system simulating asbestos mine tailings runoff exposure. Raw and normalized gene expression data are accessible through the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) Database Series GSE89875 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89875).

  2. Online Analytical Processing (OLAP): A Fast and Effective Data Mining Tool for Gene Expression Databases

    PubMed Central

    2005-01-01

    Gene expression databases contain a wealth of information, but current data mining tools are limited in their speed and effectiveness in extracting meaningful biological knowledge from them. Online analytical processing (OLAP) can be used as a supplement to cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, a product that ships with SQL Server 2000, to construct an OLAP cube that was used to mine a time series experiment designed to identify genes associated with resistance of soybean to the soybean cyst nematode, a devastating pest of soybean. The data for these experiments are stored in the soybean genomics and microarray database (SGMD). A number of candidate resistance genes and pathways were found. Compared with traditional cluster analysis of gene expression data, OLAP was more effective and faster in finding biologically meaningful information. OLAP is available from a number of vendors and can work with any relational database management system through OLE DB. PMID:16046824
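
    OLAP's speed comes largely from pre-aggregating a measure over every combination of dimensions (each combination is one "cuboid" of the cube), so roll-ups and slices become lookups rather than table scans. The following is a minimal in-memory sketch of that idea, not Analysis Services itself; the fact-table layout, the dimension names, the gene names and the log2-ratio values are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical fact table: (gene, time_point, genotype, log2_ratio).
FACTS = [
    ("GeneA", "6h", "resistant", 1.8),
    ("GeneA", "6h", "susceptible", 0.2),
    ("GeneA", "12h", "resistant", 2.4),
    ("GeneB", "6h", "resistant", -0.5),
    ("GeneB", "12h", "susceptible", -0.7),
]
DIMENSIONS = ("gene", "time_point", "genotype")


def build_cube(facts, dims):
    """Pre-compute the mean measure for every subset of dimensions (every cuboid)."""
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(range(len(dims)), r):
            cells = defaultdict(list)
            for row in facts:
                # The cell key is the row's values on the grouped dimensions only.
                cells[tuple(row[i] for i in group)].append(row[-1])
            cube[tuple(dims[i] for i in group)] = {k: mean(v) for k, v in cells.items()}
    return cube


cube = build_cube(FACTS, DIMENSIONS)
# Roll-up: mean response of GeneA in the resistant genotype across all time points.
print(cube[("gene", "genotype")][("GeneA", "resistant")])
```

    Materializing all 2^d cuboids is only practical for a handful of dimensions; production OLAP engines choose which aggregations to store.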

  3. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    PubMed

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets have attracted great attention from molecular biologists, statisticians, and computer scientists. Data mining that extracts hidden and usual information from datasets fails to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process has not been addressed efficiently. To monitor higher rates of expression levels between genes, a hierarchical clustering model was proposed in which the biological association between genes is measured simultaneously using an improved Pearson's correlation proximity measure (PCPHC). Additionally, the Seed Augment algorithm applies average linkage on rows and columns to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. GL-PCPHC then applies a pattern-growing method to mine the PCPHC patterns. Compared with existing gene expression analysis methods, the PCPHC model achieves better performance. The GL-PCPHC model is evaluated on standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, pattern size, significance level, biological association efficiency, and pattern quality.
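
    The method builds on two standard ingredients: Pearson's correlation as a proximity measure and average-linkage agglomeration. A minimal sketch of just those ingredients follows. It uses plain Pearson correlation (distance 1 - r) rather than the authors' improved variant, omits the Seed Augment and pattern-growth steps, and the profile names and values are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r and average linkage."""
    names = list(profiles)
    dist = {
        frozenset(pair): 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def linkage(ij):
        # Average of all pairwise distances between members of the two clusters.
        a, b = clusters[ij[0]], clusters[ij[1]]
        return mean(dist[frozenset((g, h))] for g in a for h in b)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4, 2],
}
print(average_linkage(profiles, n_clusters=2))
```

    Because 1 - r ignores scale, g1 and g2 (identical shape, different magnitude) merge first, which is usually the desired behaviour for co-expression.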

  4. An Improved Pearson's Correlation Proximity-Based Hierarchical Clustering for Mining Biological Association between Genes

    PubMed Central

    Booma, P. M.; Prabhakaran, S.; Dhanalakshmi, R.

    2014-01-01

    Microarray gene expression datasets have attracted great attention from molecular biologists, statisticians, and computer scientists. Data mining that extracts hidden and usual information from datasets fails to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process has not been addressed efficiently. To monitor higher rates of expression levels between genes, a hierarchical clustering model was proposed in which the biological association between genes is measured simultaneously using an improved Pearson's correlation proximity measure (PCPHC). Additionally, the Seed Augment algorithm applies average linkage on rows and columns to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. GL-PCPHC then applies a pattern-growing method to mine the PCPHC patterns. Compared with existing gene expression analysis methods, the PCPHC model achieves better performance. The GL-PCPHC model is evaluated on standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, pattern size, significance level, biological association efficiency, and pattern quality. PMID:25136661

  5. Integrated pathway-based transcription regulation network mining and visualization based on gene expression profiles.

    PubMed

    Kibinge, Nelson; Ono, Naoaki; Horie, Masafumi; Sato, Tetsuo; Sugiura, Tadao; Altaf-Ul-Amin, Md; Saito, Akira; Kanaya, Shigehiko

    2016-06-01

    Conventionally, workflows examining transcription regulation networks from gene expression data involve distinct analytical steps. There is a need for pipelines that unify data mining and inference into a single framework to enhance interpretation and hypothesis generation. We propose a workflow that merges network construction with gene expression data mining, focusing on regulation processes in the context of transcription factor-driven gene regulation. The pipeline implements pathway-based modularization of expression profiles into functional units to improve biological interpretation. The integrated workflow was implemented as a web application (TransReguloNet) with functions that enable pathway visualization and comparison of transcription factor activity between the sample conditions defined in the experimental design. The pipeline merges differential expression, network construction, pathway-based abstraction, clustering and visualization. The framework was applied to the analysis of actual expression datasets related to lung, breast and prostate cancer. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

    PubMed

    Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming

    2015-01-01

    In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches, built on networks such as protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used to prioritize disease genes. Expression quantitative trait loci (eQTLs) have been successfully applied to identify genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared with the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, applying the RWR algorithm to the integrated PPI and GGCRN networks is an effective method for disease-associated gene mining.
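
    Random walk with restart scores every node by the stationary probability of a walker that, at each step, either moves to a random neighbour or jumps back to the seed genes. A minimal sketch over the union of two undirected networks (standing in for the HPRD PPI network and the GGCRN) follows. It assumes a restart probability of 0.7 and uniform seed weights; the study's exact parameters and network construction are not given in the abstract, and the gene names and edges are hypothetical.

```python
def merge_networks(*edge_lists):
    """Union of undirected edge lists as an adjacency dict (e.g. PPI + co-regulation)."""
    adjacency = {}
    for edges in edge_lists:
        for u, v in edges:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0, where W spreads mass evenly to neighbours."""
    nodes = sorted(adjacency)
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        spread = {u: 0.0 for u in nodes}
        for u in nodes:
            for v in adjacency[u]:
                spread[v] += p[u] / len(adjacency[u])
        nxt = {u: (1 - restart) * spread[u] + restart * p0[u] for u in nodes}
        if sum(abs(nxt[u] - p[u]) for u in nodes) < tol:
            return nxt
        p = nxt
    return p


ppi = [("G1", "G2"), ("G2", "G3")]  # hypothetical PPI edges
coregulation = [("G3", "G4")]       # hypothetical GGCRN edges
scores = random_walk_with_restart(merge_networks(ppi, coregulation), seeds={"G1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

    Note that G4 is reachable from the seed only through the co-regulation edge, which is how integrating a second network can surface candidates that the PPI network alone would miss.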

  7. MinePath: Mining for Phenotype Differential Sub-paths in Molecular Pathways

    PubMed Central

    Koumakis, Lefteris; Kartsaki, Evgenia; Chatzimina, Maria; Zervakis, Michalis; Vassou, Despoina; Marias, Kostas; Moustakis, Vassilis; Potamias, George

    2016-01-01

    Pathway analysis methodologies couple traditional gene expression analysis with knowledge encoded in established molecular pathway networks, offering a promising approach towards the biological interpretation of phenotype-differentiating genes. Early pathway analysis methodologies, known as gene set analysis (GSA), view pathways as plain lists of genes without taking into account either the underlying pathway network topology or the gene regulatory relations involved. Although these approaches are computationally efficient and simple, they treat pathways that involve the same genes as equivalent in terms of their gene enrichment characteristics. More recent pathway analysis approaches take the underlying gene regulatory relations into account by examining their consistency with gene expression profiles and computing a score for each profile. Even so, assessing and scoring single relations limits the ability to reveal key gene regulation mechanisms hidden in longer pathway sub-paths. We introduce MinePath, a pathway analysis methodology that addresses and overcomes these problems. MinePath decomposes pathways into their constituent sub-paths, transforming single relations into complex regulation sub-paths. Regulation sub-paths are then matched with gene expression sample profiles in order to evaluate their functional status and to assess their phenotype differential power. Assessment of differential power supports the identification of the most discriminant profiles. In addition, MinePath assesses the significance of the pathways as a whole, ranking them by their p-values. Comparison results with state-of-the-art pathway analysis systems are indicative of the soundness and reliability of the MinePath approach.
In contrast with many pathway analysis tools, MinePath is a web-based system (www.minepath.org) offering dynamic and rich pathway visualization functionality, with the unique ability to color regulatory relations between genes and reveal their phenotype inclination. This unique characteristic makes MinePath a valuable tool for in silico molecular biology experimentation, as it serves biomedical researchers’ exploratory needs to reveal and interpret the regulatory mechanisms that underlie and putatively govern the expression of target phenotypes. PMID:27832067
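
    The decomposition step can be illustrated by enumerating every source-to-sink path of a small signed pathway graph and checking each path against binarized (on/off) sample profiles. This is a sketch under stated assumptions, not MinePath's implementation: the pathway is assumed to be acyclic, and both the consistency rule ("+" needs both genes on, "-" needs the regulator on and the target off) and the differential-power measure (gap in the fraction of samples where a path is active) are hypothetical simplifications. Genes and profiles are made up.

```python
def enumerate_subpaths(edges):
    """All maximal source-to-sink paths of a signed DAG given as (u, v, sign) edges."""
    children, targets = {}, set()
    for u, v, sign in edges:
        children.setdefault(u, []).append((v, sign))
        targets.add(v)
    paths = []

    def walk(node, path):
        if node not in children:  # sink: no outgoing relations
            paths.append(path)
            return
        for nxt, sign in children[node]:
            walk(nxt, path + [(node, nxt, sign)])

    for source in (u for u in children if u not in targets):
        walk(source, [])
    return paths


def subpath_active(path, profile):
    """Assumed rule: every relation on the path must agree with the 0/1 profile."""
    for u, v, sign in path:
        expected_target = 1 if sign == "+" else 0
        if profile[u] != 1 or profile[v] != expected_target:
            return False
    return True


def differential_power(path, class_a, class_b):
    """Hypothetical measure: gap in the fraction of samples where the path is active."""
    def frac(samples):
        return sum(subpath_active(path, s) for s in samples) / len(samples)
    return abs(frac(class_a) - frac(class_b))


edges = [("G1", "G2", "+"), ("G2", "G3", "-"), ("G2", "G4", "+")]
paths = enumerate_subpaths(edges)
phenotype_a = [{"G1": 1, "G2": 1, "G3": 0, "G4": 0}]
phenotype_b = [{"G1": 1, "G2": 0, "G3": 1, "G4": 0}]
print([differential_power(p, phenotype_a, phenotype_b) for p in paths])
```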

  8. MinePath: Mining for Phenotype Differential Sub-paths in Molecular Pathways.

    PubMed

    Koumakis, Lefteris; Kanterakis, Alexandros; Kartsaki, Evgenia; Chatzimina, Maria; Zervakis, Michalis; Tsiknakis, Manolis; Vassou, Despoina; Kafetzopoulos, Dimitris; Marias, Kostas; Moustakis, Vassilis; Potamias, George

    2016-11-01

    Pathway analysis methodologies couple traditional gene expression analysis with knowledge encoded in established molecular pathway networks, offering a promising approach towards the biological interpretation of phenotype-differentiating genes. Early pathway analysis methodologies, known as gene set analysis (GSA), view pathways as plain lists of genes without taking into account either the underlying pathway network topology or the gene regulatory relations involved. Although these approaches are computationally efficient and simple, they treat pathways that involve the same genes as equivalent in terms of their gene enrichment characteristics. More recent pathway analysis approaches take the underlying gene regulatory relations into account by examining their consistency with gene expression profiles and computing a score for each profile. Even so, assessing and scoring single relations limits the ability to reveal key gene regulation mechanisms hidden in longer pathway sub-paths. We introduce MinePath, a pathway analysis methodology that addresses and overcomes these problems. MinePath decomposes pathways into their constituent sub-paths, transforming single relations into complex regulation sub-paths. Regulation sub-paths are then matched with gene expression sample profiles in order to evaluate their functional status and to assess their phenotype differential power. Assessment of differential power supports the identification of the most discriminant profiles. In addition, MinePath assesses the significance of the pathways as a whole, ranking them by their p-values. Comparison results with state-of-the-art pathway analysis systems are indicative of the soundness and reliability of the MinePath approach.
In contrast with many pathway analysis tools, MinePath is a web-based system (www.minepath.org) offering dynamic and rich pathway visualization functionality, with the unique ability to color regulatory relations between genes and reveal their phenotype inclination. This unique characteristic makes MinePath a valuable tool for in silico molecular biology experimentation, as it serves biomedical researchers' exploratory needs to reveal and interpret the regulatory mechanisms that underlie and putatively govern the expression of target phenotypes.

  9. Mining Gene Regulatory Networks by Neural Modeling of Expression Time-Series.

    PubMed

    Rubiolo, Mariano; Milone, Diego H; Stegmayer, Georgina

    2015-01-01

    Discovering gene regulatory networks from data has been one of the most studied topics in recent years. Neural networks can be used successfully to infer an underlying gene network by modeling expression profiles as time series. This work proposes a novel method based on a pool of neural networks for obtaining a gene regulatory network from a gene expression dataset. The networks model each possible interaction between pairs of genes in the dataset, and a set of mining rules is applied to accurately detect the underlying relations among genes. The results obtained on artificial and real datasets confirm the method's effectiveness for discovering regulatory networks from a proper modeling of the temporal dynamics of gene expression profiles.
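
    The pairwise structure of the method (one model per candidate gene pair, then a rule that keeps well-modelled pairs) can be sketched compactly. To stay short and deterministic, this sketch deliberately substitutes a one-step-lagged linear least-squares model for the paper's neural networks, and it uses a hypothetical mining rule: keep the ordered pair (regulator, target) when the model's residual error, relative to the target's variance, falls below a threshold. The time series are made up.

```python
from itertools import permutations


def lagged_relative_error(x, y):
    """Residual SSE / total SS of the fit y[t+1] ~ a * x[t] + b (lower is better)."""
    xs, ys = x[:-1], y[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    if sxx == 0 or syy == 0:
        return float("inf")  # a flat profile cannot support an interaction
    a = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sxx
    b = my - a * mx
    sse = sum((v - (a * u + b)) ** 2 for u, v in zip(xs, ys))
    return sse / syy


def mine_interactions(series, threshold=0.1):
    """Model every ordered gene pair and keep those the lagged model explains well."""
    return [
        (reg, tgt)
        for reg, tgt in permutations(series, 2)
        if lagged_relative_error(series[reg], series[tgt]) < threshold
    ]


series = {
    "G1": [0, 1, 0, 2, 0, 1],
    "G2": [9, 0, 2, 0, 4, 0],  # G2 at t+1 equals 2 * G1 at t
    "G3": [1, 1, 2, 1, 1, 2],
}
print(mine_interactions(series))
```

    A nonlinear model such as a small neural network would replace `lagged_relative_error` while the pairwise loop and the mining rule stay the same.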

  10. Function Clustering Self-Organization Maps (FCSOMs) for mining differentially expressed genes in Drosophila and its correlation with the growth medium.

    PubMed

    Liu, L L; Liu, M J; Ma, M

    2015-09-28

    The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One approach to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared with conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of gene-to-medium relationships by mapping the input space into a higher-dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes); the granules are partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, DAVID's fuzzy clustering algorithm considers only gene functions, not expression values. The experimental results show a marked improvement in classification accuracy with FCSOMs compared with the clustering algorithm of DAVID. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.
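
    The FCSOM idea, one small SOM trained independently for each function cluster of genes, can be sketched in pure Python. Assumptions: a one-dimensional map, deterministic initialisation spread between each feature's minimum and maximum, a linearly decaying learning rate and Gaussian neighbourhood width, and hypothetical cluster names and expression vectors. DAVID's fuzzy clustering, which would produce the granules, is not reproduced.

```python
import math


def best_matching_unit(weights, sample):
    """Index of the node whose weight vector is closest (squared Euclidean) to sample."""
    return min(range(len(weights)),
               key=lambda k: sum((w - s) ** 2 for w, s in zip(weights[k], sample)))


def train_som(samples, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a 1-D self-organising map and return one weight vector per node."""
    dim = len(samples[0])
    lo = [min(s[d] for s in samples) for d in range(dim)]
    hi = [max(s[d] for s in samples) for d in range(dim)]
    # Deterministic start: nodes evenly spaced between feature-wise min and max.
    weights = [[lo[d] + (hi[d] - lo[d]) * k / max(n_nodes - 1, 1) for d in range(dim)]
               for k in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        lr = lr0 * (1 - frac) + 0.01 * frac       # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 0.1 * frac  # neighbourhood narrows linearly
        for s in samples:
            bmu = best_matching_unit(weights, s)
            for k, w in enumerate(weights):
                h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (s[d] - w[d])
    return weights


def train_fcsoms(granules, **som_kwargs):
    """One independent SOM per function cluster; each call could run in parallel."""
    return {name: train_som(samples, **som_kwargs) for name, samples in granules.items()}


granules = {  # hypothetical function clusters -> per-gene expression vectors
    "cluster_1": [[0.0, 0.0], [0.5, 0.2], [10.0, 10.0], [9.5, 9.8]],
    "cluster_2": [[1.0, 3.0], [1.2, 2.8]],
}
maps = train_fcsoms(granules, n_nodes=2)
print(best_matching_unit(maps["cluster_1"], [0, 0]),
      best_matching_unit(maps["cluster_1"], [10, 10]))
```

    Because the per-cluster maps share no state, the dictionary comprehension in `train_fcsoms` is the natural place to fan work out to a process pool.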

  11. RANWAR: rank-based weighted association rule mining from gene expression and methylation data.

    PubMed

    Mallik, Saurav; Mukhopadhyay, Anirban; Maulik, Ujjwal

    2015-01-01

    Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules over items (genes) produced by association rule mining (ARM) algorithms confuses the decision maker. In this article, we propose a weighted rule-mining technique, RANWAR (rank-based weighted association rule mining), that addresses this problem by ranking the rules with two novel rule-interestingness measures: rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc). These measures depend on the rank of items (genes); using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than state-of-the-art association rule mining algorithms, which reduces execution time. We run RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontology (GO) and KEGG pathway analyses. Many top-ranked rules extracted by RANWAR that hold poor ranks in traditional Apriori are highly biologically significant to the related diseases. Finally, the top rules produced by RANWAR that are absent from Apriori's output are reported.
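
    Standard support and confidence, plus a rank-derived item weight, are enough to show how weighting changes which rules look interesting. The weighting scheme and the weighted-support formula below are illustrative assumptions, not RANWAR's wcs and wcc definitions (which the abstract does not give); each transaction is one sample's set of discretized gene items, and all genes are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions (samples) that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent); 0.0 when the antecedent never occurs."""
    base = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / base if base else 0.0


def rank_weights(ranked_genes):
    """Hypothetical weighting: rank 1 (most significant) -> 1.0, last of n -> 1/n."""
    n = len(ranked_genes)
    return {gene: (n - i) / n for i, gene in enumerate(ranked_genes)}


def weighted_support(itemset, transactions, weights):
    """Illustrative only: plain support scaled by the mean weight of the items."""
    mean_weight = sum(weights[g] for g in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)


transactions = [{"G1", "G2"}, {"G1", "G2", "G3"}, {"G1"}, {"G2", "G3"}]
weights = rank_weights(["G2", "G1", "G3"])  # e.g. ordered by a per-gene test statistic
print(support({"G1", "G2"}, transactions),
      confidence({"G1"}, {"G2"}, transactions),
      weighted_support({"G1", "G2"}, transactions, weights))
```

    Here {G3} and {G1, G2} have the same plain support, but the weighted version ranks the rule built on higher-ranked genes above the one built on G3.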

  12. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarrays and beadchips are the two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers from biological datasets using an integrated approach of statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate insignificant, low-significance, or redundant genes in such a way that the significance level satisfies the data distribution property (viz., either a normal or a non-normal distribution). The data are then discretized and post-discretized in turn. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets, and the corresponding special types of rules are extracted from the selected itemsets. Our proposed rule mining method performs better than other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets; as a result, it reduces elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. We also classify the data to assess how accurately the evolved rules describe the remaining test (unknown) data, and we compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are performed to verify the statistical relevance of the comparative results.
Here, each of the other rule mining methods and rule-based classifiers also starts from the same post-discretized data matrix. Finally, we include an integrated analysis of gene expression and methylation to determine the epigenetic effect (viz., the effect of methylation) on gene expression level. PMID:25830807
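
    The discretize-then-mine stage can be sketched with a brute-force search for frequent closed itemsets (frequent itemsets with no proper superset of equal support). Assumptions: each sample is discretized into "gene_up"/"gene_down" items with a fixed log-ratio threshold, and StatBicRM's statistical filtering, maximality, homogeneity and biclustering refinements are omitted. The enumeration is exponential in the number of items, so it suits only toy data like the hypothetical matrix below.

```python
from itertools import combinations


def discretize(matrix, genes, threshold=1.0):
    """Turn each sample (row of log-ratios) into a set of 'gene_up'/'gene_down' items."""
    out = []
    for row in matrix:
        items = set()
        for gene, value in zip(genes, row):
            if value >= threshold:
                items.add(f"{gene}_up")
            elif value <= -threshold:
                items.add(f"{gene}_down")
        out.append(frozenset(items))
    return out


def frequent_closed_itemsets(transactions, min_count):
    """Brute force: frequent itemsets that have no proper superset with equal count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            candidate = frozenset(combo)
            count = sum(candidate <= t for t in transactions)
            if count >= min_count:
                frequent[candidate] = count
    return {
        s: c for s, c in frequent.items()
        if not any(s < other and c == frequent[other] for other in frequent)
    }


genes = ["G1", "G2", "G3"]
matrix = [[2.0, -2.0, 0.0], [1.5, -1.2, 0.3], [0.0, 0.0, 2.0]]
print(frequent_closed_itemsets(discretize(matrix, genes), min_count=2))
```

    Closedness is what keeps the output small: {G1_up} and {G2_down} are frequent on their own but are absorbed by their closed superset.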

  13. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarrays and beadchips are the two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers from biological datasets using an integrated approach of statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate insignificant, low-significance, or redundant genes in such a way that the significance level satisfies the data distribution property (viz., either a normal or a non-normal distribution). The data are then discretized and post-discretized in turn. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets, and the corresponding special types of rules are extracted from the selected itemsets. Our proposed rule mining method performs better than other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets; as a result, it reduces elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. We also classify the data to assess how accurately the evolved rules describe the remaining test (unknown) data, and we compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are performed to verify the statistical relevance of the comparative results.
Here, each of the other rule mining methods and rule-based classifiers also starts from the same post-discretized data matrix. Finally, we include an integrated analysis of gene expression and methylation to determine the epigenetic effect (viz., the effect of methylation) on gene expression level.

  14. GExplore: a web server for integrated queries of protein domains, gene expression and mutant phenotypes

    PubMed Central

    2009-01-01

    Background: The majority of genes, even in well-studied multicellular model organisms, have not yet been functionally characterized. Mining the numerous genome-wide data sets related to protein function to retrieve potential candidate genes for a particular biological process remains a challenge. Description: GExplore has been developed to provide a user-friendly database interface for data mining at the gene expression/protein function level to help in hypothesis development and experiment design. It supports combinatorial searches for proteins with certain domains, tissue- or developmental stage-specific expression patterns, and mutant phenotypes. GExplore operates on a stand-alone database and has fast response times, which are essential for exploratory searches. The interface is not only user-friendly but also modular, so it can accommodate additional data sets in the future. Conclusion: GExplore is an online database for quick mining of data related to gene and protein function, providing a multi-gene display of data sets related to the domain composition of proteins as well as expression and phenotype data. GExplore is publicly available at: http://genome.sfu.ca/gexplore/ PMID:19917126
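
    A combinatorial query of this kind amounts to intersecting constraints over several annotation types. A minimal in-memory sketch follows; the record fields, domain names, tissues and phenotypes are hypothetical, and GExplore's actual schema and query interface are not described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    name: str
    domains: set = field(default_factory=set)       # protein domains
    expressed_in: set = field(default_factory=set)  # tissues or developmental stages
    phenotypes: set = field(default_factory=set)    # mutant phenotypes


def combinatorial_query(records, domains=(), expressed_in=(), phenotypes=()):
    """Names of genes that satisfy every requested domain, expression and phenotype term."""
    return [
        r.name
        for r in records
        if set(domains) <= r.domains
        and set(expressed_in) <= r.expressed_in
        and set(phenotypes) <= r.phenotypes
    ]


records = [
    GeneRecord("gene-1", {"kinase"}, {"neurons", "embryo"}, {"uncoordinated"}),
    GeneRecord("gene-2", {"kinase"}, {"muscle"}, {"uncoordinated"}),
    GeneRecord("gene-3", {"zinc finger"}, {"neurons"}, set()),
]
print(combinatorial_query(records, domains={"kinase"}, expressed_in={"neurons"}))
```

    An empty constraint is a subset of every set, so omitted criteria simply do not filter; a database-backed tool would express the same logic as indexed joins.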

  15. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    PubMed Central

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. 
The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including streptothricins, borrelidin, two novel lipopeptides, and one unknown antibiotic from Streptomyces rochei Sal35. The transfer, expression, and screening of the library were all performed in a high-throughput way, so that this approach is scalable and adaptable to industrial automation for next-generation antibiotic discovery. PMID:27451447

  16. Text Mining in Cancer Gene and Pathway Prioritization

    PubMed Central

    Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

    2014-01-01

    Prioritization of cancer-implicated genes has received growing attention as an effective way to reduce wet-lab cost through computational analysis that ranks candidate genes according to the likelihood that experimental verification will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expression, functional annotations, gene regulation, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes. PMID:25392685

  17. Text mining in cancer gene and pathway prioritization.

    PubMed

    Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

    2014-01-01

    Prioritization of cancer-implicated genes has received growing attention as an effective way to reduce wet-lab cost through computational analysis that ranks candidate genes according to the likelihood that experimental verification will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expression, functional annotations, gene regulation, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.

  18. GEOGLE: context mining tool for the correlation between gene expression and the phenotypic distinction.

    PubMed

    Yu, Yao; Tu, Kang; Zheng, Siyuan; Li, Yun; Ding, Guohui; Ping, Jie; Hao, Pei; Li, Yixue

    2009-08-25

    In the post-genomic era, the development of high-throughput gene expression detection technology provides huge amounts of experimental data, which challenges traditional pipelines for data processing and analysis in scientific research. In our work, we integrated gene expression information from Gene Expression Omnibus (GEO), biomedical ontology from Medical Subject Headings (MeSH) and signaling pathway knowledge from sigPathway entries to develop a context mining tool for gene expression analysis, GEOGLE. GEOGLE offers a rapid and convenient way to search for relevant experimental datasets, pathways and biological terms according to multiple types of queries, including biomedical vocabularies, GDS IDs, gene IDs, pathway names and signature lists. Moreover, GEOGLE summarizes the signature genes from a subset of GDSes and estimates the correlation between gene expression and the phenotypic distinction with an integrated p value. This approach, which performs global searches of expression data, may expand the traditional way of collecting heterogeneous gene expression experiment data. GEOGLE is a novel tool that provides researchers with a quantitative way to understand the correlation between gene expression and phenotypic distinction, and the biological meaning behind it, through meta-analysis of gene expression datasets from different experiments. The web site and user guide of GEOGLE are available at: http://omics.biosino.org:14000/kweb/workflow.jsp?id=00020.
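
    An "integrated p value" across datasets is typically obtained with a p-value combination rule. The abstract does not say which rule GEOGLE uses, so this sketch assumes Fisher's method, one common choice; the per-dataset p-values are hypothetical. For 2k degrees of freedom the chi-square tail probability has a closed form, so no statistics library is needed.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k df under H0."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)  # each p must lie in (0, 1]
    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no special functions are needed.
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical per-dataset p-values for one gene-phenotype association.
print(fisher_combined_p([0.04, 0.10, 0.03]))
```

    Fisher's method assumes the datasets are independent; overlapping samples across GEO series would make the combined value optimistic.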

  19. MARQ: an online tool to mine GEO for experiments with similar or opposite gene expression signatures.

    PubMed

    Vazquez, Miguel; Nogales-Cadenas, Ruben; Arroyo, Javier; Botías, Pedro; García, Raul; Carazo, Jose M; Tirado, Francisco; Pascual-Montano, Alberto; Carmona-Saez, Pedro

    2010-07-01

    The enormous amount of data available in public gene expression repositories such as Gene Expression Omnibus (GEO) offers an inestimable resource to explore gene expression programs across several organisms and conditions. This information can be used to discover experiments that induce similar or opposite gene expression patterns to a given query, which in turn may lead to the discovery of new relationships among diseases, drugs or pathways, as well as the generation of new hypotheses. In this work, we present MARQ, a web-based application that allows researchers to compare a query set of genes, e.g. a set of over- and under-expressed genes, against a signature database built from GEO datasets for different organisms and platforms. MARQ offers an easy-to-use and integrated environment to mine GEO, in order to identify conditions that induce similar or opposite gene expression patterns to a given experimental condition. MARQ also includes additional functionalities for the exploration of the results, including a meta-analysis pipeline to find genes that are differentially expressed across different experiments. The application is freely available at http://marq.dacya.ucm.es.
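    The MARQ abstract describes comparing a query of over- and under-expressed genes against a database of GEO signatures to find similar or opposite experiments, but does not give the scoring function. A minimal rank-based sketch of that idea follows; the score, the function names and the toy gene names are hypothetical and are not MARQ's published method. Each stored signature is a gene list ordered from most over-expressed to most under-expressed.

```python
from statistics import mean


def signature_similarity(ranked_genes, up_genes, down_genes):
    """Hypothetical rank-based score of a query against one signature.

    ranked_genes: genes ordered from most over- to most under-expressed.
    Returns a value in [-1, 1]: near +1 when query up genes sit at the top
    and query down genes at the bottom (similar pattern), near -1 when the
    pattern is reversed (opposite). Genes absent from the signature are
    ignored.
    """
    n = len(ranked_genes)
    if n < 2:
        raise ValueError("signature needs at least two genes")
    # Normalised position: 0.0 = top of the list, 1.0 = bottom.
    pos = {g: i / (n - 1) for i, g in enumerate(ranked_genes)}
    up = [pos[g] for g in up_genes if g in pos]
    down = [pos[g] for g in down_genes if g in pos]
    if not up or not down:
        raise ValueError("query genes not found in signature")
    return mean(down) - mean(up)


def rank_signatures(signatures, up_genes, down_genes):
    """Return (name, score) pairs sorted from most similar to most opposite."""
    scores = {name: signature_similarity(genes, up_genes, down_genes)
              for name, genes in signatures.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

    Signatures at the end of the sorted list are candidates for conditions that induce the opposite expression pattern.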

  20. Hybrid coexpression link similarity graph clustering for mining biological modules from multiple gene expression datasets.

    PubMed

    Salem, Saeed; Ozcaglar, Cagri

    2014-01-01

    Advances in genomic technologies have enabled the accumulation of vast amounts of genomic data, including gene expression data for multiple species under various biological and environmental conditions. Integration of these gene expression datasets is a promising strategy to alleviate the challenges of protein functional annotation and biological module discovery based on a single gene expression dataset, which suffers from spurious coexpression. We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarities and co-appearance similarities of the corresponding two coexpression links. Clustering the weighted hybrid similarity graph yields recurrent coexpression link clusters (modules). Experimental results on human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms and KEGG pathways.
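    The construction above can be sketched as follows, where every node is a coexpression link and an edge weight is a linear combination alpha * topological + (1 - alpha) * co-appearance similarity. The abstract does not define either similarity, so this sketch assumes Jaccard overlap of the links' gene neighbourhoods for the topological term and Jaccard overlap of the datasets in which each link appears for the co-appearance term. Both definitions, the `alpha` default and the function names are illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_link_graph(link_datasets, alpha=0.5):
    """Build weighted edges between coexpression links (assumed similarities).

    link_datasets: dict mapping a link (frozenset of two genes) to the set
    of dataset IDs in which that link is observed.
    Returns {(link1, link2): weight} for every pair with positive weight.
    """
    # Neighbourhood of each gene in the union network of all links.
    neighbours = {}
    for link in link_datasets:
        a, b = tuple(link)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def link_neighbourhood(link):
        a, b = tuple(link)
        return neighbours[a] | neighbours[b]

    edges = {}
    for l1, l2 in combinations(link_datasets, 2):
        topo = jaccard(link_neighbourhood(l1), link_neighbourhood(l2))
        coapp = jaccard(link_datasets[l1], link_datasets[l2])
        weight = alpha * topo + (1 - alpha) * coapp
        if weight > 0:
            edges[(l1, l2)] = weight
    return edges
```

    The resulting weighted graph could then be passed to any weighted graph clustering method to obtain link modules.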

  1. Clustering Algorithms: Their Application to Gene Expression Data

    PubMed Central

    Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel

    2016-01-01

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data can greatly strengthen our understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the difficulty of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to identify the appropriate clustering techniques that offer stability and a high degree of accuracy in the analysis. PMID:27932867
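    As a concrete instance of the algorithms such a review covers, here is a minimal k-means (Lloyd's algorithm) sketch over gene expression profiles, with rows as genes and columns as conditions. It uses Euclidean distance and a deterministic farthest-point initialisation for reproducibility; these are choices made for the sketch, not recommendations taken from the review.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(profiles, k, max_iter=100):
    """Cluster gene expression profiles (rows = genes) with Lloyd's k-means.

    Centroids are initialised deterministically by farthest-point
    selection. Returns a list with the cluster index of each profile.
    """
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        # Next centroid: the profile farthest from all existing centroids.
        farthest = max(profiles,
                       key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [None] * len(profiles)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every gene.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

    For expression data, a correlation-based distance is often preferred when the shape of a profile matters more than its magnitude; it can be substituted for `euclidean` without changing the rest of the loop.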

  2. Hybrid coexpression link similarity graph clustering for mining biological modules from multiple gene expression datasets

    PubMed Central

    2014-01-01

    Background Advances in genomic technologies have enabled the accumulation of vast amounts of genomic data, including gene expression data for multiple species under various biological and environmental conditions. Integration of these gene expression datasets is a promising strategy to alleviate the challenges of protein functional annotation and biological module discovery based on a single gene expression dataset, which suffers from spurious coexpression. Results We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarities and co-appearance similarities of the corresponding two coexpression links. Clustering the weighted hybrid similarity graph yields recurrent coexpression link clusters (modules). Experimental results on human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms and KEGG pathways. PMID:25221624

  3. From data towards knowledge: revealing the architecture of signaling systems by unifying knowledge mining and data mining of systematic perturbation data.

    PubMed

    Lu, Songjian; Jin, Bo; Cowart, L Ashley; Lu, Xinghua

    2013-01-01

    Genetic and pharmacological perturbation experiments, such as deleting a gene and monitoring gene expression responses, are powerful tools for studying cellular signal transduction pathways. However, it remains a challenge to automatically derive knowledge of a cellular signaling system at a conceptual level from systematic perturbation-response data. In this study, we explored a framework that unifies knowledge mining and data mining toward this goal. The framework consists of the following automated processes: 1) applying an ontology-driven knowledge mining approach to identify functional modules among the genes responding to a perturbation in order to reveal potential signals affected by the perturbation; 2) applying a graph-based data mining approach to search for perturbations that affect a common signal; and 3) revealing the architecture of a signaling system by organizing signaling units into a hierarchy based on their relationships. Applying this framework to a compendium of yeast perturbation-response data, we have successfully recovered many well-known signal transduction pathways; in addition, our analysis has led to many new hypotheses regarding the yeast signal transduction system; finally, our analysis automatically organized perturbed genes as a graph reflecting the architecture of the yeast signaling system. Importantly, this framework transformed molecular findings from a gene level to a conceptual level, which can be readily translated into computable knowledge in the form of rules regarding the yeast signaling system, such as "if genes involved in the MAPK signaling are perturbed, genes involved in pheromone responses will be differentially expressed."

  4. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.

    PubMed

    Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

    2016-01-01

    For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) and its improved variants suffer from increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and a function consistency replacement algorithm is used to integrate the local function models. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.

  5. Genotoxic effects and gene expression in Danio rerio (Hamilton 1822) (Cypriniformes: Cyprinidae) exposed to mining-impacted tributaries in Manizales, Colombia.

    PubMed

    Ossa-López, Paula A; Castaño-Villa, Gabriel J; Rivera-Páez, Fredy A

    2017-09-25

    The zebrafish (Danio rerio) is one of the most studied aquatic organisms for water biomonitoring, due to its sensitivity to environmental degradation and resistance to toxic substances. This study determined the presence of micronuclei and nuclear abnormalities in peripheral blood erythrocytes, and assessed the gene expression of caspase-3 (CASP-3) and metallothionein 1 (MT-1) in the gills and liver of D. rerio. The study fish (n = 45) were exposed to water collected from two stations with mining impact (E2 and E3) and a reference station without evident mining contamination (E1), all located in La Elvira stream (Manizales-Colombia). In addition, a positive control (PC) with HgCl₂ (50 μg/L) and negative control (NC) with tap water were included. The fish from the PC and E2 and E3 treatments displayed genotoxic effects and changes in gene expression, with significant differences in micronuclei formation and the presence of blebbed nuclei. The cytochrome oxidase subunit I (COI) gene was used as reference and proved to be stable compared to the β-actin and 28S ribosomal RNA (28S) genes. In gills, CASP-3 expression was higher in the PC, and MT-1 expression was higher in the PC and E3 treatment. In liver, CASP-3 was expressed in the E2 treatment, and MT-1 expression was low. These results show that the genotoxic effects and differential gene expression observed in fish exposed to water from La Elvira stream could also be affecting the organisms present in this habitat.

  6. DEXTER: Disease-Expression Relation Extraction from Text.

    PubMed

    Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K

    2018-01-01

    Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research and for the diagnosis and prognosis of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships that have not only been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER), to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags significantly behind the expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51% and 81.81% for the two evaluations, respectively.
Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNAs in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress. Database URL: http://biotm.cis.udel.edu/DEXTER.
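    The F-scores reported for DEXTER are the harmonic mean of precision and recall. The abstract does not give the underlying counts, so the sketch below only shows the standard computation from true positives (tp), false positives (fp) and false negatives (fn); any counts passed to it are illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 (harmonic mean) from extraction counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

    For example, 8 correct extractions with 2 spurious and 2 missed ones give precision, recall and F1 all equal to 0.8.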

  7. Heavy metals in wild house mice from coal-mining areas of Colombia and expression of genes related to oxidative stress, DNA damage and exposure to metals.

    PubMed

    Guerrero-Castilla, Angélica; Olivero-Verbel, Jesús; Marrugo-Negrete, José

    2014-03-01

    Coal mining is a source of pollutants that impact environmental and human health. This study examined the metal content and the transcriptional status of gene markers associated with oxidative stress, metal transport and DNA damage in livers of feral mice collected near coal-mining operations, in comparison with mice obtained from a reference site. Mus musculus specimens were caught from La Loma and La Jagua, two coal-mining sites in the north of Colombia, as well as from Valledupar (Cesar Department), a city located 100 km north of the mines. Concentrations in liver tissue of Hg, Zn, Pb, Cd, Cu and As were determined by differential stripping voltammetry, and real-time PCR was used to measure gene expression. Compared with the reference group (Valledupar), hepatic concentrations of Cd, Cu and Zn were significantly higher in animals living near mining areas. In exposed animals, the mRNA expression of NQO1, MT1, SOD1, MT2, and DDIT3 was 4.2-, 7.3-, 2.5-, 4.6- and 3.4-fold greater, respectively, than in animals from the reference site (p<0.05). These results suggest that activities related to coal mining may generate pollutants that could affect the biota, inducing the transcription of biochemical markers related to oxidative stress, metal exposure, and DNA damage. These changes may be in part linked to metal toxicity, and could have implications for the development of chronic disease. Therefore, it is essential to implement preventive measures to minimize the effects of coal mining on its nearby environment, in order to protect human health. Copyright © 2014 Elsevier B.V. All rights reserved.
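    Fold changes such as "4.2-fold greater" are typically derived from real-time PCR threshold cycles (Ct). The abstract does not name the quantification method, so the sketch below assumes the widely used 2^-ΔΔCt approach, which normalises the target gene to a reference gene and assumes roughly 100% amplification efficiency. The Ct values in the usage note are hypothetical.

```python
def fold_change_ddct(ct_target_exposed, ct_ref_exposed,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ΔΔCt method (assumed here).

    Assumes ~100% amplification efficiency for target and reference genes.
    A result of 4.0 means the target is four times more abundant in the
    exposed group than in the control after normalising to the reference.
    """
    delta_exposed = ct_target_exposed - ct_ref_exposed    # ΔCt, exposed
    delta_control = ct_target_control - ct_ref_control    # ΔCt, control
    delta_delta = delta_exposed - delta_control           # ΔΔCt
    return 2 ** (-delta_delta)
```

    With hypothetical Ct values of 22 (target) and 18 (reference) in exposed animals versus 24 and 18 in controls, ΔΔCt is -2 and the fold change is 4.0.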

  8. Dynamic association rules for gene expression data analysis.

    PubMed

    Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung

    2015-10-14

    The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profiles has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulation, clinical diagnoses and drug development. Statistical analyses of microarray data have been developed to address the gene selection problem. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine whether an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the mouse Apo A1 dataset, the whole-genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of leukemia patients, the Microarray Quality Control (MAQC) dataset and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of leukemia patients was conducted. We developed a statistical way, based on the concept of a confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in a single step. The DAR algorithm was then developed for gene expression data analysis.
Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
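    The core ingredients here are the support and confidence of a rule X → Y plus a one-sided confidence interval used to judge whether a rule is meaningful. The sketch below is in that spirit only: it treats each sample as a "basket" of items such as a hypothetical `"G_up"` (gene up-regulated) and `"disease"`, computes support and confidence, and keeps a rule when a normal-approximation one-sided lower bound on its confidence exceeds a chosen threshold. The exact DAR test is not given in the abstract, so the bound, thresholds and item names are assumptions.

```python
import math
from statistics import NormalDist


def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    transactions: list of sets of items (one set per sample).
    Returns (support, confidence, number of samples containing antecedent).
    """
    n = len(transactions)
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence, n_x


def rule_is_significant(transactions, antecedent, consequent,
                        min_confidence=0.5, alpha=0.05):
    """Keep a rule if a one-sided (1 - alpha) lower bound on its confidence
    exceeds min_confidence (normal approximation to a binomial proportion)."""
    _, conf, n_x = rule_stats(transactions, antecedent, consequent)
    if n_x == 0:
        return False
    z = NormalDist().inv_cdf(1 - alpha)
    lower = conf - z * math.sqrt(conf * (1 - conf) / n_x)
    return lower > min_confidence
```

    With 18 of the 20 "G_up" samples also labelled "disease", the confidence is 0.9 and its 95% lower bound is about 0.79, so the rule passes a 0.5 threshold but not a 0.85 one.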

  9. Identification of candidate genes in Populus cell wall biosynthesis using text-mining, co-expression network and comparative genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaohan; Ye, Chuyu; Bisaria, Anjali

    2011-01-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance to the efficient generation of ethanol from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had experimental evidence supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database, and additional genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high-likelihood candidates for functional genomics in relation to cell wall biosynthesis.
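    The bait-gene step (121 → 548 → 694 genes) is an iterative neighbourhood expansion over a co-expression network. A minimal sketch follows, assuming the network is available as an adjacency dictionary; the criterion the co-expression database uses to call two genes neighbours is not stated in the abstract, and the toy network in the usage note is hypothetical.

```python
def expand_by_neighbours(seeds, network, rounds=2):
    """Grow a gene set by adding co-expression neighbours, round by round.

    network: dict mapping each gene to the set of genes co-expressed with it.
    Each round adds every direct neighbour of the current set.
    Returns the list of gene sets after each round (index 0 = the seeds).
    """
    current = set(seeds)
    history = [set(current)]
    for _ in range(rounds):
        neighbours = set()
        for gene in current:
            neighbours |= network.get(gene, set())
        current |= neighbours
        history.append(set(current))
    return history
```

    On a toy chain network A-B-C-D seeded with A, two rounds yield {A}, {A, B} and {A, B, C}; the growth of the set after each round corresponds to the 121, 548 and 694 gene counts in the pipeline.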

  10. Mimosa: Mixture Model of Co-expression to Detect Modulators of Regulatory Interaction

    NASA Astrophysics Data System (ADS)

    Hansen, Matthew; Everett, Logan; Singh, Larry; Hannenhalli, Sridhar

    Functionally related genes tend to be correlated in their expression patterns across multiple conditions and/or tissue types. Thus co-expression networks are often used to investigate functional groups of genes. In particular, when one of the genes is a transcription factor (TF), the co-expression-based interaction is interpreted, with caution, as a direct regulatory interaction. However, any particular TF, and more importantly, any particular regulatory interaction, is likely to be active only in a subset of experimental conditions. Moreover, the subset of expression samples where the regulatory interaction holds may be marked by the presence or absence of a modifier gene, such as an enzyme that post-translationally modifies the TF. Such subtlety of regulatory interactions is overlooked when one computes an overall expression correlation. Here we present a novel mixture modeling approach in which a TF-gene pair is presumed to be significantly correlated (with an unknown coefficient) in an (unknown) subset of expression samples. The parameters of the model are estimated using a maximum likelihood approach. The estimated mixture of expression samples is then mined to identify genes potentially modulating the TF-gene interaction. We have validated our approach using synthetic data and on three biological cases in cow and in yeast. While limited in some ways, as discussed, the work represents a novel approach to mine expression data and detect potential modulators of regulatory interactions.
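    A simplified version of the mixture idea can be fitted with expectation-maximisation (EM). In the sketch below, which is an assumption rather than the Mimosa model itself, TF and target expression values are standardised; "correlated" samples follow a zero-mean bivariate normal with a free covariance, and all other samples follow independent standard normals. EM then estimates the mixing weight, the correlation inside the correlated subset, and each sample's posterior probability of belonging to it.

```python
import math


def bvn_pdf(x, y, sxx, syy, sxy):
    """Density of a zero-mean bivariate normal with the given covariance."""
    det = sxx * syy - sxy * sxy
    quad = (syy * x * x - 2 * sxy * x * y + sxx * y * y) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


def correlated_subset_em(xs, ys, n_iter=200, tol=1e-9):
    """EM for a two-component mixture on standardised (x, y) pairs.

    Component 1: zero-mean bivariate normal with free covariance (samples
    where the TF-gene pair is correlated).
    Component 0: independent standard normals (no interaction).
    Returns (pi, rho, resp): mixing weight, correlation of component 1,
    and each sample's posterior probability of component 1.
    """
    pi, sxx, syy, sxy = 0.5, 1.0, 1.0, 0.5  # starting values
    resp = [0.5] * len(xs)
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each sample is "correlated".
        ll = 0.0
        for i, (x, y) in enumerate(zip(xs, ys)):
            p1 = pi * bvn_pdf(x, y, sxx, syy, sxy)
            p0 = (1 - pi) * bvn_pdf(x, y, 1.0, 1.0, 0.0)
            resp[i] = p1 / (p1 + p0)
            ll += math.log(p1 + p0)
        # M-step: weighted maximum-likelihood updates.
        w = sum(resp)
        pi = w / len(xs)
        sxx = sum(r * x * x for r, x in zip(resp, xs)) / w
        syy = sum(r * y * y for r, y in zip(resp, ys)) / w
        sxy = sum(r * x * y for r, x, y in zip(resp, xs, ys)) / w
        if ll - prev_ll < tol:
            break  # log-likelihood has stopped improving
        prev_ll = ll
    rho = sxy / math.sqrt(sxx * syy)
    return pi, rho, resp
```

    Samples with high posterior probability form the estimated "interaction active" subset; in the spirit of the abstract, candidate modulator genes could then be screened for expression differences between high- and low-probability samples.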

  11. ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery.

    PubMed

    Krishnakumar, Vivek; Contrino, Sergio; Cheng, Chia-Yi; Belyaeva, Irina; Ferlanti, Erik S; Miller, Jason R; Vaughn, Matthew W; Micklem, Gos; Town, Christopher D; Chan, Agnes P

    2017-01-01

    ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled. © The Author 2016. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  12. Mining subspace clusters from DNA microarray data using large itemset techniques.

    PubMed

    Chang, Ye-In; Chen, Jiun-Rung; Tsai, Yueh-Chi

    2009-05-01

    Mining subspace clusters from DNA microarrays could help researchers identify genes that commonly contribute to a disease, where a subspace cluster indicates a subset of genes whose expression levels are similar under a subset of conditions. Since in a DNA microarray the number of genes is far larger than the number of conditions, previously proposed algorithms that compute the maximum dimension sets (MDSs) for every pair of genes take a long time to mine subspace clusters. In this article, we propose the Large Itemset-Based Clustering (LISC) algorithm for mining subspace clusters. Instead of constructing MDSs for every pair of genes, we construct MDSs only for every pair of conditions. Then, we transform the task of finding the maximal possible gene sets into the problem of mining large itemsets from the condition-pair MDSs. Since we are only interested in subspace clusters with gene sets as large as possible, it is desirable to focus on gene sets that have reasonably large support values in the condition-pair MDSs. Our simulation results show that the proposed algorithm needs less processing time than previously proposed algorithms that construct gene-pair MDSs.

  13. NCBI GEO: mining millions of expression profiles--database and tools.

    PubMed

    Barrett, Tanya; Suzek, Tugba O; Troup, Dennis B; Wilhite, Stephen E; Ngau, Wing-Chi; Ledoux, Pierre; Rudnev, Dmitry; Lash, Alex E; Fujibuchi, Wataru; Edgar, Ron

    2005-01-01

    The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest fully public repository for high-throughput molecular abundance data, primarily gene expression data. The database has a flexible and open design that allows the submission, storage and retrieval of many data types. These data include microarray-based experiments measuring the abundance of mRNA, genomic DNA and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. GEO currently holds over 30,000 submissions representing approximately half a billion individual molecular abundance measurements, for over 100 organisms. Here, we describe recent database developments that facilitate effective mining and visualization of these data. Features are provided to examine data from both experiment- and gene-centric perspectives using user-friendly Web-based interfaces accessible to those without computational or microarray-related analytical expertise. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.

  14. Mining the archives: a cross-platform analysis of gene expression profiles in archival formalin-fixed paraffin-embedded (FFPE) tissue.

    EPA Science Inventory

    Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for genomic research into the molecular basis of disease. However, use of FFPE samples in gene expression studies has been limited by technical challenges resulting from degradation...

  15. Microarray Data Mining for Potential Selenium Targets in Chemoprevention of Prostate Cancer

    PubMed Central

    ZHANG, HAITAO; DONG, YAN; ZHAO, HONGJUAN; BROOKS, JAMES D.; HAWTHORN, LESLEYANN; NOWAK, NORMA; MARSHALL, JAMES R.; GAO, ALLEN C.; IP, CLEMENT

    2008-01-01

    Background A previous clinical trial showed that selenium supplementation significantly reduced the incidence of prostate cancer. We report here a bioinformatics approach to gain new insights into selenium molecular targets that might be relevant to prostate cancer chemoprevention. Materials and Methods We first performed data mining analysis to identify genes that are consistently dysregulated in prostate cancer using published datasets from gene expression profiling of clinical prostate specimens. We then devised a method to systematically analyze three selenium microarray datasets from LNCaP human prostate cancer cells, and to match the analysis to the cohort of genes implicated in prostate carcinogenesis. Moreover, we compared the selenium datasets with two datasets obtained from expression profiling of androgen-stimulated LNCaP cells. Results We found that selenium reverses the expression of genes implicated in prostate carcinogenesis. In addition, we found that selenium could counteract the effect of androgen on the expression of a subset of androgen-regulated genes. Conclusions The above information provides a wealth of new clues for investigating the mechanism of selenium chemoprevention of prostate cancer. Furthermore, these selenium target genes could also serve as biomarkers in future clinical trials to gauge the efficacy of selenium intervention. PMID:18548127
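    The finding that selenium "reverses" cancer-associated expression changes amounts to matching two differential expression results gene by gene and looking for opposite directions. The abstract does not describe the matching procedure, so the sketch below assumes a simple sign-reversal filter on log2 fold changes with a magnitude cutoff; the cutoff and the gene names in the usage note are hypothetical.

```python
def reversed_genes(cancer_lfc, treatment_lfc, min_abs=1.0):
    """Genes whose treatment response opposes their cancer dysregulation.

    Both arguments map gene -> log2 fold change (cancer vs normal,
    treated vs untreated). A gene is reported when it reaches min_abs in
    both comparisons and the two changes have opposite signs.
    """
    hits = []
    for gene, c in cancer_lfc.items():
        t = treatment_lfc.get(gene)
        if t is None:
            continue  # gene not measured in the treatment dataset
        if abs(c) >= min_abs and abs(t) >= min_abs and c * t < 0:
            hits.append(gene)
    return sorted(hits)
```

    For instance, a gene up-regulated in tumours (log2 fold change 2.0) and down-regulated by treatment (-1.1) is reported, whereas one that moves in the same direction in both comparisons is not.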

  16. Discovery of error-tolerant biclusters from noisy gene expression data.

    PubMed

    Gupta, Rohit; Rao, Navneet; Kumar, Vipin

    2011-11-24

    An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations, such as an inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; and the inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining also produces biclusters as its result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, which limits its applicability in real-life data sets where the biclusters may be fragmented due to random noise/errors. Moreover, as such methods only work with binary or Boolean attributes, their application to gene-expression data requires transforming real-valued attributes to binary attributes, which often results in loss of information. Many past approaches have tried to address the issues of noise and of handling real-valued attributes independently, but there is no systematic approach that addresses both of these issues together. In this paper, we first propose a novel error-tolerant biclustering model, 'ET-bicluster', and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of our proposed approach is illustrated by comparing it with a recent approach RAP in the context of two biological problems: discovery of functional modules and discovery of biomarkers.
For the first problem, two real-valued S. cerevisiae microarray gene-expression data sets are used to demonstrate that the biclusters obtained from the ET-bicluster approach not only recover larger sets of genes than those obtained from the RAP approach but also have higher functional coherence, as evaluated using GO-based functional enrichment analysis. The statistical significance of the discovered error-tolerant biclusters, as estimated using two randomization tests, reveals that they are indeed biologically meaningful and statistically significant. For the second problem of biomarker discovery, we used four real-valued breast cancer microarray gene-expression data sets and evaluated the biomarkers obtained using MSigDB gene sets. The results obtained for both problems, functional module discovery and biomarker discovery, clearly signify the usefulness of the proposed ET-bicluster approach and illustrate the importance of explicitly incorporating noise/errors in discovering coherent groups of genes from gene-expression data.

  17. Mining microarray datasets in nutrition: expression of the GPR120 (n-3 fatty acid receptor/sensor) gene is down-regulated in human adipocytes by macrophage secretions.

    PubMed

    Trayhurn, Paul; Denyer, Gareth

    2012-01-01

    Microarray datasets are a rich source of information in nutritional investigation. Targeted mining of microarray data following initial, non-biased bioinformatic analysis can provide key insight into specific genes and metabolic processes of interest. Microarrays from human adipocytes were examined to explore the effects of macrophage secretions on the expression of the G-protein-coupled receptor (GPR) genes that encode fatty acid receptors/sensors. Exposure of the adipocytes to macrophage-conditioned medium for 4 or 24 h had no effect on GPR40 and GPR43 expression, but there was a marked stimulation of GPR84 expression (receptor for medium-chain fatty acids), the mRNA level increasing 13·5-fold at 24 h relative to unconditioned medium. Importantly, expression of GPR120, which encodes an n-3 PUFA receptor/sensor, was strongly inhibited by the conditioned medium (15-fold decrease in mRNA at 24 h). Macrophage secretions have major effects on the expression of fatty acid receptor/sensor genes in human adipocytes, which may lead to an augmentation of the inflammatory response in adipose tissue in obesity.

  18. Mining microarray datasets in nutrition: expression of the GPR120 (n-3 fatty acid receptor/sensor) gene is down-regulated in human adipocytes by macrophage secretions

    PubMed Central

    Trayhurn, Paul; Denyer, Gareth

    2012-01-01

    Microarray datasets are a rich source of information in nutritional investigation. Targeted mining of microarray data following initial, non-biased bioinformatic analysis can provide key insight into specific genes and metabolic processes of interest. Microarrays from human adipocytes were examined to explore the effects of macrophage secretions on the expression of the G-protein-coupled receptor (GPR) genes that encode fatty acid receptors/sensors. Exposure of the adipocytes to macrophage-conditioned medium for 4 or 24 h had no effect on GPR40 and GPR43 expression, but there was a marked stimulation of GPR84 expression (receptor for medium-chain fatty acids), the mRNA level increasing 13·5-fold at 24 h relative to unconditioned medium. Importantly, expression of GPR120, which encodes an n-3 PUFA receptor/sensor, was strongly inhibited by the conditioned medium (15-fold decrease in mRNA at 24 h). Macrophage secretions have major effects on the expression of fatty acid receptor/sensor genes in human adipocytes, which may lead to an augmentation of the inflammatory response in adipose tissue in obesity. PMID:25191551

  19. GeneChip Expression Profiling Reveals the Alterations of Energy Metabolism Related Genes in Osteocytes under Large Gradient High Magnetic Fields

    PubMed Central

    Wang, Yang; Chen, Zhi-Hao; Yin, Chun; Ma, Jian-Hua; Li, Di-Jie; Zhao, Fan; Sun, Yu-Long; Hu, Li-Fang; Shang, Peng; Qian, Ai-Rong

    2015-01-01

    Diamagnetic levitation, a novel ground-based model for simulating a reduced-gravity environment, has recently been applied in life science research. In this study, a specially designed superconducting magnet with a large gradient high magnetic field (LG-HMF), which can provide three apparent gravity levels (μ-g, 1-g, and 2-g), was used to simulate a space-like gravity environment. Osteocytes, the most important mechanosensors in bone, play a pivotal role in mediating mechano-induced bone remodeling. The effects of LG-HMF on the gene expression profile of the osteocyte-like cell line MLO-Y4 were investigated using Affymetrix DNA microarrays. LG-HMF altered the osteocyte gene expression profile. Differentially expressed genes (DEGs) were further analyzed and mined using bioinformatic tools such as DAVID and iReport. Twelve energy-metabolism-related genes (PFKL, AK4, ALDOC, COX7A1, STC1, ADM, CA9, CA12, P4HA1, APLN, GPR35 and GPR84) were further confirmed by real-time PCR. An integrated gene interaction network of the 12 DEGs was constructed. Bio-data mining showed that genes involved in glucose metabolic processes and apoptosis changed notably. Our results demonstrate that LG-HMF affects the expression of energy-metabolism-related genes in osteocytes. The identification of genes sensitive to special environments may provide potential targets for preventing and treating bone loss or osteoporosis. PMID:25635858

  20. GeneChip expression profiling reveals the alterations of energy metabolism related genes in osteocytes under large gradient high magnetic fields.

    PubMed

    Wang, Yang; Chen, Zhi-Hao; Yin, Chun; Ma, Jian-Hua; Li, Di-Jie; Zhao, Fan; Sun, Yu-Long; Hu, Li-Fang; Shang, Peng; Qian, Ai-Rong

    2015-01-01

    Diamagnetic levitation, a novel ground-based model for simulating a reduced-gravity environment, has recently been applied in life science research. In this study, a specially designed superconducting magnet with a large gradient high magnetic field (LG-HMF), which can provide three apparent gravity levels (μ-g, 1-g, and 2-g), was used to simulate a space-like gravity environment. Osteocytes, the most important mechanosensors in bone, play a pivotal role in mediating mechano-induced bone remodeling. The effects of LG-HMF on the gene expression profile of the osteocyte-like cell line MLO-Y4 were investigated using Affymetrix DNA microarrays. LG-HMF altered the osteocyte gene expression profile. Differentially expressed genes (DEGs) were further analyzed and mined using bioinformatic tools such as DAVID and iReport. Twelve energy-metabolism-related genes (PFKL, AK4, ALDOC, COX7A1, STC1, ADM, CA9, CA12, P4HA1, APLN, GPR35 and GPR84) were further confirmed by real-time PCR. An integrated gene interaction network of the 12 DEGs was constructed. Bio-data mining showed that genes involved in glucose metabolic processes and apoptosis changed notably. Our results demonstrate that LG-HMF affects the expression of energy-metabolism-related genes in osteocytes. The identification of genes sensitive to special environments may provide potential targets for preventing and treating bone loss or osteoporosis.

  1. Too much data, but little inter-changeability: a lesson learned from mining public data on tissue specificity of gene expression.

    PubMed

    Li, Shuyu; Li, Yiqun Helen; Wei, Tao; Su, Eric Wen; Duffin, Kevin; Liao, Birong

    2006-10-25

    The tissue expression pattern of a gene often provides an important clue to its potential role in a biological process. A vast amount of gene expression data has been, and continues to be, accumulated in public repositories through different technology platforms. However, exploitation of these rich data sources remains limited, in part because of issues of technology standardization. Our objective is to test the comparability of data between SAGE and microarray technologies by examining the expression patterns of genes under normal physiological states across a variety of tissues. Between 42% and 54% of genes show significant correlations in tissue expression patterns between SAGE and GeneChip, with 30-40% of genes positively correlated and 10-15% negatively correlated at a statistically significant level (p = 0.05). Our analysis suggests that the discrepancy between expression patterns derived from the two technology platforms is not likely to result from the heterogeneity of tissues used with these technologies, or from other spurious correlations arising from microarray probe design, gene abundance, or gene function. The discrepancy can be partially explained by errors in the original assignment of SAGE tags to genes due to the evolution of sequence databases. In addition, sequence analysis indicated that many SAGE tags and Affymetrix array probe sets map to different splice variants or different sequence regions even though they represent the same gene, which also contributes to the observed discrepancies between SAGE and array expression data. To our knowledge, this is the first report attempting to mine gene expression patterns across tissues using public data from different technology platforms. Unlike previous similar studies that only demonstrated the discrepancies between the two gene expression platforms, we carried out an in-depth analysis to investigate the causes of such discrepancies.
Our study shows that exploiting rich public expression resources requires extensive knowledge of the technologies and experiments. Informatic methodologies for better interoperability among platforms remain a gap. One area that can be improved in practice is the accurate sequence mapping of SAGE tags and array probes to full-length genes.

  2. A systems-genetics approach and data mining tool to assist in the discovery of genes underlying complex traits in Oryza sativa.

    PubMed

    Ficklin, Stephen P; Feltus, Frank Alex

    2013-01-01

    Many traits of biological and agronomic significance in plants are controlled in a complex manner, in which multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, together covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait, which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits.
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance.

  3. A Systems-Genetics Approach and Data Mining Tool to Assist in the Discovery of Genes Underlying Complex Traits in Oryza sativa

    PubMed Central

    Ficklin, Stephen P.; Feltus, Frank Alex

    2013-01-01

    Many traits of biological and agronomic significance in plants are controlled in a complex manner, in which multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, together covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait, which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits.
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance. PMID:23874666

  4. Mining microarray data at NCBI's Gene Expression Omnibus (GEO)*.

    PubMed

    Barrett, Tanya; Edgar, Ron

    2006-01-01

    The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) has emerged as the leading fully public repository for gene expression data. This chapter describes how to use Web-based interfaces, applications, and graphics to effectively explore, visualize, and interpret the hundreds of microarray studies and millions of gene expression patterns stored in GEO. Data can be examined from both experiment-centric and gene-centric perspectives using user-friendly tools that do not require specialized expertise in microarray analysis or time-consuming download of massive data sets. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.

  5. Quantifying Associations between Environmental Stressors and Demographic Factors

    EPA Science Inventory

    Association rule mining (ARM) [1-3], also known as frequent item set mining [4] or market basket analysis [1], has been widely applied in many different areas, such as business product portfolio planning [5], intrusion detection infrastructure design [6], gene expression analysis...
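Association rule mining, as named above, scores rules of the form X => Y by support (the fraction of transactions containing every item of X and Y) and confidence (the support of X together with Y divided by the support of X). A brute-force sketch over hypothetical transactions (for example, sets of genes called up-regulated per sample) follows; practical tools prune the search with Apriori or FP-growth instead of enumerating every combination, and the function names here are illustrative.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def association_rules(transactions, min_support=0.5, min_confidence=0.8, max_size=2):
    """Brute-force enumeration of rules lhs => rhs meeting both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s < min_support:
                continue  # infrequent itemsets cannot yield rules
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    confidence = s / support(lhs, transactions)
                    if confidence >= min_confidence:
                        rhs = tuple(i for i in combo if i not in lhs)
                        found.append((lhs, rhs, s, confidence))
    return found


# Hypothetical transactions: items A-D could be, e.g., genes called up-regulated.
transactions = [frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("ABD")]
for lhs, rhs, s, conf in association_rules(transactions):
    print(lhs, "=>", rhs, s, conf)
# ('B',) => ('A',) 0.75 1.0
# ('C',) => ('A',) 0.5 1.0
```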

  6. bc-GenExMiner 3.0: new mining module computes breast cancer gene expression correlation analyses.

    PubMed

    Jézéquel, Pascal; Frénel, Jean-Sébastien; Campion, Loïc; Guérin-Charbonnel, Catherine; Gouraud, Wilfried; Ricolleau, Gabriel; Campone, Mario

    2013-01-01

    We recently developed a user-friendly web-based application called bc-GenExMiner (http://bcgenex.centregauducheau.fr), which offers the possibility of evaluating the prognostic informativity of genes in breast cancer by means of a 'prognostic module'. In this study, we developed a new module, called the 'correlation module', which includes three kinds of gene expression correlation analyses. The first computes the correlation coefficient between 2 or more (up to 10) chosen genes. The second produces two lists of genes that are most correlated (positively and negatively) with a 'tested' gene. A gene ontology (GO) mining function is also provided to explore enrichment of GO 'biological process', 'molecular function' and 'cellular component' terms in the output lists of most correlated genes. The third explores gene expression correlation between the 15 telomeric and 15 centromeric genes surrounding a 'tested' gene. These correlation analyses can be performed in different groups of patients: all patients (without any subtyping), within molecular subtypes (basal-like, HER2+, luminal A and luminal B), and according to oestrogen receptor status. Validation tests based on published data showed that these automated analyses lead to results consistent with the conclusions of those studies. In brief, this new module has been developed to help basic researchers explore the molecular mechanisms of breast cancer. DATABASE URL: http://bcgenex.centregauducheau.fr

  7. CGO: utilizing and integrating gene expression microarray data in clinical research and data management.

    PubMed

    Bumm, Klaus; Zheng, Mingzhong; Bailey, Clyde; Zhan, Fenghuang; Chiriva-Internati, M; Eddlemon, Paul; Terry, Julian; Barlogie, Bart; Shaughnessy, John D

    2002-02-01

    Clinical GeneOrganizer (CGO) is a novel Windows-based archiving, organization and data mining software package for integrating gene expression profiling into clinical medicine. The program implements various user-friendly tools and extracts data for further statistical analysis. This software was written for Affymetrix GeneChip *.txt files but can also be used for any other microarray-derived data. The MS-SQL server version acts as a data mart and links microarray data with clinical parameters from any other existing database, and therefore represents a valuable tool for combining gene expression analysis with clinical disease characteristics.

  8. Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm.

    PubMed

    Tchagang, Alain B; Phan, Sieu; Famili, Fazel; Shearer, Heather; Fobert, Pierre; Huang, Yi; Zou, Jitao; Huang, Daiqing; Cutler, Adrian; Liu, Ziying; Pan, Youlian

    2012-04-04

    Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples over a series of time points. Such data have three dimensions, gene-sample-time (GST), and are thus called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster) for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution in a given 3D dataset using a combinatorial approach on the sample dimension and the order-preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profiles. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledons in Brassica napus during seed development, and Brassica napus whole-seed development. These studies showed that OPTricluster is robust to noise and is able to detect similarities and differences between biological samples. Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means, and that it can effectively mine the biological knowledge hidden in 3D short time-series gene expression data.
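The order-preserving idea on the time dimension can be illustrated by reducing each gene's time course in each sample to the ranking of its time points, then grouping genes that share the same ranking pattern over a combination of samples. The sketch below is a simplified, hypothetical reading of the abstract, not the published OPTricluster algorithm: it enumerates sample subsets exhaustively, ignores noise handling, and the function names and data are invented.

```python
from collections import defaultdict
from itertools import combinations


def order_pattern(values):
    """Time-point indices sorted by increasing expression (ties keep time order)."""
    return tuple(sorted(range(len(values)), key=lambda t: values[t]))


def op_triclusters(data, min_genes=2, min_samples=2):
    """data[gene][sample] -> expression values over time points.

    Returns (genes, samples, patterns) groups in which every gene shares the
    same per-sample order pattern across the chosen sample combination."""
    genes = sorted(data)
    samples = sorted(next(iter(data.values())))
    found = []
    for k in range(len(samples), min_samples - 1, -1):  # combinatorial on samples
        for subset in combinations(samples, k):
            groups = defaultdict(list)
            for g in genes:
                key = tuple(order_pattern(data[g][s]) for s in subset)
                groups[key].append(g)
            for key, members in groups.items():
                if len(members) >= min_genes:
                    found.append((tuple(members), subset, key))
    return found


# Hypothetical 3D data: 3 genes x 2 samples x 3 time points.
data = {
    "g1": {"infected": [1, 5, 9], "control": [3, 2, 1]},
    "g2": {"infected": [2, 4, 8], "control": [9, 5, 0]},
    "g3": {"infected": [7, 3, 1], "control": [1, 2, 3]},
}
print(op_triclusters(data))
# [(('g1', 'g2'), ('control', 'infected'), ((2, 1, 0), (0, 1, 2)))]
```

Here g1 and g2 rise over time in the infected sample but fall in the control, so the pattern tuple captures both the similarity between the genes and the difference between the samples.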

  9. Public data mining plus domestic experimental study defined involvement of the old-yet-uncharacterized gene matrix-remodeling associated 7 (MXRA7) in physiopathology of the eye.

    PubMed

    Jia, Changkai; Zhang, Feng; Zhu, Ying; Qi, Xia; Wang, Yiqiang

    2017-10-20

    Matrix-remodeling associated 7 (MXRA7) gene was first reported in 2002 and was so named for its co-expression with several genes known to be related to matrix remodeling. However, no studies had been specifically performed to characterize this gene. We began defining the functions of MXRA7 by integrating bioinformatics analysis and experimental study. Data mining of MXRA7 expression in the BioGPS, Gene Expression Omnibus and EurExpress platforms highlighted high-level expression of Mxra7 in murine ocular tissues. Real-time PCR was employed to measure Mxra7 mRNA in tissues of adult C57BL/6 mice and demonstrated that Mxra7 was preferentially expressed at higher levels in the retina, cornea and lens than in other tissues. Inflammatory corneal neovascularization (CorNV) and fungal corneal infection models were then induced in Balb/c mice, and mRNA levels of Mxra7 as well as several matrix-remodeling related genes (Mmp3, Mmp13, Ecm1, Timp1) were monitored with RT-PCR. The results demonstrated a time-dependent Mxra7 under-expression pattern (U-shaped curve along the timeline), while all the other matrix-remodeling related genes showed the opposite pattern (dome-shaped curve). When the limited BioGPS data on MXRA7 expression in human tissues were examined, ocular tissue was also found to express the highest level of MXRA7. In conclusion, integrative assay of MXRA7 gene expression in public databanks and in-house animal models revealed selectively high expression of MXRA7 in murine and human ocular tissues, and its change patterns in two corneal disease models implied that MXRA7 might play a role in pathological processes or diseases involving injury, neovascularization and wound healing. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Computational gene expression profiling under salt stress reveals patterns of co-expression

    PubMed Central

    Sanchita; Sharma, Ashok

    2016-01-01

    Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition in which excess salt in the soil inhibits plant growth. Understanding how plants respond to stress conditions requires identifying the genes responsible. Clustering is a data mining technique used to group genes with similar expression; the genes of a cluster tend to show similar expression and function. We applied clustering algorithms to gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. Clusters that were common to multiple algorithms were taken forward for further analysis. Principal component analysis (PCA) further validated the findings of the other clustering algorithms by visualizing their clusters in three-dimensional space. Functional annotation revealed that most of the genes were involved in stress-related responses. Our findings suggest that these algorithms may be helpful in predicting the function of co-expressed genes. PMID:26981411

  11. Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.

    PubMed

    Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K

    2011-01-01

    Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.

  12. Mining Microarray Data at NCBI’s Gene Expression Omnibus (GEO)*

    PubMed Central

    Barrett, Tanya; Edgar, Ron

    2006-01-01

    The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) has emerged as the leading fully public repository for gene expression data. This chapter describes how to use Web-based interfaces, applications, and graphics to effectively explore, visualize, and interpret the hundreds of microarray studies and millions of gene expression patterns stored in GEO. Data can be examined from both experiment-centric and gene-centric perspectives using user-friendly tools that do not require specialized expertise in microarray analysis or time-consuming download of massive data sets. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo. PMID:16888359

  13. Uses of antimicrobial genes from microbial genome

    DOEpatents

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having a broad spectrum of activity. Also described are antimicrobial genes, and their expression products, from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  14. minepath.org: a free interactive pathway analysis web server.

    PubMed

    Koumakis, Lefteris; Roussos, Panos; Potamias, George

    2017-07-03

    MinePath (www.minepath.org) is a web-based platform that elaborates on, and radically extends, the identification of differentially expressed sub-paths in molecular pathways. Beyond network topology, the underlying MinePath algorithms exploit exact gene-gene molecular relationships (e.g. activation, inhibition) and are able to identify differentially expressed pathway parts. Each pathway is decomposed into all its constituent sub-paths, which in turn are matched with the corresponding gene expression profiles. The highly ranked, phenotype-inclined sub-paths are retained. Apart from the pathway analysis algorithm, the fundamental innovation of the MinePath web server is its advanced visualization and interactive capabilities. To our knowledge, this is the first pathway analysis server that introduces and offers visualization of the underlying, active pathway regulatory mechanisms instead of genes. Other features include live interaction, immediate visualization of functional sub-paths per phenotype, and dynamically linked annotations for the engaged genes and molecular relations. Users can download not only the results but also the corresponding web viewer framework of the performed analysis. This feature provides the flexibility to publish results immediately without publishing the source expression data, while retaining all the functionality of a web-based pathway analysis viewer. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
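The decomposition of a pathway into sub-paths can be sketched as enumerating every simple path from a source node (no incoming edges) to a sink node (no outgoing edges), then keeping paths whose edges agree with a phenotype's up/down calls (activation: same direction; inhibition: opposite direction). The consistency rule, the function names and the toy pathway are hypothetical assumptions; MinePath's actual scoring and ranking are not described in this abstract.

```python
def sub_paths(edges):
    """Enumerate simple source-to-sink paths; each path is a list of (src, dst, kind)."""
    graph, targets = {}, set()
    for src, dst, kind in edges:
        graph.setdefault(src, []).append((dst, kind))
        graph.setdefault(dst, [])
        targets.add(dst)
    sources = [n for n in graph if n not in targets]
    paths = []

    def walk(node, path, visited):
        if not graph[node]:  # sink reached
            if path:
                paths.append(path)
            return
        for nxt, kind in graph[node]:
            if nxt not in visited:  # skip edges that would close a cycle
                walk(nxt, path + [(node, nxt, kind)], visited | {nxt})

    for s in sources:
        walk(s, [], {s})
    return paths


def consistent(path, status):
    """Hypothetical rule: activation keeps the up/down sign, inhibition flips it."""
    for src, dst, kind in path:
        expected = status[src] if kind == "activation" else -status[src]
        if status[dst] != expected:
            return False
    return True


pathway = [("A", "B", "activation"), ("B", "C", "inhibition"),
           ("A", "D", "activation"), ("D", "C", "activation")]
status = {"A": 1, "B": 1, "C": -1, "D": -1}  # +1 = up, -1 = down in one phenotype
paths = sub_paths(pathway)
kept = [p for p in paths if consistent(p, status)]
print(len(paths), kept)
# 2 [[('A', 'B', 'activation'), ('B', 'C', 'inhibition')]]
```

Working at the level of sub-paths rather than whole pathways lets a single pathway contribute only the branch whose regulation matches the observed expression.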

  15. The differential expression of Chironomus spp genes as useful tools in the search for pollution biomarkers in freshwater ecosystems.

    PubMed

    Mantilla, Javier G; Gomes, Lucimar; Cristancho, Marco A

    2018-05-01

    Insects of the Chironomidae family are characterized by wide ecological diversity in freshwater ecosystems. The larvae have the physiological capacity to tolerate environmental stress, such as low oxygen concentrations, the presence of toxic substances, or changes in temperature and salinity. At the cellular level, exposure of individual insects to environmental changes induces responses in groups of genes that govern the molecular mechanisms underlying such tolerance. In this review, using fourth-instar larvae of Chironomus spp. under natural conditions and of Chironomus columbiensis under controlled conditions, we discuss the expression of a group of genes that respond during detoxification, as well as the biological functions involved in, and affected by, mining stressors. The study of macroinvertebrate bioindicator species and of their gene expression in response to mining activity opens a window on the search for genetic biomarkers that could be used in environmental pollution assessments in freshwater ecosystems.

  16. An Integrative data mining approach to identifying Adverse ...

    EPA Pesticide Factsheets

    The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from the Comparative Toxicogenomics Database (CTD), using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene-to-disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP

  17. An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies.

    PubMed

    Rollins, Derrick K; Teh, Ailing

    2010-12-17

    Microarray data sets provide relative expression levels for thousands of genes under a comparatively small number of experimental conditions, called assays. Data mining techniques are used to extract specific information about genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful for building effective data mining methods. This article extends the PCA approach of Rollins et al. to rank the genes of microarray data sets that are most differentially expressed between two biologically different groupings of assays. The method is evaluated on real and simulated data and compared with a current approach on the basis of false discovery rate (FDR) and statistical power (SP), the ability to correctly identify important genes. This work developed and evaluated two new PCA-based test statistics and compared them with a popular method that is not PCA-based. Both test statistics were found to be effective in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies when compared with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant, and for one of the test statistics when the gene variance is non-constant. PM compares quite favorably with CM, with lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assay groups identified in a preliminary step of the PCA procedure, and is therefore recommended for these applications.

  18. Genexpi: a toolset for identifying regulons and validating gene regulatory networks using time-course expression data.

    PubMed

    Modrák, Martin; Vohradský, Jiří

    2018-04-13

    Identifying regulons of sigma factors is a vital subtask of gene network inference. Integrating multiple sources of data is essential for correct identification of regulons and complete gene regulatory networks. Time series of expression data measured with microarrays or RNA-seq, combined with static binding experiments (e.g., ChIP-seq) or literature mining, may be used for inference of sigma factor regulatory networks. We introduce Genexpi: a tool to identify sigma factors by combining candidates obtained from ChIP experiments or literature mining with time-course gene expression data. While Genexpi can be used to infer other types of regulatory interactions, it was designed and validated on real biological data from bacterial regulons. In this paper, we put primary focus on CyGenexpi: a plugin integrating Genexpi with the Cytoscape software for ease of use. As part of this effort, a plugin for handling time series data in Cytoscape, called CyDataseries, has been developed and made available. Genexpi is also available as a standalone command line tool and an R package. Genexpi is a useful part of the gene network inference toolbox. It provides meaningful information about the composition of regulons and delivers biologically interpretable results.

  19. Mining microbial metatranscriptomes for expression of antibiotic resistance genes under natural conditions.

    PubMed

    Versluis, Dennis; D'Andrea, Marco Maria; Ramiro Garcia, Javier; Leimena, Milkha M; Hugenholtz, Floor; Zhang, Jing; Öztürk, Başak; Nylund, Lotta; Sipkema, Detmer; van Schaik, Willem; de Vos, Willem M; Kleerebezem, Michiel; Smidt, Hauke; van Passel, Mark W J

    2015-07-08

    Antibiotic resistance genes are found in a broad range of ecological niches associated with complex microbiota. Here we investigated whether resistance genes are not only present but also transcribed under natural conditions. Furthermore, we examined the potential for antibiotic production by assessing the expression of associated secondary metabolite biosynthesis gene clusters. Metatranscriptome datasets from the intestinal microbiota of four human adults, one human infant, 15 mice and six pigs (of which only the pigs had received antibiotics prior to the study), as well as from sea bacterioplankton, a marine sponge, forest soil and sub-seafloor sediment, were investigated. We found that resistance genes are expressed in all studied ecological niches, albeit with niche-specific differences in relative expression levels and diversity of transcripts. For example, tetracycline resistance genes were predominantly expressed in mouse and human infant microbiota, while in human adult microbiota the spectrum of expressed genes was more diverse and also included β-lactam, aminoglycoside and macrolide resistance genes. Resistance gene expression could result from the presence of natural antibiotics in the environment, although we could not link it to the expression of corresponding secondary metabolite biosynthesis clusters. Alternatively, resistance gene expression could be constitutive, or these genes may serve roles other than antibiotic resistance.

  20. Mining microbial metatranscriptomes for expression of antibiotic resistance genes under natural conditions

    NASA Astrophysics Data System (ADS)

    Versluis, Dennis; D'Andrea, Marco Maria; Ramiro Garcia, Javier; Leimena, Milkha M.; Hugenholtz, Floor; Zhang, Jing; Öztürk, Başak; Nylund, Lotta; Sipkema, Detmer; van Schaik, Willem; de Vos, Willem M.; Kleerebezem, Michiel; Smidt, Hauke; van Passel, Mark W. J.

    2015-07-01

    Antibiotic resistance genes are found in a broad range of ecological niches associated with complex microbiota. Here we investigated whether resistance genes are not only present but also transcribed under natural conditions. Furthermore, we examined the potential for antibiotic production by assessing the expression of associated secondary metabolite biosynthesis gene clusters. Metatranscriptome datasets from the intestinal microbiota of four human adults, one human infant, 15 mice and six pigs (of which only the pigs had received antibiotics prior to the study), as well as from sea bacterioplankton, a marine sponge, forest soil and sub-seafloor sediment, were investigated. We found that resistance genes are expressed in all studied ecological niches, albeit with niche-specific differences in relative expression levels and diversity of transcripts. For example, tetracycline resistance genes were predominantly expressed in mouse and human infant microbiota, while in human adult microbiota the spectrum of expressed genes was more diverse and also included β-lactam, aminoglycoside and macrolide resistance genes. Resistance gene expression could result from the presence of natural antibiotics in the environment, although we could not link it to the expression of corresponding secondary metabolite biosynthesis clusters. Alternatively, resistance gene expression could be constitutive, or these genes may serve roles other than antibiotic resistance.

  1. miRTex: A Text Mining System for miRNA-Gene Relation Extraction

    PubMed Central

    Li, Gang; Ross, Karen E.; Arighi, Cecilia N.; Peng, Yifan; Wu, Cathy H.; Vijay-Shanker, K.

    2015-01-01

    MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes. PMID:26407127
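    The F-score reported for miRTex combines precision and recall into a single number. The abstract does not say which F-measure was used. Assuming the balanced F1 measure (the harmonic mean of precision and recall), a minimal sketch from hypothetical true-positive, false-positive and false-negative counts is:

```python
def f_score(tp, fp, fn):
    """Balanced F1 score from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one relation type on an annotated corpus.
print(round(f_score(tp=90, fp=10, fn=10), 2))  # 0.9
```

    Because F1 is a harmonic mean, it stays close to 0.90 only when both precision and recall are high. A system with high precision but poor recall scores much lower.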

  2. Data mining reveals a network of early-response genes as a consensus signature of drug-induced in vitro and in vivo toxicity.

    PubMed

    Zhang, J D; Berntenis, N; Roth, A; Ebeling, M

    2014-06-01

    Gene signatures of drug-induced toxicity are of broad interest, but they are often identified from small-scale, single-time-point experiments and are therefore of limited applicability. To address this issue, we performed multivariate analysis of gene expression, cell-based assays, and histopathological data in the TG-GATEs (Toxicogenomics Project-Genomics Assisted Toxicity Evaluation system) database. Data mining highlights four genes (EGR1, ATF3, GDF15 and FGF21) that are induced 2 h after drug administration in human and rat primary hepatocytes poised to eventually undergo cytotoxicity-induced cell death. Modelling and simulation reveal that these early stress-response genes form a functional network with evolutionarily conserved structure and intrinsic dynamics. This is underlined by the fact that early induction of this network in vivo predicts drug-induced liver and kidney pathology with high accuracy. Our findings demonstrate the value of early gene-expression signatures in predicting and understanding compound-induced toxicity. The identified network can empower first-line tests that reduce animal use and the costs of safety evaluation.

  3. Mining biological databases for candidate disease genes

    NASA Astrophysics Data System (ADS)

    Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.

    2001-07-01

    The publicly funded effort to sequence the complete human genome, the Human Genome Project (HGP), has so far produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary 'draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies and resources for expression analysis (microarrays or gene chips). These resources are invaluable for researchers identifying the functional genes of the genome that are transcribed and translated into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of these data identified approximately 30,000 to 40,000 human 'genes.' However, the bulk of the effort still remains: to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome with genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may be relevant to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high-speed cluster of Linux workstations is used to analyze sequence and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl syndrome (BBS).

  4. Expression of immunoregulatory genes and its relationship to lead exposure and lead-mediated oxidative stress in wild ungulates from an abandoned mining area.

    PubMed

    Rodríguez-Estival, Jaime; de la Lastra, José M Pérez; Ortiz-Santaliestra, Manuel E; Vidal, Dolors; Mateo, Rafael

    2013-04-01

    Lead (Pb) is a highly toxic metal that can induce oxidative stress and affect the immune system by modifying the expression of immunomodulator-related genes. The aim of the present study was to investigate the association between Pb exposure and the transcriptional profiles of several cytokines, as well as the relationship between Pb exposure and changes in oxidative stress biomarkers, in the spleen of wild ungulates exposed to mining pollution. Red deer and wild boar from the mining area studied had higher spleen, liver, and bone Pb levels than controls, indicating chronic exposure to Pb pollution. This exposure depleted spleen glutathione levels in both species and disrupted the activity of antioxidant enzymes, suggesting the generation of oxidative stress conditions. Deer from the mining area also showed an induced T-helper (Th)-dependent immune response toward the Th2 pathway, whereas boar from the mining area showed a cytokine profile suggesting an inclination of the immune response toward the Th1 pathway. These results indicate that environmental exposure to Pb may alter immune responses in wild ungulates exposed to mining pollution. However, no evidence of direct relationships between Pb-mediated oxidative stress and the changes detected in immune responses was found. Further research is needed to evaluate the immunotoxic potential of Pb pollution, also considering the prevalence of chronic infectious diseases in wildlife in environments affected by mining activities. Copyright © 2013 SETAC.

  5. SSH gene expression profile of Eisenia andrei exposed in situ to a naturally contaminated soil from an abandoned uranium mine.

    PubMed

    Lourenço, Joana; Pereira, Ruth; Gonçalves, Fernando; Mendo, Sónia

    2013-02-01

    The effects of exposing earthworms (Eisenia andrei) to contaminated soil from an abandoned uranium mine were assessed through gene expression profiling by Suppression Subtractive Hybridization (SSH). Organisms were exposed in situ for 56 days, in containers placed both at a contaminated site and at a non-contaminated (reference) site. Organisms were sampled after 14 and 56 days of exposure. Results showed that the main physiological functions affected by exposure to metals and radionuclides were metabolism, oxidoreductase activity, redox homeostasis, and response to chemical stimulus and stress. The relative expression of NADH dehydrogenase subunit 1 and elongation factor 1 alpha was also affected: the genes encoding these proteins were significantly up- and down-regulated after 14 and 56 days of exposure, respectively. In addition, an EST with homology to the SET oncogene was found to be up-regulated. To the best of our knowledge, this is the first time this gene has been identified in earthworms; further studies are therefore required to clarify its involvement in the toxicity of metals and radionuclides. Considering these results, gene expression profiling proved to be a very useful tool for detecting the underlying responses of earthworms to metal and radionuclide exposure, and it points toward the detection and development of potential new biomarkers. Copyright © 2012 Elsevier Inc. All rights reserved.

  6. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature.

    PubMed

    Ye, Ning; Yin, Hengfu; Liu, Jingjing; Dai, Xiaogang; Yin, Tongming

    2015-01-01

    The huge amount of gene expression data generated by microarray and next-generation sequencing technologies presents challenges for exploiting its biological meaning. When searching for coexpressed genes, the data mining process is strongly affected by the choice of algorithm. It is therefore highly desirable for a user-friendly analytical toolkit to offer multiple algorithm options for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files. This analytical toolkit provides four models (the mean, regression, delegate, and ensemble models) to identify coexpressed genes, and enables users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. The utility of this analytical toolkit is demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows users to choose among multiple algorithms for analyzing gene expression signatures.

  7. Mining microbial metatranscriptomes for expression of antibiotic resistance genes under natural conditions

    PubMed Central

    Versluis, Dennis; D’Andrea, Marco Maria; Ramiro Garcia, Javier; Leimena, Milkha M.; Hugenholtz, Floor; Zhang, Jing; Öztürk, Başak; Nylund, Lotta; Sipkema, Detmer; van Schaik, Willem; de Vos, Willem M.; Kleerebezem, Michiel; Smidt, Hauke; van Passel, Mark W. J.

    2015-01-01

    Antibiotic resistance genes are found in a broad range of ecological niches associated with complex microbiota. Here we investigated whether resistance genes are not only present but also transcribed under natural conditions. Furthermore, we examined the potential for antibiotic production by assessing the expression of associated secondary metabolite biosynthesis gene clusters. Metatranscriptome datasets from the intestinal microbiota of four human adults, one human infant, 15 mice and six pigs (of which only the pigs had received antibiotics prior to the study), as well as from sea bacterioplankton, a marine sponge, forest soil and sub-seafloor sediment, were investigated. We found that resistance genes are expressed in all studied ecological niches, albeit with niche-specific differences in relative expression levels and diversity of transcripts. For example, tetracycline resistance genes were predominantly expressed in mouse and human infant microbiota, while in human adult microbiota the spectrum of expressed genes was more diverse and also included β-lactam, aminoglycoside and macrolide resistance genes. Resistance gene expression could result from the presence of natural antibiotics in the environment, although we could not link it to the expression of corresponding secondary metabolite biosynthesis clusters. Alternatively, resistance gene expression could be constitutive, or these genes may serve roles other than antibiotic resistance. PMID:26153129

  8. From Saccharomyces cerevisiae to human: The important gene co-expression modules.

    PubMed

    Liu, Wei; Li, Li; Ye, Hua; Chen, Haiwei; Shen, Weibiao; Zhong, Yuexian; Tian, Tian; He, Huaqin

    2017-08-01

    Network-based systems biology has become an important method for analyzing high-throughput gene expression data and for gene function mining. Yeast has long been a popular model organism for biomedical research. In the current study, a weighted gene co-expression network analysis algorithm was applied to construct a gene co-expression network in Saccharomyces cerevisiae. Seventeen stable gene co-expression modules were detected from 2,814 S. cerevisiae microarray samples. Further characterization of these modules with the Database for Annotation, Visualization and Integrated Discovery (DAVID) tool indicated that these modules were associated with specific biological processes, such as heat response, cell cycle, translational regulation, mitochondrial oxidative phosphorylation, amino acid metabolism and autophagy. Hub genes were also screened by intra-modular connectivity. Finally, module conservation was evaluated in a human disease microarray dataset. Functional modules were identified in budding yeast, some of which are associated with patient survival. The current study provides a paradigm for single-cell microorganisms and potentially other organisms.
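    The core of weighted gene co-expression network analysis (WGCNA) can be sketched in a few lines. Pairwise gene correlations are raised to a soft-thresholding power to form a weighted adjacency matrix. A gene's intramodular connectivity is the sum of its adjacencies to the other genes in its module, and the most connected genes are treated as hubs. The sketch below assumes an unsigned network, |cor|^beta, with beta = 6 as an illustrative default. It takes module membership as given: module detection (hierarchical clustering of a topological overlap matrix in the full method) is omitted. The study's actual parameters are not stated in the abstract.

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor(i, j)|**beta from a samples x genes matrix."""
    corr = np.corrcoef(expr, rowvar=False)  # gene x gene Pearson correlations
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-connections
    return adj


def intramodular_connectivity(adj, module):
    """Sum of each module gene's adjacencies to the other genes in the same module."""
    idx = np.asarray(module)
    return adj[np.ix_(idx, idx)].sum(axis=1)


def hub_genes(adj, module, top=1):
    """Return the `top` module genes with the highest intramodular connectivity."""
    k = intramodular_connectivity(adj, module)
    order = np.argsort(k)[::-1][:top]
    return [module[i] for i in order]


# Hypothetical data: genes 0 and 1 co-vary, gene 2 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=40)
expr = np.column_stack([base, base + 0.1 * rng.normal(size=40), rng.normal(size=40)])
adj = soft_threshold_adjacency(expr)
print(hub_genes(adj, [0, 1, 2]))
```

    Raising correlations to a power keeps strong co-expression links while shrinking weak, likely spurious ones toward zero. This is why the soft threshold is used instead of a hard correlation cutoff.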

  9. Advances in genetic circuit design: novel biochemistries, deep part mining, and precision gene expression.

    PubMed

    Nielsen, Alec A K; Segall-Shapiro, Thomas H; Voigt, Christopher A

    2013-12-01

    Cells use regulatory networks to perform computational operations to respond to their environment. Reliably manipulating such networks would be valuable for many applications in biotechnology; for example, in having genes turn on only under a defined set of conditions or implementing dynamic or temporal control of expression. Still, building such synthetic regulatory circuits remains one of the most difficult challenges in genetic engineering and as a result they have not found widespread application. Here, we review recent advances that address the key challenges in the forward design of genetic circuits. First, we look at new design concepts, including the construction of layered digital and analog circuits, and new approaches to control circuit response functions. Second, we review recent work to apply part mining and computational design to expand the number of regulators that can be used together within one cell. Finally, we describe new approaches to obtain precise gene expression and to reduce context dependence that will accelerate circuit design by more reliably balancing regulators while reducing toxicity. Copyright © 2013. Published by Elsevier Ltd.

  10. Literature-based discovery of diabetes- and ROS-related targets

    PubMed Central

    2010-01-01

    Background: Reactive oxygen species (ROS) are known mediators of cellular damage in multiple diseases including diabetic complications. Despite its importance, no comprehensive database is currently available for the genes associated with ROS. Methods: We present ROS- and diabetes-related targets (genes/proteins) collected from the biomedical literature through a text mining technology. A web-based literature mining tool, SciMiner, was applied to 1,154 biomedical papers indexed with diabetes and ROS by PubMed to identify relevant targets. Over-represented targets in the ROS-diabetes literature were obtained through comparisons against randomly selected literature. The expression levels of nine genes, selected from the top ranked ROS-diabetes set, were measured in the dorsal root ganglia (DRG) of diabetic and non-diabetic DBA/2J mice in order to evaluate the biological relevance of literature-derived targets in the pathogenesis of diabetic neuropathy. Results: SciMiner identified 1,026 ROS- and diabetes-related targets from the 1,154 biomedical papers (http://jdrf.neurology.med.umich.edu/ROSDiabetes/). Fifty-three targets were significantly over-represented in the ROS-diabetes literature compared to randomly selected literature. These over-represented targets included well-known members of the oxidative stress response including catalase, the NADPH oxidase family, and the superoxide dismutase family of proteins. Eight of the nine selected genes exhibited significant differential expression between diabetic and non-diabetic mice. For six genes, the direction of expression change in diabetes paralleled enhanced oxidative stress in the DRG. Conclusions: Literature mining compiled ROS-diabetes related targets from the biomedical literature and led us to evaluate the biological relevance of selected targets in the pathogenesis of diabetic neuropathy. PMID:20979611
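    The abstract does not name the statistical test behind "over-represented compared to randomly selected literature". One common choice is a one-sided Fisher's exact test, equivalently a hypergeometric upper tail, on the counts of papers mentioning a target in each set. The sketch below assumes that test. The counts in the usage line are hypothetical.

```python
from math import comb


def over_representation_p(hits_topic, n_topic, hits_random, n_random):
    """One-sided Fisher's exact (hypergeometric) p-value that a target is mentioned
    more often in topic-specific papers than in randomly selected papers.

    hits_topic / n_topic:   papers mentioning the target / total topic papers
    hits_random / n_random: the same counts for the random background papers
    """
    total = n_topic + n_random       # pooled papers
    hits = hits_topic + hits_random  # pooled papers mentioning the target
    denom = comb(total, n_topic)
    # P(X >= hits_topic) when n_topic papers are drawn at random from the pool
    upper = min(hits, n_topic)
    tail = sum(comb(hits, i) * comb(total - hits, n_topic - i)
               for i in range(hits_topic, upper + 1))
    return tail / denom


# Hypothetical counts for a single target.
print(over_representation_p(hits_topic=30, n_topic=500, hits_random=5, n_random=500))
```

    When many targets are tested at once, the resulting p-values would also need a multiple-testing correction before a target is called significantly over-represented.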

  11. DTFP-Growth: Dynamic Threshold-Based FP-Growth Rule Mining Algorithm Through Integrating Gene Expression, Methylation, and Protein-Protein Interaction Profiles.

    PubMed

    Mallik, Saurav; Bhadra, Tapas; Mukherji, Ayan

    2018-04-01

    Association rule mining is an important technique for identifying interesting relationships between gene pairs in a biological data set. Earlier methods generally work on a single biological data set, and in most cases a single minimum support cutoff is applied globally, i.e., across all genesets/itemsets. To overcome this limitation, we propose a dynamic threshold-based FP-growth rule mining algorithm that integrates gene expression, methylation and protein-protein interaction profiles based on weighted shortest distance to find novel associations among different pairs of genes in multi-view data sets. For this purpose, we introduce three new thresholds for each rule, namely Distance-based Variable/Dynamic Supports (DVS), Distance-based Variable Confidences (DVC), and Distance-based Variable Lifts (DVL), by integrating the co-expression, co-methylation, and protein-protein interactions present in the multi-omics data set. We develop the proposed algorithm utilizing these three novel multiple threshold measures. In the proposed algorithm, the values of DVS, DVC, and DVL are computed for each rule separately, and it is then verified whether the support, confidence, and lift of each evolved rule are greater than or equal to the corresponding individual DVS, DVC, and DVL values. If all three conditions hold for a rule, it is treated as a resultant rule. A major advantage of the proposed method compared with related state-of-the-art methods is that it considers both the quantitative and interactive significance among all pairwise genes belonging to each rule. Moreover, the proposed method generates fewer rules, takes less running time, and provides greater biological significance for the resultant top-ranking rules compared to previous methods.
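    The final acceptance step (compare a rule's support, confidence and lift with that rule's own thresholds) can be sketched independently of how the thresholds are derived. The sketch below computes the three standard rule measures from transactions and applies per-rule thresholds that are simply passed in. It does not implement FP-growth or the weighted-shortest-distance calculation of DVS, DVC and DVL. The items and threshold values are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    transactions: list of sets of items (e.g. hypothetical genes co-occurring in one sample).
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def passes_rule_thresholds(metrics, thresholds):
    """Accept a rule only if support, confidence and lift each reach that rule's own
    threshold (supplied directly here; DTFP-Growth derives them as DVS, DVC, DVL)."""
    return all(m >= t for m, t in zip(metrics, thresholds))


transactions = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
m = rule_metrics(transactions, {"A"}, {"B"})
print(passes_rule_thresholds(m, (0.4, 0.6, 0.8)))  # True
```

    Because every rule carries its own threshold triple, two rules with identical support can be treated differently. This is the contrast with a single global minimum-support cutoff.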

  12. An integrative data mining approach to identifying adverse outcome pathway signatures.

    PubMed

    Oki, Noffisat O; Edwards, Stephen W

    2016-03-28

    The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from the Comparative Toxicogenomics Database (CTD), using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP network with the AHR gene, an interesting subnetwork including glaucoma was identified. While substantial literature exists to support the potential for AHR ligands to elicit glaucoma, it was not explicitly captured in the public annotation information in CTD. The subnetwork from this analysis suggests a cpAOP that includes changes in CYP1B1 expression, which has been previously established in the literature as a primary cause of glaucoma. These case studies highlight the value in integrating multiple data sources when defining cpAOPs for HTS data. Copyright © 2016. Published by Elsevier Ireland Ltd.
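    Frequent itemset mining with chemicals as the common aggregator can be illustrated in three steps. Treat each chemical as a transaction containing the genes and diseases linked to it. Count how often gene and disease items co-occur. Keep pairs above a minimum support as network edges. The sketch below is a brute-force pair counter, not the Apriori or FP-growth algorithms usually used for FIM at scale. The `gene:`/`disease:` prefixes, the support cutoff and the toy data are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Item pairs whose support (fraction of transactions containing both) >= min_support."""
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def gene_disease_edges(pairs):
    """Keep only gene-disease pairs, returned as (gene, disease) network edges."""
    edges = {}
    for (first, second), support in pairs.items():
        # Items are sorted, so "disease:..." always precedes "gene:...".
        if first.startswith("disease:") and second.startswith("gene:"):
            edges[(second, first)] = support
    return edges


# Hypothetical toy data: each transaction is one chemical's linked genes and diseases.
chemicals = [
    {"gene:A", "disease:X"},
    {"gene:A", "gene:B", "disease:X"},
    {"gene:B", "disease:Y"},
    {"gene:A"},
]
print(gene_disease_edges(frequent_pairs(chemicals, min_support=0.5)))
# {('gene:A', 'disease:X'): 0.5}
```

    The resulting edge dictionary maps directly onto the network described above: genes and diseases become nodes, and each retained pair becomes an edge weighted by its support.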

  13. Gene expression profiling to identify the toxicities and potentially relevant human disease outcomes associated with environmental heavy metal exposure.

    PubMed

    Korashy, Hesham M; Attafi, Ibraheem M; Famulski, Konrad S; Bakheet, Saleh A; Hafez, Mohammed M; Alsaad, Abdulaziz M S; Al-Ghadeer, Abdul Rahman M

    2017-02-01

    Heavy metals are the most commonly encountered toxic substances that increase susceptibility to various diseases after prolonged exposure. We have previously shown that healthy volunteers living near a mining area had significant heavy metal contamination, associated with significant changes in the expression of some detoxifying genes, xenobiotic metabolizing enzymes, and DNA repair genes. However, alterations in most of the molecular target genes associated with diseases are still unknown. Thus, the aims of this study were to (a) evaluate the gene expression profile and (b) identify the toxicities and potentially relevant human disease outcomes associated with long-term human exposure to environmental heavy metals in a mining area using microarray analysis. For this purpose, 40 healthy male volunteers who were residents of a heavy metal-polluted area (Mahd Al-Dhahab city, Saudi Arabia) and 20 healthy male volunteers who were residents of a non-polluted area were included in the study. Total RNA was isolated from whole blood using PAXgene Blood RNA tubes, then reverse transcribed and hybridized to the Affymetrix U219 GeneChip array. Microarray analysis identified about 2129 differentially altered genes, among which a shared set of 425 genes was differentially expressed in the heavy metal-exposed groups. Ingenuity pathway analysis revealed that the most altered gene-regulated diseases in the heavy metal-exposed groups included hematological and developmental disorders and, most prominently, renal and urological diseases. Quantitative real-time polymerase chain reaction closely matched the microarray data for some of the genes tested. Importantly, changes in gene-related diseases were attributed to alterations in genes involved in protein synthesis. Renal and urological diseases were the diseases most frequently associated with the heavy metal-exposed group. Therefore, further studies are needed to validate these genes, which could be used as early biomarkers to prevent renal injury. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Biomarkers of metals exposure in fish from lead-zinc mining areas of Southeastern Missouri, USA

    USGS Publications Warehouse

    Schmitt, C.J.; Whyte, J.J.; Roberts, A.P.; Annis, M.L.; May, T.W.; Tillitt, D.E.

    2007-01-01

    The potential effects of proposed lead-zinc mining in an ecologically sensitive area were assessed by studying a nearby mining district that has been exploited for about 30 years under contemporary environmental regulations and with modern technology. Blood and liver samples representing fish of three species (largescale stoneroller, Campostoma oligolepis, n=91; longear sunfish, Lepomis megalotis, n=105; and northern hog sucker, Hypentelium nigricans, n=20) were collected from 16 sites representing a range of conditions relative to mining activities. Samples were analyzed for metals (also reported in a companion paper) and for biomarkers of metals exposure [erythrocyte δ-aminolevulinic acid dehydratase (ALA-D) activity; concentrations of zinc protoporphyrin (ZPP), iron, and hemoglobin (Hb) in blood; and hepatic metallothionein (MT) gene expression and lipid peroxidation]. Blood lead concentrations were significantly higher and ALA-D activity significantly lower in all species at sites nearest to active lead-zinc mines and in a stream contaminated by historical mining than at reference or downstream sites. ALA-D activity was also negatively correlated with blood lead concentrations in all three species but not with other metals. Iron and Hb concentrations were positively correlated in all three species, but were not correlated with any other metals in blood or liver in any species. MT gene expression was positively correlated with liver zinc concentrations, but neither MT nor lipid peroxidation differences among fish grouped according to lead concentrations were statistically significant. ZPP was not detected by hematofluorometry in most fish, but fish with detectable ZPP were from sites affected by mining. Collectively, these results confirm that metals are released to streams from active lead-zinc mining sites and are accumulated by fish. © 2007 Elsevier Inc. All rights reserved.

  15. Developmental gene regulation during tomato fruit ripening and in-vitro sepal morphogenesis

    PubMed Central

    Bartley, Glenn E; Ishida, Betty K

    2003-01-01

    Background: Red ripe tomatoes are the result of numerous physiological changes controlled by hormonal and developmental signals, causing maturation or differentiation of various fruit tissues simultaneously. These physiological changes affect visual, textural, flavor, and aroma characteristics, making the fruit more appealing to potential consumers for seed dispersal. Developmental regulation of tomato fruit ripening has, until recently, been lacking in rigorous investigation. We previously indicated the presence of up-regulated transcription factors in ripening tomato fruit by data mining in TIGR Tomato Gene Index. In our in-vitro system, green tomato sepals cultured at 16 to 22°C turn red and swell like ripening tomato fruit while those at 28°C remain green. Results: Here, we have further examined regulation of putative developmental genes possibly involved in tomato fruit ripening and development. Using molecular biological methods, we have determined the relative abundance of various transcripts of genes during in vitro sepal ripening and in tomato fruit pericarp at three stages of development. A number of transcripts show similar expression in fruits to RIN and PSY1, ripening-associated genes, and others show quite different expression. Conclusions: Our investigation has resulted in confirmation of some of our previous database mining results and has revealed differences in gene expression that may be important for tomato cultivar variation. We present new and intriguing information on genes that should now be studied in a more focused fashion. PMID:12906715

  16. Characterization of two in vivo-expressed methyltransferases of the Mycobacterium tuberculosis complex: antigenicity and genetic regulation

    PubMed Central

    Golby, Paul; Nunez, Javier; Cockle, Paul J.; Ewer, Katie; Logan, Karen; Hogarth, Philip; Vordermeier, H. Martin; Hinds, Jason; Hewinson, R. Glyn; Gordon, Stephen V.

    2011-01-01

    Genome sequencing of Mycobacterium tuberculosis complex members has accelerated the search for new disease-control tools. Antigen mining is one area that has benefited enormously from access to genome data. As part of an ongoing antigen mining programme, we screened genes that were previously identified by transcriptome analysis as upregulated in response to an in vitro acid shock for their in vivo expression profile and antigenicity. We show that the genes encoding two methyltransferases, Mb1438c/Rv1403c and Mb1440c/Rv1404c, were highly upregulated in a mouse model of infection, and were antigenic in M. bovis-infected cattle. As the genes encoding these antigens were highly upregulated in vivo, we sought to define their genetic regulation. A mutant was constructed that was deleted for their putative regulator, Mb1439/Rv1404; loss of the regulator led to increased expression of the flanking methyltransferases and a defined set of distal genes. This work has therefore generated both applied and fundamental outputs, with the description of novel mycobacterial antigens that can now be moved into field trials, but also with the description of a regulatory network that is responsive to both in vivo and in vitro stimuli. PMID:18375799

  17. Bone-related gene profiles in developing calvaria.

    PubMed

    Cho, Je-Yoel; Lee, Won-Bong; Kim, Hyun-Jung; Mi Woo, Kyung; Baek, Jeong-Hwa; Choi, Je-Yong; Hur, Cheol-Gu; Ryoo, Hyun-Mo

    2006-05-10

    Generating a comprehensive understanding of osteogenesis-related gene profiles is very important for developing new treatments for osteopenic conditions. Developing calvaria undergo a typical intramembranous bone-forming process. To identify genes associated with osteoblast differentiation, we isolated total RNA from parietal bones, which represent active osteoblasts, and from sutural mesenchyme, which represents osteoprogenitor cells. We then comprehensively analyzed their gene expression profiles using an oligo-based Affymetrix microarray chip containing 22,690 probes. About 2100 genes with "Present" calls had more than 2-fold higher expression in bone than in sutures, and 73 of these genes had more than 8-fold higher expression. Some of these genes are already known bone-related biomarkers: VitD receptor, bone sialoprotein, osteocalcin, osteopontin, MMP13, etc. Eight genes were selected for confirmation by quantitative real-time RT-PCR analysis. All the genes tested showed higher expression in bone, ranging from 5- to 140-fold. Several of these genes are ESTs. Others are already known genes, but their functions in osteogenesis had not previously been described. Most genes of the BMP and FGF families probed in the GeneChip analysis were more highly expressed in bone tissue than in suture. All differentially expressed Runx and Dlx family genes also showed higher expression in bone. These results imply that our data are valid and can serve as a good standard for mining osteogenesis-related genes.
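
    The two-tier fold-change screen described above can be sketched as a simple filter. This is a minimal illustration that assumes one normalized signal per probe set per tissue and a single detection call per probe. The `Probe` record, its field names, and the toy values are hypothetical. The abstract does not give the study's exact filtering rules, for example which sample the "Present" call refers to.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    """One probe set; field names are hypothetical, not an Affymetrix file format."""
    probe_id: str
    bone: float      # signal in parietal bone (active osteoblasts)
    suture: float    # signal in sutural mesenchyme (osteoprogenitor cells)
    present: bool    # detection call: True for "Present"


def bone_enriched(probes, min_fold):
    """Return IDs of Present-called probes whose bone/suture ratio exceeds min_fold."""
    return [
        p.probe_id
        for p in probes
        if p.present and p.suture > 0 and p.bone / p.suture > min_fold
    ]


# Hypothetical toy probes.
probes = [
    Probe("p1", 900.0, 100.0, True),   # 9-fold higher in bone
    Probe("p2", 300.0, 100.0, True),   # 3-fold
    Probe("p3", 900.0, 100.0, False),  # large ratio, but not called Present
    Probe("p4", 150.0, 100.0, True),   # 1.5-fold
]
print(bone_enriched(probes, 2))  # ['p1', 'p2']
print(bone_enriched(probes, 8))  # ['p1']
```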

  18. Adult mouse brain gene expression patterns bear an embryologic imprint

    PubMed Central

    Zapala, Matthew A.; Hovatta, Iiris; Ellison, Julie A.; Wodicka, Lisa; Del Rio, Jo A.; Tennant, Richard; Tynan, Wendy; Broide, Ron S.; Helton, Rob; Stoveken, Barbara S.; Winrow, Christopher; Lockhart, Daniel J.; Reilly, John F.; Young, Warren G.; Bloom, Floyd E.; Lockhart, David J.; Barlow, Carrolee

    2005-01-01

    The current model to explain the organization of the mammalian nervous system is based on studies of anatomy, embryology, and evolution. To further investigate the molecular organization of the adult mammalian brain, we have built a gene expression-based brain map. We measured gene expression patterns for 24 neural tissues covering the mouse central nervous system. Surprisingly, we found that the adult brain bears a transcriptional "imprint" consistent with both embryological origins and classic evolutionary relationships. Embryonic cellular position along the anterior–posterior axis of the neural tube was shown to be closely associated with, and possibly a determinant of, the gene expression patterns in adult structures. We also observed a significant number of embryonic patterning and homeobox genes with region-specific expression in the adult nervous system. The relationships between global expression patterns for different anatomical regions, and the nature of the observed region-specific genes, suggest that the adult brain retains a degree of overall gene expression established during embryogenesis. This retained expression appears important for regional specificity and for the functional relationships between regions in the adult. The complete collection of extensively annotated gene expression data, along with data mining and visualization tools, has been made available on a publicly accessible web site (www.barlow-lockhart-brainmapnimhgrant.org). PMID:16002470

  19. A-MADMAN: Annotation-based microarray data meta-analysis tool

    PubMed Central

    Bisognin, Andrea; Coppe, Alessandro; Ferrari, Francesco; Risso, Davide; Romualdi, Chiara; Bicciato, Silvio; Bortoluzzi, Stefania

    2009-01-01

    Background Publicly available datasets of microarray gene expression signals represent an unprecedented opportunity for extracting genomically relevant information and validating biological hypotheses. However, the exploitation of this exceptionally rich mine of information is still hampered by the lack of appropriate computational tools able to overcome the critical issues raised by meta-analysis. Results This work presents A-MADMAN, an open source web application that allows the retrieval, annotation, organization, and meta-analysis of gene expression datasets obtained from Gene Expression Omnibus. A-MADMAN addresses and resolves several open issues in the meta-analysis of gene expression data. Conclusion A-MADMAN provides three main functions. First, it supports batch retrieval of raw data files and related meta-information from Gene Expression Omnibus, and organizes them locally. Second, it supports re-annotation of samples to fix incomplete or otherwise inadequate metadata and to create user-defined batches of data. Third, it supports integrative analysis of data obtained from different Affymetrix platforms through custom chip definition files and meta-normalization. Software and documentation are available on-line at . PMID:19563634
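
    The abstract does not describe A-MADMAN's own meta-analysis statistics. As a generic illustration of combining evidence for one gene across several GEO datasets, the sketch below uses Fisher's combined-probability method. This is a swapped-in standard technique, not the tool's documented method. It assumes the per-dataset p-values are independent and strictly positive, and it ignores the direction of change.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values (all > 0) with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even degrees of freedom the upper tail
    has the closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i      # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total


# Hypothetical per-dataset p-values for one gene across three GEO series.
# Fisher's method is an illustrative choice, not A-MADMAN's documented statistic.
print(fisher_combined_pvalue([0.04, 0.10, 0.03]))
```

    A single p-value passes through unchanged. Several moderately small p-values combine into a smaller one, which is the behaviour meta-analysis relies on.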

  20. Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex.

    PubMed

    Pavlidis, Paul; Qin, Jie; Arango, Victoria; Mann, John J; Sibille, Etienne

    2004-06-01

    One of the challenges in analyzing gene expression data is placing the results in the context of other available data about genes and their relationships to each other. Here, we approach this problem in a study of age-associated gene expression changes in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), statistically evaluates the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and that ORA results depend strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for analyzing complex expression data sets. They also emphasize the advantage of considering all available genomic information rather than only the sets of genes that pass a predetermined "threshold of significance."
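
    The contrast between the two methods can be made concrete with a minimal sketch, which is not the authors' code. ORA is written here as a one-sided hypergeometric test on a thresholded gene list. FCS is written as a comparison of a class's mean gene score against randomly resampled gene sets of the same size, which is one common FCS variant. Scoring genes by -log10 p, the resampling scheme, and the toy gene names are assumptions.

```python
import random
from math import comb


def ora_pvalue(n_total, n_class, n_selected, n_overlap):
    """Overrepresentation analysis: P(X >= n_overlap), X ~ hypergeometric.

    n_total genes measured, n_class of them annotated to the GO class,
    n_selected genes passing the threshold, n_overlap selected genes in the class.
    """
    upper = min(n_class, n_selected)
    hits = sum(
        comb(n_class, i) * comb(n_total - n_class, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return hits / comb(n_total, n_selected)


def fcs_pvalue(scores, class_genes, n_perm=1000, seed=0):
    """Functional class scoring (one assumed variant): compare the class's mean
    gene score with the mean of random gene sets of equal size; no threshold step.
    """
    rng = random.Random(seed)
    genes = list(scores)
    size = len(class_genes)
    observed = sum(scores[g] for g in class_genes) / size
    as_extreme = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, size)
        if sum(scores[g] for g in sample) / size >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)  # add-one so p is never exactly 0


# Hypothetical scores for 20 genes, e.g. -log10(p) for an age effect.
scores = {f"g{i}": float(i) for i in range(20)}
print(ora_pvalue(10, 5, 5, 5))                    # equals 1/252
print(fcs_pvalue(scores, ["g17", "g18", "g19"]))  # small: class holds the top scores
```

    ORA depends on where the selection cutoff falls, through `n_selected` and `n_overlap`. FCS uses every gene's score directly, which matches the abstract's point about threshold sensitivity.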

  1. Gene Mining for Proline Based Signaling Proteins in Cell Wall of Arabidopsis thaliana

    PubMed Central

    Ihsan, Muhammad Z.; Ahmad, Samina J. N.; Shah, Zahid Hussain; Rehman, Hafiz M.; Aslam, Zubair; Ahuja, Ishita; Bones, Atle M.; Ahmad, Jam N.

    2017-01-01

    The cell wall (CW), as a first line of defense against biotic and abiotic stresses, is of primary importance in plant biology. The proteins associated with cell walls play a significant role in determining a plant's ability to withstand adverse environmental conditions. In this work, the genes encoding cell wall proteins (CWPs) in Arabidopsis were identified and functionally classified using GeneMANIA and GENEVESTIGATOR with published microarray data. This yielded 1605 genes, of which 58 encoded proline-rich proteins (PRPs) and glycine-rich proteins (GRPs). Here, we have focused on the cellular compartmentalization, biological processes, and molecular functions of proline-rich CWPs, along with their expression at different plant developmental stages. The mined genes were categorized into five classes on the basis of the type of PRP encoded in the cell wall of Arabidopsis thaliana. We review the domain structure and function of each protein class, many with respect to the developmental stages of the plant. We then used networks, hierarchical clustering, and correlations to analyze co-expression, co-localization, genetic and physical interactions, and shared protein domains of these PRPs. This has given us further insight into these functionally important CWPs and identified a number of potentially new cell wall-related proteins in A. thaliana. PMID:28289422

  2. Discovering functional modules by topic modeling RNA-Seq based toxicogenomic data.

    PubMed

    Yu, Ke; Gong, Binsheng; Lee, Mikyung; Liu, Zhichao; Xu, Joshua; Perkins, Roger; Tong, Weida

    2014-09-15

    Toxicogenomics (TGx) seeks to elucidate the underlying molecular mechanisms of toxicity by exploring gene expression profiles in response to toxic substances. RNA-Seq is increasingly regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from complex TGx data. If read counts are treated as the number of times a word occurs in a document, RNA-Seq gene expression profiles are analogous to the word-by-document matrix used in text mining. Topic modeling, which aims to discover latent structures in text corpora, could therefore help explore RNA-Seq based TGx data. In this study, topic modeling was applied to a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found that samples treated with compounds sharing the same modes of action (MoAs) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic. Each marker topic was interpreted by gene enrichment analysis with MoAs and then confirmed by compound and pathway associations mined from the literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. We took the RNA-Seq gene expression profile of a topic specifically associated with the peroxisome proliferator-activated receptor (PPAR) signaling pathway and used it to query for samples with similar expression profiles in two different microarray data sets. This query yielded an accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discover functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.
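
    The document analogy can be sketched directly. Below is a minimal, stdlib-only collapsed Gibbs sampler for LDA; it is not the study's implementation. It assumes symmetric priors `alpha` and `beta` and a fixed number of topics, and it optionally down-scales counts with a hypothetical `scale` factor. Each sample becomes a document, and each gene index is repeated once per (scaled) read.

```python
import random


def counts_to_documents(count_matrix, scale=1):
    """Treat each sample as a document and each gene as a word.

    A gene with read count c contributes c // scale copies of its index;
    `scale` is a hypothetical down-sampling factor to keep documents small.
    """
    return [
        [gene for gene, c in enumerate(row) for _ in range(c // scale)]
        for row in count_matrix
    ]


def lda_gibbs(docs, n_words, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]            # document-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]  # topic-gene counts
    nk = [0] * n_topics                             # tokens assigned to each topic
    z = []                                          # topic of every token
    for d, doc in enumerate(docs):                  # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                         # remove the token's current topic
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + n_words * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k                         # record the resampled topic
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


# Toy counts (hypothetical): samples 0-1 express genes 0-2, samples 2-3 genes 3-5.
counts = [[20, 20, 20, 0, 0, 0], [18, 22, 20, 0, 0, 0],
          [0, 0, 0, 20, 20, 20], [0, 0, 0, 21, 19, 20]]
docs = counts_to_documents(counts)
theta = lda_gibbs(docs, n_words=6, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

    On cleanly separated toy counts like these, one would expect samples 0-1 and 2-3 to load mainly on different topics. Clustering samples by the similarity of these topic-proportion vectors mirrors the MoA grouping described in the abstract.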

  3. A data mining paradigm for identifying key factors in biological processes using gene expression data.

    PubMed

    Li, Jin; Zheng, Le; Uchiyama, Akihiko; Bin, Lianghua; Mauro, Theodora M; Elias, Peter M; Pawelczyk, Tadeusz; Sakowicz-Burkiewicz, Monika; Trzeciak, Magdalena; Leung, Donald Y M; Morasso, Maria I; Yu, Peng

    2018-06-13

    A large volume of biological data is being generated to study the mechanisms of various biological processes. These valuable data enable large-scale computational analyses that yield biological insights. However, mining the data efficiently for knowledge discovery remains a challenge. The heterogeneity of these data makes them difficult to integrate consistently, slowing down biological discovery. We introduce a data processing paradigm that identifies key factors in biological processes in three steps: systematic collection of gene expression datasets, primary analysis of the data, and evaluation of consistent signals. To demonstrate its effectiveness, we applied the paradigm to epidermal development and identified many genes that may play a role in this process. Besides the known epidermal development genes, a substantial proportion of the identified genes are not yet supported by gain- or loss-of-function studies, yielding many novel candidates for future study. Among them, we selected a top gene for loss-of-function experimental validation and confirmed its function in epidermal differentiation. This result demonstrates the paradigm's ability to identify new factors in biological processes. In addition, the paradigm revealed many key genes in cold-induced thermogenesis using data from cold-challenged tissues, demonstrating its generalizability. This paradigm can lead to fruitful results for studying molecular mechanisms in an era of explosive accumulation of publicly available biological data.
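
    The abstract does not spell out how "consistent signals" are scored. One simple possibility, shown here purely as an assumed illustration, is net vote counting. Each dataset contributes +1 for a gene called significantly up-regulated and -1 for one called down-regulated, and genes are ranked by the net count. The gene names, calls, and the `min_net` cutoff below are hypothetical.

```python
from collections import Counter


def consistency_ranking(dataset_calls, min_net=2):
    """Rank genes by net direction votes across datasets (assumed scoring rule).

    dataset_calls: one dict per dataset mapping gene -> +1 (up) or -1 (down);
    genes not significant in a dataset are simply absent from that dict.
    """
    net = Counter()
    for calls in dataset_calls:
        for gene, direction in calls.items():
            net[gene] += 1 if direction > 0 else -1
    kept = [(g, v) for g, v in net.items() if abs(v) >= min_net]
    return sorted(kept, key=lambda gv: (-abs(gv[1]), gv[0]))


# Hypothetical significance calls from three datasets.
calls = [
    {"geneA": 1, "geneB": -1, "geneC": 1},
    {"geneA": 1, "geneB": -1, "geneC": -1},  # geneC flips direction
    {"geneA": 1, "geneD": 1},
]
print(consistency_ranking(calls))  # [('geneA', 3), ('geneB', -2)]
```

    Genes with conflicting directions (geneC) cancel out. Genes seen in only one dataset (geneD) fall below the cutoff. Both behaviours are the point of requiring consistency across datasets.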

  4. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks.

    PubMed

    Hur, Junguk; Özgür, Arzucan; He, Yongqun

    2017-03-14

    Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. Despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and the genes used in vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of interaction-related keywords useful for literature mining. Using VO, INO, and the normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (degree, eigenvector, closeness, and betweenness) were calculated to identify highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this large network, we identified a sub-network consisting of 5 E. coli vaccine genes (carA, carB, fimH, fepA, and vat), 62 other E. coli genes, and 25 INO interaction types. Many interaction types represent direct interactions between the two indicated genes. However, our study also showed that many of the retrieved interaction types are indirect: the two genes take part in the specified interaction process in a required but indirect manner. 
Our centrality analysis of these gene interaction networks identified top-ranked E. coli genes and 6 INO interaction types (e.g., regulation and gene expression). In summary, a vaccine-related E. coli gene-gene interaction network was constructed using an ontology-based literature mining strategy, which identified important E. coli vaccine genes and their interactions with other genes through specific interaction types.
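
    Two of the four centrality metrics named above can be computed on a small gene interaction graph with a minimal sketch. It assumes an unweighted, undirected network. The study's graph also carries INO interaction types, which are ignored here. Betweenness uses Brandes' algorithm. The edges in the toy network are hypothetical and are not SciMiner output. Closeness and eigenvector centrality are omitted for brevity.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                         # nodes in order of BFS discovery
        preds = {v: [] for v in adj}       # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}        # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}      # dependency of s on each node
        for w in reversed(order):          # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each pair was counted from both ends


# Hypothetical toy network around fimH (edges invented for illustration).
adj = {
    "fimH": ["carA", "fepA", "vat"],
    "carA": ["fimH"],
    "fepA": ["fimH"],
    "vat": ["fimH"],
}
print(degree_centrality(adj)["fimH"])       # 1.0
print(betweenness_centrality(adj)["fimH"])  # 3.0
```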

  5. PAGE-1, an X chromosome-linked GAGE-like gene that is expressed in normal and neoplastic prostate, testis, and uterus

    PubMed Central

    Brinkmann, Ulrich; Vasmatzis, George; Lee, Byungkook; Yerushalmi, Noga; Essand, Magnus; Pastan, Ira

    1998-01-01

    We have used a combination of computerized database mining and experimental expression analyses to identify a gene that is preferentially expressed in normal male and female reproductive tissues, prostate, testis, fallopian tube, uterus, and placenta, as well as in prostate cancer, testicular cancer, and uterine cancer. This gene is located on the human X chromosome, and it is homologous to a family of genes encoding GAGE-like proteins. GAGE proteins are expressed in a variety of tumors and in testis. We designate the novel gene PAGE-1 because the expression pattern in the Cancer Genome Anatomy Project libraries indicates that it is predominantly expressed in normal and neoplastic prostate. Further database analysis indicates the presence of other genes with high homology to PAGE-1, which were found in cDNA libraries derived from testis, pooled libraries (with testis), and in a germ cell tumor library. The expression of PAGE-1 in normal and malignant prostate, testicular, and uterine tissues makes it a possible target for the diagnosis and possibly for the vaccine-based therapy of neoplasms of prostate, testis, and uterus. PMID:9724777

  6. PAGE-1, an X chromosome-linked GAGE-like gene that is expressed in normal and neoplastic prostate, testis, and uterus.

    PubMed

    Brinkmann, U; Vasmatzis, G; Lee, B; Yerushalmi, N; Essand, M; Pastan, I

    1998-09-01

    We have used a combination of computerized database mining and experimental expression analyses to identify a gene that is preferentially expressed in normal male and female reproductive tissues, prostate, testis, fallopian tube, uterus, and placenta, as well as in prostate cancer, testicular cancer, and uterine cancer. This gene is located on the human X chromosome, and it is homologous to a family of genes encoding GAGE-like proteins. GAGE proteins are expressed in a variety of tumors and in testis. We designate the novel gene PAGE-1 because the expression pattern in the Cancer Genome Anatomy Project libraries indicates that it is predominantly expressed in normal and neoplastic prostate. Further database analysis indicates the presence of other genes with high homology to PAGE-1, which were found in cDNA libraries derived from testis, pooled libraries (with testis), and in a germ cell tumor library. The expression of PAGE-1 in normal and malignant prostate, testicular, and uterine tissues makes it a possible target for the diagnosis and possibly for the vaccine-based therapy of neoplasms of prostate, testis, and uterus.

  7. Mapping Adipose and Muscle Tissue Expression Quantitative Trait Loci in African Americans to Identify Genes for Type 2 Diabetes and Obesity

    PubMed Central

    Sajuthi, Satria P.; Sharma, Neeraj K.; Chou, Jeff W.; Palmer, Nicholette D.; McWilliams, David R.; Beal, John; Comeau, Mary E.; Ma, Lijun; Calles-Escandon, Jorge; Demons, Jamehl; Rogers, Samantha; Cherry, Kristina; Menon, Lata; Kouba, Ethel; Davis, Donna; Burris, Marcie; Byerly, Sara J.; Ng, Maggie C.Y.; Maruthur, Nisa M.; Patel, Sanjay R.; Bielak, Lawrence F.; Lange, Leslie; Guo, Xiuqing; Sale, Michèle M.; Chan, Kei Hang; Monda, Keri L.; Chen, Gary K.; Taylor, Kira; Palmer, Cameron; Edwards, Todd L; North, Kari E.; Haiman, Christopher A.; Bowden, Donald W.; Freedman, Barry I.; Langefeld, Carl D.; Das, Swapan K.

    2016-01-01

    Type 2 diabetes (T2D) is more prevalent in African Americans (AAs) than in European Americans. Genetic variation may modulate transcript abundance in insulin-responsive tissues and contribute to risk. Yet published studies identifying expression quantitative trait loci (eQTLs) in African ancestry populations are restricted to blood cells. This study aims to develop a map of genetically regulated transcripts expressed in tissues important for glucose homeostasis in AAs, which is critical for identifying the genetic etiology of T2D and related traits. Quantitative measures of adipose and muscle gene expression and genotypic data were integrated in 260 non-diabetic AAs to identify expression regulatory variants. Their roles in genetic susceptibility to T2D and related metabolic phenotypes were evaluated by mining GWAS datasets. eQTL analysis identified 1,971 and 2,078 cis-eGenes in adipose and muscle, respectively. Cis-eQTLs for 885 transcripts, including the top cis-eGenes CHURC1, USMG5, and ERAP2, were identified in both tissues. Of the top cis-eSNPs, 62.1% were within ±50 kb of transcription start sites, and cis-eGenes were enriched for mitochondrial transcripts. Mining GWAS databases revealed associations of cis-eSNPs for more than 50 genes with T2D (e.g., PIK3C2A, RBMS1, UFSP1), gluco-metabolic phenotypes (e.g., INPP5E, SNX17, ERAP2, FN3KRP), and obesity (e.g., POMC, CPEB4). Integration of GWAS meta-analysis data from AA cohorts revealed the most significant associations for cis-eSNPs of the ATP5SL and MCCC1 genes, with T2D and BMI, respectively. This study developed the first comprehensive map of adipose and muscle tissue eQTLs in AAs (publicly accessible at https://mdsetaa.phs.wakehealth.edu). It also identified genetically regulated transcripts for delineating genetic causes of T2D and related metabolic phenotypes. PMID:27193597
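
    The core cis-eQTL step tests SNPs near a gene for association with that gene's expression. It can be sketched as a per-SNP linear regression on genotype dosage. This is a simplified illustration, not the study's model. It assumes a ±50 kb window around the transcription start site; the abstract reports that 62.1% of top cis-eSNPs fell within this distance, not that it was the search window. It also omits covariates and multiple-testing correction. SNP IDs, positions, and values are hypothetical.

```python
import math


def regress(dosage, expression):
    """Least-squares slope and Pearson r of expression on genotype dosage."""
    n = len(dosage)
    mx, my = sum(dosage) / n, sum(expression) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in expression)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def cis_eqtl_scan(tss, expression, snps, window=50_000):
    """Test every SNP within +/- window bp of the TSS; strongest |r| first.

    snps: iterable of (snp_id, position, dosages), dosages being 0/1/2
    allele counts per individual, in the same order as expression.
    """
    if len(set(expression)) < 2:
        return []     # constant expression: nothing to associate
    results = []
    for snp_id, pos, dosage in snps:
        if abs(pos - tss) > window or len(set(dosage)) < 2:
            continue  # outside the cis window, or monomorphic in this sample
        slope, r = regress(dosage, expression)
        results.append((snp_id, slope, r))
    return sorted(results, key=lambda t: -abs(t[2]))


# Hypothetical data: one gene with its TSS at 1,000,000 bp, six individuals.
expression = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]
snps = [
    ("rs_a", 1_010_000, [0, 1, 2, 1, 0, 2]),
    ("rs_b", 1_030_000, [2, 0, 1, 1, 2, 0]),
    ("rs_far", 1_200_000, [0, 1, 2, 1, 0, 2]),  # outside the +/-50 kb window
]
for snp_id, slope, r in cis_eqtl_scan(1_000_000, expression, snps):
    print(snp_id, round(slope, 2), round(r, 2))
```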

  8. Complementary techniques: validation of gene expression data by quantitative real time PCR.

    PubMed

    Provenzano, Maurizio; Mocellin, Simone

    2007-01-01

    Microarray technology can be considered the most powerful tool for screening the gene expression profiles of biological samples. After data mining, results need to be validated with highly reliable techniques that allow precise quantitation of the transcript abundance of the identified genes. Quantitative real-time PCR (qrt-PCR) has recently reached a level of sensitivity, accuracy, and practical ease that supports its routine use for gene-level measurement. Currently, most experts consider qrt-PCR the most appropriate method to confirm or refute microarray-generated data. The biochemical principles underlying qrt-PCR, as well as some related technical issues, must be borne in mind when using this technology.
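
    The abstract does not name a quantification model. One widely used option for relative quantification is the comparative Ct (2^-ΔΔCt) method, sketched below under its usual assumptions: near-100% and roughly equal amplification efficiencies for the target and reference genes. The Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression (treated vs control) by the 2^-ddCt method.

    Assumes ~100% and equal amplification efficiency for target and reference.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated   # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control                 # compare the two conditions
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: relative to the reference gene, the target crosses
# the threshold 3 cycles earlier in treated samples than in controls.
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
```

    A microarray call would then count as confirmed if the qPCR fold change agrees with the array estimate in direction, and roughly in magnitude. The abstract leaves the precise agreement criterion open.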

  9. SiBIC: a web server for generating gene set networks based on biclusters obtained by maximal frequent itemset mining.

    PubMed

    Takahashi, Kei-ichiro; Takigawa, Ichigaku; Mamitsuka, Hiroshi

    2013-01-01

    Detecting biclusters in expression data is useful because a bicluster captures genes that are coexpressed under only part of all the given experimental conditions. We present SiBIC, a software tool that works in three steps. Given an expression dataset, it first exhaustively enumerates biclusters. It then merges them into relatively independent biclusters. Finally, it uses these to generate gene set networks, in which each node is a set of coexpressed genes. We evaluated each step of this procedure: 1) the biological and statistical significance of the generated biclusters, 2) the biological quality of the merged biclusters, and 3) the biological significance of the gene set networks. We emphasize that gene set networks, whose nodes are gene sets rather than genes, can be more compact than conventional gene networks and are therefore more comprehensible. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/miami/faces/index.jsp.
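
    The enumeration step can be sketched as maximal frequent itemset mining over a discretized matrix. In this minimal illustration, each gene is a "transaction" whose items are the conditions in which it is called up-regulated. A condition set shared by at least `min_support` genes defines a bicluster, and only maximal sets are kept. The discretization, the thresholds, and the level-wise (Apriori-style) search are assumptions. SiBIC's own miner, its merging step, and its network construction are not reproduced.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search for condition sets shared by >= min_support genes.

    transactions: dict gene -> set of conditions where the gene is 'on'.
    Returns dict frozenset(conditions) -> frozenset(supporting genes).
    """
    def supporting_genes(itemset):
        return frozenset(g for g, conds in transactions.items() if itemset <= conds)

    items = sorted({c for conds in transactions.values() for c in conds})
    frequent = {}
    level = [frozenset([c]) for c in items]
    while level:
        survivors = []
        for cand in level:
            genes = supporting_genes(cand)
            if len(genes) >= min_support:
                frequent[cand] = genes
                survivors.append(cand)
        # Join frequent k-sets that differ by one item into (k+1)-set candidates.
        level = list({a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == len(a) + 1})
    return frequent


def maximal_biclusters(transactions, min_support, min_conditions=2):
    """Biclusters (genes, conditions) from maximal frequent condition sets."""
    freq = frequent_itemsets(transactions, min_support)
    return sorted(
        (sorted(genes), sorted(conds))
        for conds, genes in freq.items()
        if len(conds) >= min_conditions
        and not any(conds < other for other in freq)  # no frequent proper superset
    )


# Hypothetical discretized data: conditions in which each gene is up-regulated.
transactions = {
    "g1": {"c1", "c2", "c3"},
    "g2": {"c1", "c2", "c3"},
    "g3": {"c1", "c2"},
    "g4": {"c4"},
    "g5": {"c4", "c5"},
}
print(maximal_biclusters(transactions, min_support=2))
# [(['g1', 'g2'], ['c1', 'c2', 'c3'])]
```

    Maximal itemsets are compact but can hide better-supported sub-patterns. With `min_support=2`, the bicluster {g1, g2, g3} × {c1, c2} is not reported because {c1, c2} has a frequent superset.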

  10. DGEM--a microarray gene expression database for primary human disease tissues.

    PubMed

    Xia, Yuni; Campen, Andrew; Rigsby, Dan; Guo, Ying; Feng, Xingdong; Su, Eric W; Palakal, Mathew; Li, Shuyu

    2007-01-01

    Gene expression patterns can reflect gene regulation in human tissues under normal or pathological conditions. Gene expression profiling data from studies of primary human disease samples are particularly valuable, since these studies often span many years in order to collect patient clinical information and achieve a large sample size. Disease-to-Gene Expression Mapper (DGEM) provides a beneficial community resource to access and analyze these data. It currently includes Affymetrix oligonucleotide array datasets for more than 40 human diseases and 1400 samples. The data are normalized to the same scale and stored in a relational database. A statistical-analysis pipeline was implemented to identify genes abnormally expressed in disease tissues, or genes whose expression is associated with clinical parameters such as cancer patient survival. Data-mining results can be queried through a web-based interface at http://dgem.dhcp.iupui.edu/. The query tool dynamically generates graphs and tables that link to major gene and pathway resources, including Entrez Gene and the Kyoto Encyclopedia of Genes and Genomes (KEGG), connecting the data to relevant biology. In summary, DGEM provides scientists and physicians with a valuable tool to study disease mechanisms, to discover potential disease biomarkers for diagnosis and prognosis, and to identify novel gene targets for drug discovery. The source code is freely available for non-profit use on request to the authors.

  11. GEM-TREND: a web tool for gene expression data mining toward relevant network discovery

    PubMed Central

    Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

    2009-01-01

    Background DNA microarray technology provides a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which is available in public databases such as the Gene Expression Omnibus (GEO). To date, most researchers have retrieved data manually from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data using gene-expression signatures, each represented by a set of genes labeled up- or down-regulated according to their biological states. It is available as a web tool for detecting similar gene-expression signatures in a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). To help researchers use public gene expression data more effectively, we developed a web tool for finding similar gene expression data in a publicly available database and generating their co-expression networks. Results GEM-TREND, a web tool for searching gene expression data, allows users to search GEO using gene-expression signatures or gene expression ratio data as a query. It retrieves gene expression data by comparing gene-expression patterns between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006), with an additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from GEO and using in-house microarray data. The results validated the ability of GEM-TREND to retrieve gene expression entries from GEO that are biologically related to a query. 
For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. Conclusion GEM-TREND was developed to retrieve gene expression data by comparing a query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing their gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at . PMID:19728865

  12. GEM-TREND: a web tool for gene expression data mining toward relevant network discovery.

    PubMed

    Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

    2009-09-03

    DNA microarray technology provides a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which is available in public databases such as the Gene Expression Omnibus (GEO). To date, most researchers have retrieved data manually from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data using gene-expression signatures, each represented by a set of genes labeled up- or down-regulated according to their biological states. It is available as a web tool for detecting similar gene-expression signatures in a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). To help researchers use public gene expression data more effectively, we developed a web tool for finding similar gene expression data in a publicly available database and generating their co-expression networks. GEM-TREND, a web tool for searching gene expression data, allows users to search GEO using gene-expression signatures or gene expression ratio data as a query. It retrieves gene expression data by comparing gene-expression patterns between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006), with an additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from GEO and using in-house microarray data. The results validated the ability of GEM-TREND to retrieve gene expression entries from GEO that are biologically related to a query. 
For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. GEM-TREND was developed to retrieve gene expression data by comparing a query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing their gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network.
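
    The rank-based matching that GEM-TREND builds on can be sketched with the Kolmogorov-Smirnov-style statistic of Lamb et al. (2006). This minimal version computes the unscaled connectivity score for one reference profile. It assumes the profile's genes are ranked from most up-regulated (rank 1) to most down-regulated, and that each tag list contains at least one ranked gene. The cross-profile scaling used in the Connectivity Map is not reproduced, and neither is GEM-TREND's additional significance calculation. Gene names are hypothetical.

```python
def ks_score(tags, ranked_genes):
    """Signed KS-like enrichment of a tag list within a ranked gene list."""
    n = len(ranked_genes)
    rank = {g: i + 1 for i, g in enumerate(ranked_genes)}   # 1-based positions
    v = sorted(rank[g] for g in tags if g in rank)
    t = len(v)
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(up_tags, down_tags, ranked_genes):
    """Unscaled connectivity: ks_up - ks_down when the two differ in sign, else 0."""
    ks_up = ks_score(up_tags, ranked_genes)
    ks_down = ks_score(down_tags, ranked_genes)
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down


# Hypothetical reference profile of 10 genes, ranked from most up- to most down-regulated.
ranked = [f"g{i}" for i in range(1, 11)]
print(connectivity_score(["g1", "g2"], ["g9", "g10"], ranked))  # about +1.7: similar
print(connectivity_score(["g9", "g10"], ["g1", "g2"], ranked))  # about -1.7: reversed
```

    A query signature whose up-tags sit at the top of a profile and whose down-tags sit at the bottom scores strongly positive. A reversed pattern scores negative. When both tag lists shift in the same direction, the score is zero.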

  13. Mining of Ruminant Microbial Phytase (RPHY1) from Metagenomic Data of Mehsani Buffalo Breed: Identification, Gene Cloning, and Characterization.

    PubMed

    Mootapally, Chandra Shekar; Nathani, Neelam M; Patel, Amrutlal K; Jakhesara, Subhash J; Joshi, Chaitanya G

    2016-01-01

    Phytases have been widely used as animal feed supplements to increase the availability of digestible phosphorus, especially in monogastric animals fed cereal grains. The present study describes the identification of a full-length phytase gene of a Prevotella species present in the Mehsani buffalo rumen. The gene, designated RPHY1, consists of 1,251 bp and encodes a protein of 417 amino acids. A homology search of the deduced RPHY1 amino acid sequence against a nonredundant protein database showed 92% similarity with the histidine acid phosphatase domain. The RPHY1 gene was then expressed in Escherichia coli BL21 using a pET32a expression vector, and the protein was purified using a His60 Ni-NTA gravity column. The mass of the purified RPHY1 was estimated at approximately 63 kDa by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). Optimal RPHY1 enzyme activity was observed at 55°C (pH 5). The enzyme showed good stability at 5°C and within the acidic pH range. Mg2+ and K+ ions significantly inhibited RPHY1 activity, while Ca2+, Mn2+, and Na+ slightly inhibited it. The RPHY1 phytase was susceptible to SDS and was highly stimulated in the presence of EDTA. Overall, the comparatively high enzyme activity and other characteristics of the RPHY1 gene product mined from the rumen make it a promising candidate feed supplement enzyme in animal farming. © 2016 S. Karger AG, Basel.

  14. Superior Cross-Species Reference Genes: A Blueberry Case Study

    PubMed Central

    Die, Jose V.; Rowland, Lisa J.

    2013-01-01

    The advent of affordable Next Generation Sequencing technologies has had a major impact on studies of many crop species, for which access to genomic technologies and genome-scale data sets had until now been extremely limited. The recent development of genomic resources in blueberry will enable the application of high-throughput gene expression approaches that should relatively quickly increase our understanding of blueberry physiology. These studies, however, require a highly accurate and robust workflow. In particular, they require reference genes with high expression stability for correct normalization of target genes. To create a set of superior reference genes for blueberry expression analyses, we mined a publicly available blueberry transcriptome data set for orthologs of a set of Arabidopsis genes that showed the most stable expression in a developmental series. In total, the expression stability of 13 putative reference genes was evaluated by qPCR. This identified a set of new references with high stability values across a developmental series in blueberry fruits and floral buds. We also demonstrated the need to use at least two, preferably three, reference genes to avoid inconsistent results, even when superior reference genes are used. The new references identified here provide a valuable resource for accurate normalization of gene expression in Vaccinium spp. and may also be useful for other members of the Ericaceae family. PMID:24058469
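
    The abstract does not say which stability algorithm was applied, so the sketch below uses a geNorm-style measure as an assumed example. For each candidate gene, M is the average standard deviation of its log2 expression ratio with every other candidate across samples; lower M means more stable expression. Samples are then normalized by the geometric mean of several references, in line with the recommendation above to use two or three. The sketch assumes ~100% amplification efficiency, so that relative quantity is 2^-Ct. Gene names and Ct values are hypothetical.

```python
import statistics
from math import prod


def genorm_m(ct):
    """geNorm-style stability value M for each candidate reference gene.

    ct: dict gene -> list of Ct values, same sample order for every gene.
    With ~100% efficiency, log2(quantity_j / quantity_k) = Ct_k - Ct_j.
    Lower M means more stable expression relative to the other candidates.
    """
    m = {}
    for j in ct:
        pairwise_sd = [
            statistics.stdev([ck - cj for cj, ck in zip(ct[j], ct[k])])
            for k in ct if k != j
        ]
        m[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m


def normalization_factor(ct, refs, sample):
    """Geometric mean of relative quantities 2^-Ct over several reference genes."""
    quantities = [2.0 ** -ct[g][sample] for g in refs]
    return prod(quantities) ** (1.0 / len(quantities))


# Hypothetical Ct values for three candidate references in three samples.
ct = {
    "refA": [20.0, 21.0, 22.0],
    "refB": [25.0, 26.0, 27.0],   # tracks refA exactly
    "refC": [18.0, 22.0, 19.0],   # fluctuates independently
}
m = genorm_m(ct)
print(sorted(m, key=m.get))  # most stable first
print(normalization_factor(ct, ["refA", "refB"], sample=0))
```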

  15. Differential Connectivity in Colorectal Cancer Gene Expression Network

    PubMed

    Izadi, Fereshteh

    2018-05-30

    Colorectal cancer (CRC) is a challenging type of cancer, so exploring effective CRC biomarkers could lead to significant progress toward the treatment of this disease. In the present study, CRC gene expression datasets were reanalyzed. Differentially expressed genes shared across 294 normal mucosa and adjacent tumoral samples were then used to build two independent transcriptional regulatory networks. Topological analysis of the networks identified genes whose global connectivity differed with cancer state, and their potential transcriptional regulators, including transcription factors, were identified. The majority of differentially connected genes (DCGs) were up-regulated in colorectal transcriptome experiments. Moreover, a number of these genes have been experimentally validated as cancer- or CRC-associated genes. The DCGs, including GART, TGFB1, ITGA2, SLC16A5, SOX9, and MMP7, were investigated across 12 cancer types. Functional enrichment analysis followed by detailed data mining showed that these candidate genes may be related to CRC by mediating the metastatic cascade, in addition to sharing pathways with the 12 cancer types by triggering inflammatory events. Our study uncovered correlated alterations in gene expression related to CRC susceptibility and progression, and the candidate biomarkers identified could provide a link to the disease.
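One simple way to quantify differential global connectivity is to compare each gene's degree, meaning its number of links, in the normal network and in the tumor network. The abstract does not say which topological measure the study used. The sketch below therefore assumes degree difference on undirected edge lists. The gene labels and the `min_diff` cutoff are hypothetical.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node in an undirected network given as (u, v) pairs."""
    deg = Counter()
    # Deduplicate undirected edges so (u, v) and (v, u) count once.
    for u, v in {tuple(sorted(e)) for e in edges}:
        deg[u] += 1
        deg[v] += 1
    return deg

def differential_connectivity(normal_edges, tumor_edges, min_diff=1):
    """Rank genes by the change in degree from the normal to the tumor network.

    Returns (gene, tumor_degree - normal_degree) pairs whose absolute change is
    at least min_diff, largest absolute change first, ties broken by name.
    """
    d_normal, d_tumor = degrees(normal_edges), degrees(tumor_edges)
    genes = set(d_normal) | set(d_tumor)
    diffs = {g: d_tumor[g] - d_normal[g] for g in genes}  # Counter gives 0 if absent
    hits = [(g, d) for g, d in diffs.items() if abs(d) >= min_diff]
    return sorted(hits, key=lambda item: (-abs(item[1]), item[0]))

if __name__ == "__main__":
    normal = [("A", "B")]
    tumor = [("A", "B"), ("A", "C"), ("A", "D")]
    print(differential_connectivity(normal, tumor))
```

Here gene `A` gains two links in the tumor network, so it ranks first. A real analysis would also need a significance test for the observed differences, which this sketch leaves out.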

  16. Heterologous Production and Yield Improvement of Epothilones in Burkholderiales Strain DSM 7029.

    PubMed

    Bian, Xiaoying; Tang, Biao; Yu, Yucong; Tu, Qiang; Gross, Frank; Wang, Hailong; Li, Aiying; Fu, Jun; Shen, Yuemao; Li, Yue-Zhong; Stewart, A Francis; Zhao, Guoping; Ding, Xiaoming; Müller, Rolf; Zhang, Youming

    2017-07-21

    The cloning of microbial natural product biosynthetic gene clusters and their heterologous expression in a suitable host have proven to be a feasible approach for improving the yield of valuable natural products and a starting point for mining cryptic natural products in microorganisms. Myxobacteria are a prolific source of novel bioactive natural products, but only a limited choice of heterologous hosts has been exploited for them. Here, we describe the use of Burkholderiales strain DSM 7029 as a potential heterologous host for the functional expression of myxobacterial secondary metabolites. Using a newly established electroporation procedure, the 56 kb epothilone biosynthetic gene cluster from the myxobacterium Sorangium cellulosum was introduced into the chromosome of strain DSM 7029 by transposition. Production of epothilones A, B, C, and D was detected, albeit at low yields. Optimization of the medium, introduction of an exogenous methylmalonyl-CoA biosynthetic pathway, and overexpression of rare tRNA genes increased the total yield of epothilones approximately 75-fold, to 307 μg L⁻¹. These results show that strain DSM 7029 has the potential to produce epothilones at reasonable titers and might be a broadly applicable host for the heterologous expression of other myxobacterial polyketide synthases and nonribosomal peptide synthetases, expediting the process of genome mining.

  17. An expressed sequence tag (EST) data mining strategy succeeding in the discovery of new G-protein coupled receptors.

    PubMed

    Wittenberger, T; Schaller, H C; Hellebrand, S

    2001-03-30

    We have developed a comprehensive expressed sequence tag database search method and used it to identify new members of the G-protein coupled receptor superfamily. Our approach proved especially useful for detecting expressed sequence tags that do not encode conserved parts of a protein, making it an ideal tool for identifying members of divergent protein families, or protein parts without conserved domain structures, in the expressed sequence tag database. At least 14 of the expressed sequence tags found with this strategy are promising candidates for new putative G-protein coupled receptors. Here, we describe the sequence and expression analysis of five new members of this receptor superfamily, namely GPR84, GPR86, GPR87, GPR90 and GPR91. We also studied the genomic structure and chromosomal localization of the respective genes using in silico methods. A cluster of six closely related G-protein coupled receptors was found on human chromosome 3q24-3q25. It consists of four orphan receptors (GPR86, GPR87, GPR91, and H963), the purinergic receptor P2Y1, and the uridine 5'-diphosphoglucose receptor KIAA0001. These receptors likely evolved from a common ancestor and therefore might have related ligands. In conclusion, we describe a data mining procedure that proved useful for the identification and initial characterization of new genes and is readily applicable to other gene families. Copyright 2001 Academic Press.

  18. Transcriptome-Wide Identification of Reference Genes for Expression Analysis of Soybean Responses to Drought Stress along the Day.

    PubMed

    Marcolino-Gomes, Juliana; Rodrigues, Fabiana Aparecida; Fuganti-Pagliarini, Renata; Nakayama, Thiago Jonas; Ribeiro Reis, Rafaela; Bouças Farias, Jose Renato; Harmon, Frank G; Correa Molinari, Hugo Bruno; Correa Molinari, Mayla Daiane; Nepomuceno, Alexandre

    2015-01-01

    The soybean transcriptome shows strong variation over the course of the day, both under optimal growth conditions and in response to adverse circumstances such as drought stress. However, no study to date has presented suitable reference genes, with stable expression throughout the day, for relative quantification of gene expression in combined studies of drought stress and diurnal oscillations. Recently, water deficit responses have been associated with circadian clock oscillations at the transcriptional level, revealing previously unknown processes and increasing the demand for studies on plant responses to drought stress and their oscillation during the day. We mined transcriptome-wide microarray and RNA-seq databases to select a previously unpublished set of candidate reference genes, specifically chosen for normalizing gene expression in studies of soybean under both drought stress and diurnal oscillations. Experimental validation and stability analysis in soybean plants subjected to drought stress and sampled over a 24 h time course showed that four of these new reference genes (FYVE, NUDIX, Golgin-84 and CYST) indeed exhibited greater expression stability than the conventionally used housekeeping genes (ELF1-β and β-actin) under these conditions. We also demonstrated the effect of using candidate reference genes with different stability values to normalize relative expression data for a drought-inducible soybean gene (DREB5) evaluated at different periods of the day.

  19. De novo characterization of the gene-rich transcriptomes of two color-polymorphic spiders, Theridion grallator and T. californicum (Araneae: Theridiidae), with special reference to pigment genes.

    PubMed

    Croucher, Peter J P; Brewer, Michael S; Winchell, Christopher J; Oxford, Geoff S; Gillespie, Rosemary G

    2013-12-08

    A number of spider species within the family Theridiidae exhibit a dramatic abdominal (opisthosomal) color polymorphism. The polymorphism is inherited in a broadly Mendelian fashion and in some species consists of dozens of discrete morphs that are convergent across taxa and populations. Few genomic resources exist for spiders. Here, as a first necessary step towards identifying the genetic basis for this trait we present the near complete transcriptomes of two species: the Hawaiian happy-face spider Theridion grallator and Theridion californicum. We mined the gene complement for pigment-pathway genes and examined differential expression (DE) between morphs that are unpatterned (plain yellow) and patterned (yellow with superimposed patches of red, white or very dark brown). By deep sequencing both RNA-seq and normalized cDNA libraries from pooled specimens of each species we were able to assemble a comprehensive gene set for both species that we estimate to be 98-99% complete. It is likely that these species express more than 20,000 protein-coding genes, perhaps 4.5% (ca. 870) of which might be unique to spiders. Mining for pigment-associated Drosophila melanogaster genes indicated the presence of all ommochrome pathway genes and most pteridine pathway genes and DE analyses further indicate a possible role for the pteridine pathway in theridiid color patterning. Based upon our estimates, T. grallator and T. californicum express a large inventory of protein-coding genes. Our comprehensive assembly illustrates the continuing value of sequencing normalized cDNA libraries in addition to RNA-seq in order to generate a reference transcriptome for non-model species. The identification of pteridine-related genes and their possible involvement in color patterning is a novel finding in spiders and one that suggests a biochemical link between guanine deposits and the pigments exhibited by these species.

  20. Mining the transcriptomes of four commercially important shellfish species for single nucleotide polymorphisms within biomineralization genes.

    PubMed

    Vendrami, David L J; Shah, Abhijeet; Telesca, Luca; Hoffman, Joseph I

    2016-06-01

    Transcriptional profiling not only provides insights into patterns of gene expression but also generates sequences that can be mined for molecular markers, which in turn can be used for population genetic studies. As part of a large-scale effort to better understand how commercially important European shellfish species may respond to ocean acidification, we therefore mined the transcriptomes of four species (the Pacific oyster Crassostrea gigas, the blue mussel Mytilus edulis, the great scallop Pecten maximus and the blunt gaper Mya truncata) for single nucleotide polymorphisms (SNPs). Illumina data for C. gigas, M. edulis and P. maximus and 454 data for M. truncata were interrogated using GATK and SWAP454, respectively, to identify between 8,267 and 47,159 high-quality SNPs per species (total = 121,053 SNPs residing within 34,716 different contigs). We then annotated the transcripts containing SNPs to reveal homology to diverse genes. Finally, because oceanic pH affects the ability of organisms to incorporate calcium carbonate, we homed in on genes implicated in biomineralization and identified a total of 1,899 SNPs in 157 genes. These SNPs are good candidate biomarkers for studying patterns of selection in natural or experimental populations. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. Use of keyword hierarchies to interpret gene expression patterns.

    PubMed

    Masys, D R; Welsh, J B; Lynn Fink, J; Gribskov, M; Klacansky, I; Corbeil, J

    2001-04-01

    High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.
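MeSH tree numbers are dot-separated paths, so every prefix of a tree number is an ancestor concept. This means keywords attached to individual genes can be "rolled up" to shared higher-level concepts. The minimal sketch below assumes this roll-up approach and is not the authors' implementation. The gene-to-tree-number mapping is hypothetical, and the tree numbers in the test are illustrative strings, not curated annotations. The function counts, for each hierarchy node, how many genes in a cluster are indexed at or below that node.

```python
from collections import Counter

def ancestors(tree_number):
    """Return a dot-separated tree number and all of its ancestors,
    e.g. 'C04.557.337' -> ['C04', 'C04.557', 'C04.557.337']."""
    parts = tree_number.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]

def shared_concepts(gene_terms, cluster, min_genes=2):
    """Count how many genes in a cluster are indexed at or below each node.

    gene_terms: hypothetical dict mapping gene symbol -> iterable of tree numbers.
    cluster: gene symbols from one expression cluster.
    Returns {node: gene_count} for nodes shared by at least min_genes genes.
    """
    counts = Counter()
    for gene in cluster:
        nodes = set()  # each gene contributes at most once per node
        for tree_number in gene_terms.get(gene, ()):
            nodes.update(ancestors(tree_number))
        counts.update(nodes)
    return {node: n for node, n in counts.items() if n >= min_genes}
```

Nodes that many genes in a cluster share suggest a common biological theme. Nodes near the root are shared trivially, so a practical tool would also weigh how specific each node is.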

  2. PlanMine--a mineable resource of planarian biology and biodiversity.

    PubMed

    Brandl, Holger; Moon, HongKee; Vila-Farré, Miquel; Liu, Shang-Yun; Henry, Ian; Rink, Jochen C

    2016-01-04

    Planarian flatworms are in the midst of a renaissance as a model system for regeneration and stem cells. Besides two well-studied model species, hundreds of species exist worldwide that present a fascinating diversity of regenerative abilities, tissue turnover rates, reproductive strategies and other life history traits. PlanMine (http://planmine.mpi-cbg.de/) aims to accomplish two primary missions: First, to provide an easily accessible platform for sharing, comparing and value-added mining of planarian sequence data. Second, to catalyze the comparative analysis of the phenotypic diversity amongst planarian species. Currently, PlanMine houses transcriptomes independently assembled by our lab and community contributors. Detailed assembly/annotation statistics, a custom-developed BLAST viewer and easy export options enable comparisons at the contig and assembly level. Consistent annotation of all transcriptomes by an automated pipeline, the integration of published gene expression information and inter-relational query tools provide opportunities for mining planarian gene sequences and functions. For inter-species comparisons, we include transcriptomes of, so far, six planarian species, along with images, expert-curated information on their biology and pre-calculated cross-species sequence homologies. PlanMine is based on the popular InterMine system in order to make the rich biology of planarians accessible to the general life sciences research community. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Leaf miner-induced morphological, physiological and molecular changes in mangrove plant Avicennia marina (Forsk.) Vierh.

    PubMed

    Chen, Juan; Shen, Zhi-Jun; Lu, Wei-Zhi; Liu, Xiang; Wu, Fei-Hua; Gao, Gui-Feng; Liu, Yi-Ling; Wu, Chun-Sheng; Yan, Chong-Ling; Fan, Hang-Qing; Zhang, Yi-Hui; Zheng, Hai-Lei; Tsai, Chung-Jui

    2017-01-31

    Avicennia marina (Forsk.) Vierh. is a widespread mangrove species along the southeast coast of China. Recently, an outbreak of the herbivorous insect Phyllocnistis citrella Stainton, a leaf miner, has affected the growth of A. marina. Little has been reported about the biochemical, physiological and molecular responses of A. marina to leaf miner infection. Here, we report the responses of A. marina to leaf miner infection in terms of leaf structure, photosynthesis, the antioxidant system and the expression of miner-responsive genes. A. marina leaves attacked by the leaf miner showed significant decreases in chlorophyll, carbon and nitrogen contents, as well as a decreased photosynthetic rate. Scanning and transmission electron microscopy revealed that the leaf miner invaded only the upper epidermis and destroyed the epidermal cells, which led to the exposure of salt glands. In addition, the chloroplasts of mined leaves (ML) were swollen and their thylakoids degraded. The maximal net photosynthetic rate, stomatal conductance (Gs), carboxylation efficiency (CE), dark respiration (Rd), light respiration (Rp) and quantum yield (AQE) decreased significantly in the ML, whereas the light saturation point (Lsp), light compensation point (Lcp), water loss and CO2 compensation point (Γ) increased. Chlorophyll fluorescence features were also altered by leaf miner attack. Interestingly, a higher rate of O2•− generation and lower antioxidant enzyme expression were found in the mined portion (MP). By contrast, a higher H2O2 level and higher antioxidant enzyme expression were found in the non-mined portion (NMP), implying that the NMP may be able to sense leaf miner attack in the MP of the A. marina leaf via H2O2 signaling. Furthermore, the protein expression of glutathione S-transferase (GST) and the glutathione (GSH) content increased in the ML, and the expression of insect resistance-related genes such as chitinase 3, RAR1, topless and PIF3 increased significantly. Taken together, our data suggest that leaf miners can significantly affect leaf structure, photosynthesis, the antioxidant system and miner-responsive gene expression in A. marina leaves.

  4. IAOseq: inferring abundance of overlapping genes using RNA-seq data.

    PubMed

    Sun, Hong; Yang, Shuang; Tun, Liangliang; Li, Yixue

    2015-01-01

    Overlapping transcription is a common mechanism for regulating gene expression. A major limitation of overlapping transcription assays is the lack of high-throughput expression data. We developed a new tool, IAOseq, that uses the distribution of reads along transcribed regions to estimate the expression levels of overlapping genes from standard RNA-seq data. Compared with five commonly used quantification methods, IAOseq estimated overlapping transcription levels more accurately. For same-strand overlapping transcription, few existing high-throughput methods can distinguish which strand was present in the original mRNA template. The IAOseq results showed that the commonly used methods overestimated the expression levels of same-strand overlapping genes by an average of 1.6-fold. This work provides a useful tool for mining overlapping transcription levels from standard RNA-seq libraries, and IAOseq could help elucidate the complex regulatory mechanisms mediated by overlapping transcripts. IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.

  5. Tumor SHB gene expression affects disease characteristics in human acute myeloid leukemia.

    PubMed

    Jamalpour, Maria; Li, Xiujuan; Cavelier, Lucia; Gustafsson, Karin; Mostoslavsky, Gustavo; Höglund, Martin; Welsh, Michael

    2017-10-01

    The mouse Shb gene, which encodes the Src Homology 2-domain containing adapter protein B, has recently been implicated in BCR-ABL1-induced myeloid leukemia in mice, and the current study was performed to relate SHB to human acute myeloid leukemia (AML). Publicly available AML databases were mined for SHB gene expression and patient survival. SHB gene expression was determined by qPCR in the Uppsala cohort of AML patients. Cell proliferation was determined after SHB gene knockdown in leukemic cell lines. Despite a low frequency of SHB gene mutations, many tumors overexpressed SHB mRNA compared with normal myeloid blood cells. AML patients whose tumors expressed low levels of SHB mRNA had longer survival times. A subgroup of AML with a favorable prognosis, acute promyelocytic leukemia (APL) with a PML-RARA translocation, expressed less SHB mRNA than AML tumors in general. When genes co-expressed with SHB in AML tumors were examined, four other leukemia-related genes (PAX5, HDAC7, BCORL1, TET1) were identified. A network consisting of these genes plus SHB was identified that relates to certain phenotypic characteristics, such as immune cell, vascular and apoptotic features. SHB knockdown in the APL PML-RARA cell line NB4 and the monocyte/macrophage cell line MM6 adversely affected proliferation, linking SHB gene expression to tumor cell expansion and consequently to patient survival. We conclude that tumor SHB gene expression relates to survival in AML and its subgroup APL. Moreover, this gene belongs to a network of genes that plays a role in an AML phenotype with certain immune cell, vascular and apoptotic characteristics.

  6. Polycistronic gene expression in Aspergillus niger.

    PubMed

    Schuetze, Tabea; Meyer, Vera

    2017-09-25

    Genome mining approaches predict dozens of biosynthetic gene clusters in each of the filamentous fungal genomes sequenced so far. However, most of these gene clusters remain cryptic because they are not expressed in their natural host. Simultaneous expression of all genes belonging to a biosynthetic pathway in a heterologous host is one approach to activating biosynthetic gene clusters and screening the metabolites produced for bioactivities. Polycistronic expression of all pathway genes under the control of a single, tunable promoter would be the method of choice, as this not only simplifies cloning procedures but also offers control over the timing and strength of expression. However, polycistronic gene expression is not commonly found in eukaryotic host systems such as Aspergillus niger. In this study, we tested the suitability of the viral P2A peptide for co-expression of three genes in A. niger. Two genes originate from Fusarium oxysporum and are essential for producing the secondary metabolite enniatin (esyn1, ekivR). The third gene (luc) encodes the reporter luciferase, which was included to study position effects. Expression of the polycistronic gene cassette was placed under the control of the Tet-On system to ensure tunable gene expression in A. niger. In total, three polycistronic expression cassettes differing in the position of luc were constructed and targeted to the pyrG locus in A. niger. This allowed luciferase activity to be compared directly according to the position of the luciferase gene. Doxycycline-mediated induction of the Tet-On expression cassettes produced one long polycistronic mRNA, as shown by Northern analyses, and ensured comparable production of enniatin in all three strains. Notably, gene position within the polycistronic expression cassette matters: luciferase activity was lowest at position one and comparable at positions two and three. The P2A peptide can be used to express at least three genes polycistronically in A. niger. This approach can now be applied to heterologously express entire secondary metabolite gene clusters polycistronically or to co-express any genes of interest in equimolar amounts.

  7. Pattern Genes Suggest Functional Connectivity of Organs

    NASA Astrophysics Data System (ADS)

    Qin, Yangmei; Pan, Jianbo; Cai, Meichun; Yao, Lixia; Ji, Zhiliang

    2016-05-01

    A human organ, the basic structural and functional unit of the human body, is made up of a large community of different cell types that are organically bound together. Each organ usually performs a highly specialized physiological function, while several related organs work together to perform complex body functions. In this study, we present a computational effort to understand the roles of genes in building functional connections between organs. More specifically, we mined multiple transcriptome datasets sampled from 36 human organs and tissues and quantitatively identified 3,149 genes whose expression followed consensus modular patterns: specific to one organ/tissue, selectively expressed in several functionally related tissues, or ubiquitously expressed. These pattern genes imply intrinsic connections between organs. Based on the expression abundance of the 766 selective genes, we consistently clustered the 36 human organs/tissues into seven functional groups: adipose & gland, brain, muscle, immune, metabolism, mucoid and nerve conduction. The organs and tissues in each group either work together to form organ systems or coordinate to perform particular body functions. The particular roles of specific and selective genes suggest that they could be used not only to explore organ functions mechanistically but also as selective biomarkers and therapeutic targets.

  8. Comparative Analysis of Stress Induced Gene Expression in Caenorhabditis elegans following Exposure to Environmental and Lab Reconstituted Complex Metal Mixture

    PubMed Central

    Kumar, Ranjeet; Pradhan, Ajay; Khan, Faisal Ahmad; Lindström, Pia; Ragnvaldsson, Daniel; Ivarsson, Per; Olsson, Per-Erik; Jass, Jana

    2015-01-01

    Metals are essential for many physiological processes and are ubiquitously present in the environment. However, high metal concentrations can be harmful to organisms and lead to physiological stress and diseases. The accumulation of transition metals in the environment due to either natural processes or anthropogenic activities such as mining results in the contamination of water and soil environments. The present study used Caenorhabditis elegans to evaluate gene expression as an indicator of physiological response, following exposure to water collected from three different locations downstream of a Swedish mining site and a lab reconstituted metal mixture. Our results indicated that the reconstituted metal mixture exerted a direct stress response in C. elegans whereas the environmental waters elicited either a diminished or abrogated response. This suggests that it is not sufficient to use the biological effects observed from laboratory mixtures to extrapolate the effects observed in complex aquatic environments and apply this to risk assessment and intervention. PMID:26168046

  9. Identification of Thiotetronic Acid Antibiotic Biosynthetic Pathways by Target-directed Genome Mining.

    PubMed

    Tang, Xiaoyu; Li, Jie; Millán-Aguiñaga, Natalie; Zhang, Jia Jia; O'Neill, Ellis C; Ugalde, Juan A; Jensen, Paul R; Mantovani, Simone M; Moore, Bradley S

    2015-12-18

    Recent genome sequencing efforts have led to the rapid accumulation of uncharacterized or "orphaned" secondary metabolic biosynthesis gene clusters (BGCs) in public databases. This increase in DNA-sequenced big data has created significant challenges in the applied field of natural product genome mining, including (i) how to prioritize the characterization of orphan BGCs and (ii) how to rapidly connect genes to biosynthesized small molecules. Here, we show that by correlating orphan BGCs with putative antibiotic resistance genes that encode target-modified proteins, we can predict the biological function of pathway-specific small molecules before the molecules themselves have been revealed, a process we call target-directed genome mining. By querying the pan-genome of 86 Salinispora bacterial genomes for duplicated housekeeping genes colocalized with natural product BGCs, we prioritized an orphan polyketide synthase-nonribosomal peptide synthetase hybrid BGC (tlm) with a putative fatty acid synthase resistance gene. We used a new synthetic double-stranded DNA-mediated cloning strategy based on transformation-associated recombination to efficiently capture tlm and the related ttm BGCs directly from genomic DNA and to express them heterologously in Streptomyces hosts. We show the production of a group of unusual thiotetronic acid natural products, including the well-known fatty acid synthase inhibitor thiolactomycin. Thiolactomycin was first described over 30 years ago but had never been characterized at the genetic level with regard to biosynthesis and autoresistance. This finding not only validates the target-directed genome mining strategy for discovering antibiotic-producing gene clusters without a priori knowledge of the molecule synthesized but also paves the way for investigating the novel enzymology involved in thiotetronic acid natural product biosynthesis.

  10. The impacts of neutralized acid mine drainage contaminated water on the expression of selected endocrine-linked genes in juvenile Mozambique tilapia Oreochromis mossambicus exposed in vivo.

    PubMed

    Truter, Johannes Christoff; van Wyk, Johannes Hendrik; Oberholster, Paul Johan; Botha, Anna-Maria

    2014-02-01

    Acid mine drainage (AMD) is a global environmental concern because of its detrimental impacts on river ecosystems. However, little is known about the biological impacts of neutralized AMD on aquatic vertebrates, despite its excessive discharge into watercourses. The aim of this investigation was to evaluate the endocrine-modulatory potential of neutralized AMD, using molecular biomarkers in exposure studies with the teleost fish Oreochromis mossambicus. Surface water was collected from six locations downstream of a high density sludge (HDS) AMD treatment plant and from a reference site unaffected by AMD. The concentrations of 28 elements, including 22 metals, were quantified in the exposure water in order to identify potential links to altered gene expression. Relatively high concentrations of manganese (~10 mg/l), nickel (~0.1 mg/l) and cobalt (~0.03 mg/l) were detected downstream of the HDS plant. The expression of thyroid receptor-α (trα), trβ, androgen receptor-1 (ar1), ar2, glucocorticoid receptor-1 (gr1), gr2, mineralocorticoid receptor (mr) and aromatase (cyp19a1b) was quantified in juvenile fish after 48 h of exposure. Slight but significant changes in the expression of gr1 and mr were observed in fish exposed to water collected directly downstream of the HDS plant, which consisted of approximately 95 percent neutralized AMD. The most pronounced alterations in gene expression (i.e. trα, trβ, gr1, gr2, ar1 and mr) were associated with water collected further downstream, at a location with no apparent contamination vectors other than the neutralized AMD. The altered gene expression associated with the "downstream" locality coincided with higher concentrations of certain metals relative to the locality adjacent to the HDS plant, which may indicate a causative link. The current study provides evidence of endocrine-disruptive activity associated with neutralized AMD contamination, in the form of altered expression of key genes linked to the thyroid, interrenal and gonadal endocrine axes of a teleost fish species. © 2013 Published by Elsevier Inc.

  11. Machine Learning–Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis

    PubMed Central

    Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A.; Wang, Xiangfeng

    2014-01-01

    Machine learning (ML) is an intelligent data mining technique that builds a prediction model by learning from prior knowledge in order to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA), and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. mlDNA first uses an ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive “noninformative” genes before network construction, by learning the patterns of 32 expression characteristics of known stress-related genes. The retained “informative” genes are then analyzed by ML-based network comparison to predict candidate stress-related genes that show expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional differential expression analysis based on statistical testing at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 of the 1784 predicted salt stress–related genes that had available SALK T-DNA mutagenesis lines for phenotypic screening, and identified two previously unreported genes whose mutants showed salt-sensitive phenotypes. PMID:24520154

  12. Exploring patterns of epigenetic information with data mining techniques.

    PubMed

    Aguiar-Pulido, Vanessa; Seoane, José A; Gestal, Marcos; Dorado, Julián

    2013-01-01

    Data mining, a part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, generating large amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information that are mitotically and/or meiotically heritable and that determine gene expression, cellular differentiation and cell fate. Individuals acquire epigenetic lesions and genetic mutations during their lives, and these accumulate with ageing. Both kinds of defect, together or individually, can result in a loss of control over cell growth and thus lead to cancer development. Data mining techniques can be used to extract these patterns. This work reviews some of the most important applications of data mining to epigenetics.

  13. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.

    PubMed

    Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet is inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining of PubMed publication abstracts, we verify that, regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but also has the potential to hypothesize new ones. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.
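    The consensus step can be illustrated with a minimal sketch, assuming each learned Bayesian network is reduced to a set of directed edges and that "network resolution" acts as the minimum fraction of networks in which an edge must appear. That interpretation, the function name, and the toy edge sets are assumptions; the sketch does not learn the Bayesian networks themselves.

```python
from collections import Counter

def consensus_edges(networks, resolution):
    """Keep edges that appear in at least `resolution` (0-1) of the networks.

    Each network is a set of (parent, child) edges, e.g. from Bayesian
    networks learned under different randomized topological orderings.
    """
    counts = Counter(edge for net in networks for edge in set(net))
    needed = resolution * len(networks)
    return {edge for edge, c in counts.items() if c >= needed}

# Hypothetical edge sets from three learned networks
nets = [
    {("JAK", "STAT"), ("PI3K", "AKT"), ("RAS", "AKT")},
    {("JAK", "STAT"), ("PI3K", "AKT")},
    {("JAK", "STAT"), ("AKT", "mTOR")},
]
print(sorted(consensus_edges(nets, 0.6)))  # [('JAK', 'STAT'), ('PI3K', 'AKT')]
```

    Raising the threshold yields a sparser, higher-agreement network (at 1.0 only JAK-STAT survives), which is one way a single resolution parameter could tailor the consensus.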

  14. Directed natural product biosynthesis gene cluster capture and expression in the model bacterium Bacillus subtilis

    NASA Astrophysics Data System (ADS)

    Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S.; Qian, Pei-Yuan

    2015-03-01

    Bacilli are ubiquitous low G+C environmental Gram-positive bacteria that produce a wide assortment of specialized small molecules. Although their natural product biosynthetic potential is high, robust molecular tools to support the heterologous expression of large biosynthetic gene clusters in Bacillus hosts are rare. Herein we adapt transformation-associated recombination (TAR) in yeast to design a single genomic capture and expression vector for antibiotic production in Bacillus subtilis. After validating this direct cloning "plug-and-play" approach with surfactin, we genetically interrogated the amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation process involving the N-acyl-asparagine pro-drug intermediates preamicoumacins, which are hydrolyzed by the asparagine-specific peptidase into the active component amicoumacin A. This work represents the first direct-cloning-based heterologous expression of natural products in the model organism B. subtilis and paves the way for future genome mining efforts in this genus.

  15. Directed natural product biosynthesis gene cluster capture and expression in the model bacterium Bacillus subtilis.

    PubMed

    Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S; Qian, Pei-Yuan

    2015-03-24

    Bacilli are ubiquitous low G+C environmental Gram-positive bacteria that produce a wide assortment of specialized small molecules. Although their natural product biosynthetic potential is high, robust molecular tools to support the heterologous expression of large biosynthetic gene clusters in Bacillus hosts are rare. Herein we adapt transformation-associated recombination (TAR) in yeast to design a single genomic capture and expression vector for antibiotic production in Bacillus subtilis. After validating this direct cloning "plug-and-play" approach with surfactin, we genetically interrogated the amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation process involving the N-acyl-asparagine pro-drug intermediates preamicoumacins, which are hydrolyzed by the asparagine-specific peptidase into the active component amicoumacin A. This work represents the first direct-cloning-based heterologous expression of natural products in the model organism B. subtilis and paves the way for future genome mining efforts in this genus.

  16. Study of five novel non-synonymous polymorphisms in human brain-expressed genes in a Colombian sample.

    PubMed

    Ojeda, Diego A; Forero, Diego A

    2014-10-01

    Non-synonymous single nucleotide polymorphisms (nsSNPs) in brain-expressed genes represent interesting candidates for genetic research in neuropsychiatric disorders. This study aimed to examine novel nsSNPs in brain-expressed genes in a sample of Colombian subjects. We applied an approach based on in silico mining of available genomic data to identify and select novel nsSNPs in brain-expressed genes. We developed novel genotyping assays, based on allele-specific PCR methods, for these nsSNPs and genotyped them in 171 Colombian subjects. Five common nsSNPs (rs6855837, p.Leu395Ile; rs2305160, p.Thr394Ala; rs10503929, p.Met289Thr; rs2270641, p.Thr4Pro; and rs3822659, p.Ser735Ala), located in the CLOCK, NPAS2, NRG1, SLC18A1 and WWC1 genes, were studied. We report allele and genotype frequencies in a sample of healthy South American subjects. There is previous experimental evidence, arising from genome-wide expression and association studies, for the involvement of these genes in several neuropsychiatric disorders and endophenotypes, such as schizophrenia, mood disorders or memory performance. Frequencies of these nsSNPs in the Colombian sample differed from those in various HapMap populations. Future study of these nsSNPs in brain-expressed genes, a synaptogenomics approach, will be important for a better understanding of neuropsychiatric diseases and endophenotypes in different populations.

  17. Isoflurane is a suitable alternative to ether for anesthetizing rats prior to euthanasia for gene expression analysis.

    PubMed

    Nakatsu, Noriyuki; Igarashi, Yoshinobu; Aoshi, Taiki; Hamaguchi, Isao; Saito, Masumichi; Mizukami, Takuo; Momose, Haruka; Ishii, Ken J; Yamada, Hiroshi

    2017-01-01

    Diethyl ether (ether) had been widely used in Japan for anesthesia, despite its explosive properties and toxicity to both humans and animals. We had also used ether as an anesthetic for euthanizing rats for research in the Toxicogenomics Project (TGP). Because the use of ether for these purposes will likely cease, an alternative anesthetic must be selected and validated for consistency with existing TGP data acquired under ether anesthesia. We therefore compared two alternative anesthetic candidates, isoflurane and pentobarbital, with ether in terms of hematological findings, serum biochemical parameters, and gene expression. Few differences among the three agents were observed. In hematological and serum biochemistry analyses, no significant changes were found. In gene expression analysis, four known genes were extracted as differentially expressed genes in the liver of rats anesthetized with ether, isoflurane, or pentobarbital. However, no significant relationships were detected using gene ontology, pathway, or gene enrichment analyses by DAVID and TargetMine. Surprisingly, although the lung was expected to be affected by administration via inhalation, only one differentially expressed gene was extracted in the lung. Taken together, our data indicate that there are no significant differences among ether, isoflurane, and pentobarbital with respect to effects on hematological parameters, serum biochemistry parameters, and gene expression. Based on its minimal effect on existing data and its safety profile for humans and animals, we suggest isoflurane as a suitable alternative anesthetic for use in rat euthanasia in toxicogenomics analysis.

  18. Divergence between motoneurons: gene expression profiling provides a molecular characterization of functionally discrete somatic and autonomic motoneurons

    PubMed Central

    Cui, Dapeng; Dougherty, Kimberly J.; Machacek, David W.; Sawchuk, Michael; Hochman, Shawn; Baro, Deborah J.

    2009-01-01

    Studies in the developing spinal cord suggest that different motoneuron (MN) cell types express very different genetic programs, but the degree to which adult programs differ is unknown. To compare genetic programs between adult MN columnar cell types, we used laser capture microdissection (LCM) and Affymetrix microarrays to create expression profiles for three columnar cell types: lateral and medial MNs from lumbar segments and sympathetic preganglionic motoneurons located in the thoracic intermediolateral nucleus. A comparison of the three expression profiles indicated that ~7% (813/11,552) of the genes showed significant differences in their expression levels. The largest differences were observed between sympathetic preganglionic MNs and the lateral motor column, with 6% (706/11,552) of the genes being differentially expressed. Significant differences in expression were observed for 1.8% (207/11,552) of the genes when comparing sympathetic preganglionic MNs with the medial motor column. Lateral and medial MNs showed the least divergence, with 1.3% (150/11,552) of the genes being differentially expressed. These data indicate that the amount of divergence in expression profiles between identified columnar MNs does not strictly correlate with divergence of function as defined by innervation patterns (somatic/muscle vs. autonomic/viscera). Classification of the differentially expressed genes by function showed that they underpin all fundamental cell systems and processes, although most differentially expressed genes encode proteins involved in signal transduction. Mining the expression profiles to examine transcription factors essential for MN development suggested that many of the same transcription factors participate in combinatorial codes in embryonic and adult neurons, but that their patterns of expression change significantly. PMID:16317082
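    The percentages quoted above follow directly from the gene counts; a short arithmetic check:

```python
TOTAL_GENES = 11_552  # genes compared across the three expression profiles

# Differentially expressed gene counts reported in the abstract
DEG_COUNTS = {
    "any of the three columnar cell types": 813,
    "sympathetic preganglionic vs lateral": 706,
    "sympathetic preganglionic vs medial": 207,
    "lateral vs medial": 150,
}

def percent(count, total=TOTAL_GENES):
    """Share of genes, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)

for label, n in DEG_COUNTS.items():
    print(f"{label}: {n}/{TOTAL_GENES} = {percent(n)}%")
```

    The results (7.0%, 6.1%, 1.8%, 1.3%) agree with the rounded figures given in the abstract.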

  19. Comparative metagenomic and metatranscriptomic analyses of microbial communities in acid mine drainage.

    PubMed

    Chen, Lin-xing; Hu, Min; Huang, Li-nan; Hua, Zheng-shuang; Kuang, Jia-liang; Li, Sheng-jin; Shu, Wen-sheng

    2015-07-01

    The microbial communities in acid mine drainage have been extensively studied to reveal their roles in acid generation and adaptation to this environment. Lacking, however, are integrated community- and organism-wide comparative gene transcriptional analyses that could reveal the response and adaptation mechanisms of these extraordinary microorganisms to different environmental conditions. In this study, comparative metagenomics and metatranscriptomics were performed on microbial assemblages collected from four geochemically distinct acid mine drainage (AMD) sites. Taxonomic analysis uncovered unexpectedly high microbial biodiversity in these extremely acidophilic communities, and the abundant taxa Acidithiobacillus, Leptospirillum and Acidiphilium exhibited high transcriptional activities. Community-wide comparative analyses clearly showed that the AMD microorganisms adapted to the different environmental conditions by regulating the expression of genes involved in multiple in situ functional activities, including low-pH adaptation; carbon, nitrogen and phosphate assimilation; energy generation; environmental stress resistance; and other functions. Organism-wide comparative analyses of the active taxa revealed environment-dependent gene transcriptional profiles, especially the distinct strategies used by Acidithiobacillus ferrivorans and Leptospirillum ferrodiazotrophum in nutrient assimilation and energy generation for survival under different conditions. Overall, these findings demonstrate that the gene transcriptional profiles of AMD microorganisms are closely related to the physicochemical characteristics of each site, providing clues to the microbial response and adaptation mechanisms in oligotrophic, extremely acidic environments.

  20. Genic insights from integrated human proteomics in GeneCards.

    PubMed

    Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publicly available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and supporting gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information. The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL: http://www.genecards.org/. © The Author(s) 2016. Published by Oxford University Press.
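    The abstract does not state the rule used to call proteome-based differential expression. As an illustration only, the sketch below flags tissues where a gene's normalized protein abundance is at least a hypothetical 5-fold above that gene's median across tissues; the rule, its threshold, the function name, and the abundance values are assumptions, not the GeneCards/HIPED procedure.

```python
from statistics import median

def overexpressing_tissues(abundance, fold=5.0):
    """Tissues whose abundance is at least `fold` times the gene's median.

    The fold-over-median rule and its default value are illustrative assumptions.
    """
    m = median(abundance.values())
    if m <= 0:
        return []  # no meaningful baseline to compare against
    return sorted(t for t, v in abundance.items() if v >= fold * m)

# Hypothetical normalized protein abundances for one gene (GeneCards uses 69 tissues)
gene_a = {"liver": 40.0, "kidney": 3.0, "heart": 2.0, "lung": 1.5, "brain": 0.5}
print(overexpressing_tissues(gene_a))  # ['liver']
```

    Lowering `fold` widens the highlighted subset; the same per-gene tissue vector could equally feed a pairwise proximity metric, whose exact form the abstract also leaves unspecified.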

  1. Integrating text mining into the MGI biocuration workflow

    PubMed Central

    Dowell, K.G.; McAndrews-Hill, M.S.; Hill, D.P.; Drabkin, H.J.; Blake, J.A.

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents in the formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen ∼1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we demonstrate the potential for further incorporation of semi-automated processes into the curation of the biomedical literature. PMID:20157492

  2. Integrating text mining into the MGI biocuration workflow.

    PubMed

    Dowell, K G; McAndrews-Hill, M S; Hill, D P; Drabkin, H J; Blake, J A

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents in the formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we demonstrate the potential for further incorporation of semi-automated processes into the curation of the biomedical literature.

  3. Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment.

    PubMed

    Severgnini, Marco; Bicciato, Silvio; Mangano, Eleonora; Scarlatti, Francesca; Mezzelani, Alessandra; Mattioli, Michela; Ghidoni, Riccardo; Peano, Clelia; Bonnal, Raoul; Viti, Federica; Milanesi, Luciano; De Bellis, Gianluca; Battaglia, Cristina

    2006-06-01

    Meta-analysis of microarray data is increasingly important, considering both the availability of multiple platforms using disparate technologies and the accumulation in public repositories of data sets from different laboratories. We addressed the issue of comparing gene expression profiles from two microarray platforms by devising a standardized investigative strategy. We tested this procedure by studying MDA-MB-231 cells, which undergo apoptosis on treatment with resveratrol. Gene expression profiles were obtained using high-density, short-oligonucleotide, single-color microarray platforms: GeneChip (Affymetrix) and CodeLink (Amersham). Interplatform analyses were carried out on 8414 common transcripts represented on both platforms, as identified by LocusLink ID, representing 70.8% and 88.6% of annotated GeneChip and CodeLink features, respectively. We identified 105 differentially expressed genes (DEGs) on CodeLink and 42 DEGs on GeneChip. Among them, only 9 DEGs were commonly identified by both platforms. Multiple analyses (BLAST alignment of probes with target sequences, gene ontology, literature mining, and quantitative real-time PCR) permitted us to investigate the factors contributing to the generation of platform-dependent results in single-color microarray experiments. An effective approach to cross-platform comparison involves microarrays of similar technologies, samples prepared by identical methods, and a standardized battery of bioinformatic and statistical analyses.
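    The interplatform comparison above restricts both DEG lists to transcripts present on both arrays (matched by LocusLink ID) before intersecting them. A minimal sketch of that restriction, with the Jaccard index added as one assumed summary of agreement; the function name and IDs are hypothetical.

```python
def platform_overlap(degs_a, degs_b, common_ids):
    """Compare DEG lists from two platforms on their shared transcripts only.

    Returns the DEGs called on both platforms and the Jaccard index
    |A & B| / |A | B| (an assumed agreement summary, not from the study).
    """
    a = set(degs_a) & set(common_ids)
    b = set(degs_b) & set(common_ids)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard

# Hypothetical LocusLink IDs
common = {"1001", "1002", "1003", "1004", "1005"}
codelink = ["1001", "1002", "1003", "9999"]  # "9999" is not on both platforms
genechip = ["1002", "1003", "1005"]

shared, j = platform_overlap(codelink, genechip, common)
print(sorted(shared), round(j, 2))  # ['1002', '1003'] 0.5
```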

  4. GEM2Net: from gene expression modeling to -omics networks, a new CATdb module to investigate Arabidopsis thaliana genes involved in stress response.

    PubMed

    Zaag, Rim; Tamby, Jean Philippe; Guichard, Cécile; Tariq, Zakia; Rigaill, Guillem; Delannoy, Etienne; Renou, Jean-Pierre; Balzergue, Sandrine; Mary-Huard, Tristan; Aubourg, Sébastien; Martin-Magniette, Marie-Laure; Brunaud, Véronique

    2015-01-01

    CATdb (http://urgv.evry.inra.fr/CATdb) is a database providing public access to a large collection of transcriptomic data, mainly for Arabidopsis but also for other plants. This resource has the rare advantage of containing several thousand microarray experiments obtained with the same technical protocol and analyzed by the same statistical pipelines. In this paper, we present GEM2Net, a new module of CATdb that takes advantage of this homogeneous dataset to mine co-expression units and decipher Arabidopsis gene functions. GEM2Net explores 387 stress conditions organized into 18 biotic and abiotic stress categories. For each category, a model-based clustering is applied to expression differences to identify clusters of co-expressed genes. To characterize functions associated with these clusters, various resources are analyzed and integrated: Gene Ontology, subcellular localization of proteins, Hormone Families, Transcription Factor Families and a refined stress-related gene list associated with publications. Exploiting protein-protein interactions and transcription factor-target interactions makes it possible to display gene networks. GEM2Net presents the analysis of the 18 stress categories, in which 17,264 genes are involved and organized within 681 co-expression clusters. The meta-data analyses were stored and organized to compose a dynamic Web resource. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data.

    PubMed

    Hettne, Kristina M; Boorsma, André; van Dartel, Dorien A M; Goeman, Jelle J; de Jong, Esther; Piersma, Aldert H; Stierum, Rob H; Kleinjans, Jos C; Kors, Jan A

    2013-01-29

    Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared the results with those from the Connectivity Map (cMap). We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significantly against the PPARA GE signature. Thirty-three environmental toxicant gene sets were significantly altered in the triazole GE data sets. Twenty-one of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.
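    The abstract reports false discovery rate-corrected p-values < 0.05 but does not name the correction. Assuming the widely used Benjamini-Hochberg procedure, adjusted p-values for a list of gene-set tests can be computed as in this sketch (the raw p-values are invented):

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five gene-set tests
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(raw)
significant = [i for i, v in enumerate(q) if v < 0.05]
print([round(v, 4) for v in q], significant)
```

    Note that the third and fourth tests pass at the raw 0.05 level but not after correction, which is the point of controlling the FDR across many gene sets.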

  6. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    PubMed Central

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared the results with those from the Connectivity Map (cMap). We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significantly against the PPARA GE signature. Thirty-three environmental toxicant gene sets were significantly altered in the triazole GE data sets. Twenty-one of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions: Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect. PMID:23356878

  7. Mining pathway associations for disease-related pathway activity analysis based on gene expression and methylation data.

    PubMed

    Lee, Hyeonjeong; Shin, Miyoung

    2017-01-01

    The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Although many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on markers consisting of genes or gene-sets (i.e., pathways). However, these markers may not be enough to explain biological interactions between genetic variables that are related to diseases. Thus, in this study, our aim is to investigate distinctive associations among active pathways (i.e., pathway-sets) in case and control samples, as observed from gene expression and/or methylation data. The pathway-sets are obtained by identifying sets of associated pathways that are often active together over a significant number of class samples. For this purpose, gene expression or methylation profiles are first analyzed to identify significant (active) pathways via gene-set enrichment analysis. Then, an association rule mining approach is applied to these active pathways to find interesting pathway-sets in each class of samples (case or control). The sets of associated pathways that often work together in activity profiles are finally chosen as the distinctive signature of each class. The identified pathway-sets are aggregated into a pathway activity network (PAN), which facilitates the visualization of differential pathway associations between case and control samples. From our experiments with two publicly available datasets, we found interesting PAN structures as the distinctive signatures of breast cancer and uterine leiomyoma cancer, respectively. Our pathway-set markers were shown to be superior or very comparable to other genetic markers (such as genes or gene-sets) in disease classification. Furthermore, the PAN structure, which can be constructed from the identified markers of pathway-sets, could provide deeper insights into distinctive associations between pathway activities in case and control samples.
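    The support and confidence measures behind association rule mining can be sketched on per-sample sets of active pathways. This is a brute-force frequent-itemset search with an Apriori-style stopping rule, not the authors' implementation; the pathway names, the 0.5 support threshold, and the function names are assumptions.

```python
from itertools import combinations

def frequent_pathway_sets(samples, min_support, max_size=3):
    """Pathway-sets active together in at least `min_support` (0-1) of samples.

    `samples` is a list of sets of active pathways, one per sample
    (e.g. pathways called significant by gene-set enrichment analysis).
    """
    n = len(samples)
    candidates = sorted({p for s in samples for p in s})
    result = {}
    for size in range(1, max_size + 1):
        found = False
        for combo in combinations(candidates, size):
            support = sum(1 for s in samples if set(combo) <= s) / n
            if support >= min_support:
                result[combo] = support
                found = True
        if not found:
            break  # Apriori property: if no set of this size is frequent, no larger set is
    return result

def confidence(samples, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent among the samples."""
    with_a = [s for s in samples if set(antecedent) <= s]
    if not with_a:
        return 0.0
    return sum(1 for s in with_a if set(consequent) <= s) / len(with_a)

# Hypothetical active pathways in four case samples
case = [
    {"p53", "cell_cycle", "apoptosis"},
    {"p53", "cell_cycle"},
    {"p53", "apoptosis"},
    {"cell_cycle", "MAPK"},
]
sets = frequent_pathway_sets(case, min_support=0.5)
print(sets)
print(confidence(case, ["p53"], ["cell_cycle"]))
```

    Running the same search on control samples and comparing the two results gives the case-versus-control contrast that the pathway activity network visualizes.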

  8. A comparative gene expression analysis of iron-limited cultures of Chaetoceros socialis and Pseudo-nitzschia arenysensis using newly developed iron assays

    NASA Astrophysics Data System (ADS)

    Abdala, Z. M.; Powell, K.; Cronin, D.; Chappell, D.

    2016-02-01

    Diatoms, accounting for about 40% of the primary production in marine ecosystems, play a vital role in the dynamics of marine systems. Iron availability is understood to be a driving factor controlling productivity of many marine phytoplankton, including diatoms, as iron functions as a cofactor for many proteins, including several involved in photosynthetic processes. Previous work examining transcriptomes of diatoms of the genus Thalassiosira grown in controlled laboratory settings has identified genes whose expression can be used as sensitive markers of iron status. Data mining of publicly available diatom transcriptome data for these genes enables development of additional iron status assays for environmentally relevant diatoms. For the present study, gene expression analysis of iron-limited laboratory cultures of Chaetoceros socialis and Pseudo-nitzschia arenysensis grown in continuous light was performed using quantitative reverse transcriptase polymerase chain reaction (qRT-PCR). C. socialis and P. arenysensis serve as comparative models for analyzing gene expression under iron limitation in different ecological community assemblages. These data may ultimately help illuminate the function of iron in photosynthetic activity in diatoms.

  9. Case-based retrieval framework for gene expression data.

    PubMed

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J

    2015-01-01

    Retrieving similar cases in a case-based reasoning system is considered a major challenge for gene expression data sets. The huge number of gene expression values generated by microarray technology leads to complex data sets, and similarity measures for high-dimensional data are problematic. Hence, gene expression similarity measurement requires incorporating numerous machine-learning and data-mining techniques, such as feature selection and dimensionality reduction, into the retrieval process. This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients based on their gene expression profiles. The proposed methodology is validated on several data sets: a childhood leukemia data set collected from The Children's Hospital at Westmead, as well as the Colon cancer, the National Cancer Institute (NCI), and the Prostate cancer data sets. The framework achieved the following accuracies in retrieving patients who are similar to new patients: 96% on the childhood leukemia data set, 95% on the NCI data set, 93% on the Colon cancer data set, and 98% on the Prostate cancer data set. The designed case-based retrieval framework is an appropriate choice for retrieving previous patients who are similar to a new patient on the basis of their gene expression data, for better diagnosis and treatment of childhood leukemia. Moreover, some or all of the framework's steps can be applied to other gene expression data sets.
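
The retrieval step can be illustrated with a weighted-feature k-nearest-neighbour search. This is a minimal sketch, assuming a weighted Euclidean distance and per-gene weights supplied from some earlier feature-selection step; the abstract does not specify the exact similarity function or weighting scheme, and the function names and toy patient profiles are hypothetical.

```python
import math
from collections import Counter

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: genes with larger weights count more."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def retrieve_similar_cases(query, case_base, weights, k=3):
    """Return the k (case_id, distance) pairs nearest to the query profile."""
    scored = [(case_id, weighted_distance(query, profile, weights))
              for case_id, profile in case_base.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]

def predict_outcome(query, case_base, labels, weights, k=3):
    """Majority vote over the known outcomes of the retrieved cases."""
    neighbours = retrieve_similar_cases(query, case_base, weights, k)
    votes = Counter(labels[case_id] for case_id, _ in neighbours)
    return votes.most_common(1)[0][0]
```

Setting a gene's weight to zero removes it from the distance entirely, which is one simple way a feature-selection step could be folded into retrieval.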

  10. Co-expression analysis identifies CRC and AP1 the regulator of Arabidopsis fatty acid biosynthesis.

    PubMed

    Han, Xinxin; Yin, Linlin; Xue, Hongwei

    2012-07-01

    Fatty acids (FAs) play crucial roles in signal transduction and plant development; however, the regulation of FA metabolism is still poorly understood. To study the relevant regulatory network, fifty-eight FA biosynthesis genes, including de novo synthases, desaturases and elongases, were selected as "guide genes" to construct a co-expression network. Calculating the correlation between all Arabidopsis thaliana (L.) genes and each guide gene with the Arabidopsis co-expression data mining tools (ACT) identified 797 candidate FA-correlated genes. Gene ontology (GO) analysis showed that these co-expressed genes are tightly correlated with photosynthesis and carbohydrate metabolism and function in many processes. Interestingly, 63 transcription factors (TFs) were identified as candidate FA biosynthesis regulators, and 8 TF families are enriched. Two TF genes, CRC and AP1, each correlating with 8 FA guide genes, were further characterized. Analyses of the ap1 and crc mutants showed altered total FA composition in mature seeds. The contents of palmitoleic acid, stearic acid, arachidic acid and eicosadienoic acid are decreased, whereas that of oleic acid is increased in ap1 and crc seeds, which is consistent with qRT-PCR analysis revealing suppressed expression of the corresponding guide genes. In addition, yeast one-hybrid analysis and electrophoretic mobility shift assay (EMSA) revealed that CRC can bind to the promoter regions of KCS7 and KCS15, indicating that CRC may directly regulate FA biosynthesis. © 2012 Institute of Botany, Chinese Academy of Sciences.
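
The guide-gene strategy amounts to correlating every gene's expression profile with each guide gene and keeping strong partners. A minimal sketch follows, assuming Pearson correlation across a common set of arrays and a hypothetical cutoff of r >= 0.7; the ACT tool's own scoring and thresholds may differ, and the gene names and values below are toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def guide_gene_partners(expression, guide_genes, threshold=0.7):
    """For every non-guide gene, list the guide genes it co-expresses with
    (r >= threshold) across the same set of arrays."""
    partners = {}
    for gene, profile in expression.items():
        if gene in guide_genes:
            continue
        hits = [g for g in guide_genes if pearson(profile, expression[g]) >= threshold]
        if hits:
            partners[gene] = hits
    return partners
```

Candidate regulators can then be ranked by how many guide genes they correlate with (e.g., `sorted(partners, key=lambda g: len(partners[g]), reverse=True)`), mirroring how CRC and AP1 stood out by correlating with 8 guide genes each.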

  11. Gene Expression Patterns Associated With Histopathology in Toxic Liver Fibrosis.

    PubMed

    Ippolito, Danielle L; AbdulHameed, Mohamed Diwan M; Tawa, Gregory J; Baer, Christine E; Permenter, Matthew G; McDyre, Bonna C; Dennis, William E; Boyle, Molly H; Hobbs, Cheryl A; Streicker, Michael A; Snowden, Bobbi S; Lewis, John A; Wallqvist, Anders; Stallings, Jonathan D

    2016-01-01

    Toxic industrial chemicals induce liver injury, which is difficult to diagnose without invasive procedures. Identifying indicators of end-organ injury can complement exposure-based assays and improve predictive power. A multiplexed approach was used to experimentally evaluate a panel of 67 genes predicted to be associated with the fibrosis pathology by computationally mining DrugMatrix, a publicly available repository of gene microarray data. Five-day oral gavage studies in male Sprague Dawley rats dosed with varying concentrations of 3 fibrogenic compounds (allyl alcohol, carbon tetrachloride, and 4,4'-methylenedianiline) and 2 nonfibrogenic compounds (bromobenzene and dexamethasone) were conducted. Fibrosis was definitively diagnosed by histopathology. The 67-plex gene panel accurately diagnosed fibrosis in both microarray and multiplexed-gene expression assays. Necrosis and inflammatory infiltration were comorbid with fibrosis. ANOVA with contrasts showed that 51 of the 67 predicted genes were significantly associated with the fibrosis phenotype, with 24 of these specific to fibrosis alone. The protein product of PCOLCE (Procollagen C-Endopeptidase Enhancer), the gene most strongly correlated with the fibrosis phenotype, was dose-dependently elevated in plasma from animals administered fibrogenic chemicals (P < .05). Semiquantitative global mass spectrometry analysis of the plasma identified an additional 5 protein products of the gene panel that increased after fibrogenic toxicant administration: fibronectin, ceruloplasmin, vitronectin, insulin-like growth factor binding protein, and α2-macroglobulin. These results support the data mining approach for identifying gene and/or protein panels for assessing liver injury and may suggest bridging biomarkers for molecular mediators linked to histopathology. Published by Oxford University Press on behalf of the Society of Toxicology 2015. 
This work is written by US Government employees and is in the public domain in the US.

  12. Mining microarrays for metabolic meaning: nutritional regulation of hypothalamic gene expression.

    PubMed

    Mobbs, Charles V; Yen, Kelvin; Mastaitis, Jason; Nguyen, Ha; Watson, Elizabeth; Wurmbach, Elisa; Sealfon, Stuart C; Brooks, Andrew; Salton, Stephen R J

    2004-06-01

    DNA microarray analysis has been used to investigate relative changes in the level of gene expression in the CNS, including changes that are associated with disease, injury, psychiatric disorders, drug exposure or withdrawal, and memory formation. We have used oligonucleotide microarrays to identify hypothalamic genes that respond to nutritional manipulation. In addition to commonly used microarray analysis based on criteria such as fold-regulation, we have also found that simply carrying out multiple t tests then sorting by P value constitutes a highly reliable method to detect true regulation, as assessed by real-time polymerase chain reaction (PCR), even for relatively low abundance genes or relatively low magnitude of regulation. Such analyses directly suggested novel mechanisms that mediate effects of nutritional state on neuroendocrine function and are being used to identify regulated gene products that may elucidate the metabolic pathology of obese ob/ob, lean Vgf-/Vgf-, and other models with profound metabolic impairments.
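
The "multiple t tests, then sort by P value" procedure can be sketched without computing P values explicitly. This assumes a pooled-variance Student t test and the same number of replicates per group for every gene, so every test has the same degrees of freedom and the two-sided P value decreases monotonically as |t| grows; sorting by |t| is then equivalent to sorting by P. The gene names and values are toy data, and the original study's exact test variant is not stated.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    if se == 0:
        # No within-group spread: infinitely strong evidence if the means differ.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se

def rank_by_t(group_a, group_b):
    """group_a/group_b: gene -> replicate values, same replicate counts for
    every gene. With equal degrees of freedom for all genes, ordering by |t|
    descending is the same as ordering by two-sided P value ascending."""
    t = {gene: pooled_t(group_a[gene], group_b[gene]) for gene in group_a}
    return sorted(t, key=lambda gene: abs(t[gene]), reverse=True)
```

The top of the resulting list would then be the natural set of genes to confirm by real-time PCR.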

  13. BAGEL4: a user-friendly web server to thoroughly mine RiPPs and bacteriocins.

    PubMed

    van Heel, Auke J; de Jong, Anne; Song, Chunxu; Viel, Jakob H; Kok, Jan; Kuipers, Oscar P

    2018-05-21

    Interest in secondary metabolites such as RiPPs (ribosomally synthesized and posttranslationally modified peptides) is increasing worldwide. To facilitate research in this field, we have updated our mining web server. BAGEL4 is faster than its predecessor and is now fully independent of ORF calling. Gene clusters of interest are discovered using the core-peptide database and/or through HMM motifs that are present in associated context genes. The databases used for mining have been updated and extended with literature references and links to UniProt and NCBI. Additionally, we have included automated promoter and terminator prediction and the option to upload RNA expression data, which can be displayed along with the identified clusters. Further improvements include the annotation of the context genes, which is now based on a fast BLAST search against the prokaryote part of the UniRef90 database, and an improved web-BLAST feature that dynamically loads structural data such as internal cross-linking from UniProt. Overall, BAGEL4 provides the user with more information through a user-friendly web interface that simplifies data evaluation. BAGEL4 is freely accessible at http://bagel4.molgenrug.nl.

  14. A High-Throughput Data Mining of Single Nucleotide Polymorphisms in Coffea Species Expressed Sequence Tags Suggests Differential Homeologous Gene Expression in the Allotetraploid Coffea arabica

    PubMed Central

    Vidal, Ramon Oliveira; Mondego, Jorge Maurício Costa; Pot, David; Ambrósio, Alinne Batista; Andrade, Alan Carvalho; Pereira, Luiz Filipe Protasio; Colombo, Carlos Augusto; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães

    2010-01-01

    Polyploidization constitutes a common mode of evolution in flowering plants. This event provides the raw material for the divergence of function in homeologous genes, leading to phenotypic novelty that can contribute to the success of polyploids in nature or their selection for use in agriculture. Mounting evidence has underlined the existence of homeologous expression biases in polyploid genomes; however, strategies to analyze such transcriptome regulation have remained scarce. Important factors regarding homeologous expression biases remain to be explored, such as whether this phenomenon influences specific genes, how paralogs are affected by genome doubling, and how important the variability of homeologous expression bias is to genotype differences. This study reports the expressed sequence tag assembly of the allopolyploid Coffea arabica and one of its direct ancestors, Coffea canephora. The assembly was used to discover single nucleotide polymorphisms by identifying high-quality discrepancies in overlapping expressed sequence tags, and to estimate gene expression indirectly from transcript redundancy. Sequence diversity profiles were evaluated within C. arabica (Ca) and C. canephora (Cc) and used to deduce the transcript contribution of the Coffea eugenioides (Ce) ancestor. The assignment of the C. arabica haplotypes to the C. canephora (CaCc) or C. eugenioides (CaCe) ancestral genomes allowed us to analyze the gene expression contributions of each subgenome in C. arabica. In silico data were validated by quantitative polymerase chain reaction and the allele-specific combination TaqMAMA-based method. The presence of differential expression of C. arabica homeologous genes and its implications for coffee gene expression, ontology, and physiology are discussed. PMID:20864545
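
Finding SNPs as "high-quality discrepancies in overlapping expressed sequence tags" can be sketched as a column-wise scan of an EST alignment. This minimal version assumes each alignment column is given as (base, Phred quality) pairs and uses hypothetical thresholds (quality >= 20, at least 2 supporting reads per allele); the study's actual filters are not stated in the abstract.

```python
from collections import Counter

def candidate_snps(columns, min_quality=20, min_reads_per_allele=2):
    """columns: list of alignment columns; each column is a list of
    (base, phred_quality) pairs from the overlapping ESTs at that position.
    Returns [(position, {allele: count}), ...] for columns where at least two
    alleles are each supported by enough high-quality bases."""
    snps = []
    for pos, column in enumerate(columns):
        # Count only unambiguous, high-quality base calls (gaps/N are ignored).
        counts = Counter(base for base, q in column
                         if base in "ACGT" and q >= min_quality)
        alleles = {b: c for b, c in counts.items() if c >= min_reads_per_allele}
        if len(alleles) >= 2:
            snps.append((pos, alleles))
    return snps
```

Requiring several high-quality reads per allele is what separates a genuine polymorphism (or homeolog difference) from an isolated sequencing error.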

  15. mirEX: a platform for comparative exploration of plant pri-miRNA expression data.

    PubMed

    Bielewicz, Dawid; Dolata, Jakub; Zielezinski, Andrzej; Alaba, Sylwia; Szarzynska, Bogna; Szczesniak, Michal W; Jarmolowski, Artur; Szweykowska-Kulinska, Zofia; Karlowski, Wojciech M

    2012-01-01

    mirEX is a comprehensive platform for comparative analysis of primary microRNA expression data. RT-qPCR-based gene expression profiles are stored in a universal and expandable database scheme and wrapped by an intuitive user-friendly interface. A new way of accessing gene expression data in mirEX includes a simple mouse operated querying system and dynamic graphs for data mining analyses. In contrast to other publicly available databases, the mirEX interface allows a simultaneous comparison of expression levels between various microRNA genes in diverse organs and developmental stages. Currently, mirEX integrates information about the expression profile of 190 Arabidopsis thaliana pri-miRNAs in seven different developmental stages: seeds, seedlings and various organs of mature plants. Additionally, by providing RNA structural models, publicly available deep sequencing results, experimental procedure details and careful selection of auxiliary data in the form of web links, mirEX can function as a one-stop solution for Arabidopsis microRNA information. A web-based mirEX interface can be accessed at http://bioinfo.amu.edu.pl/mirex.

  16. Genome-wide expression profiling in pediatric septic shock

    PubMed Central

    Wong, Hector R.

    2013-01-01

    For nearly a decade, our research group has had the privilege of developing and mining a multi-center, microarray-based, genome-wide expression database of critically ill children (≤ 10 years of age) with septic shock. Using bioinformatic and systems biology approaches, the expression data generated through this discovery-oriented, exploratory approach have been leveraged for a variety of objectives, which will be reviewed. Fundamental observations include widespread repression of gene programs corresponding to the adaptive immune system, and biologically significant differential patterns of gene expression across developmental age groups. The data have also identified gene expression-based subclasses of pediatric septic shock having clinically relevant phenotypic differences. In addition, the data have been leveraged for the discovery of novel therapeutic targets, and for the discovery and development of novel stratification and diagnostic biomarkers. Almost a decade of genome-wide expression profiling in pediatric septic shock is now demonstrating tangible results. The studies have progressed from an initial discovery-oriented and exploratory phase to a new phase in which the data are being translated and applied to address several areas of clinical need. PMID:23329198

  17. Bacterial reference genes for gene expression studies by RT-qPCR: survey and analysis.

    PubMed

    Rocha, Danilo J P; Santos, Carolina S; Pacheco, Luis G C

    2015-09-01

    The appropriate choice of reference genes is essential for accurate normalization of gene expression data obtained by reverse transcription quantitative real-time PCR (RT-qPCR). In 2009, a guideline called the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) highlighted the importance of selecting and validating more than one suitable reference gene for obtaining reliable RT-qPCR results. Herein, we searched the recent literature in order to identify the bacterial reference genes that have been most commonly validated in gene expression studies by RT-qPCR (in the first 5 years following publication of the MIQE guidelines). Through a combination of different search parameters with the text mining tool MedlineRanker, we identified 145 unique bacterial genes that were recently tested as candidate reference genes. Of these, 45 genes were experimentally validated and, in most cases, their expression stabilities were verified using the software tools geNorm and NormFinder. It is noteworthy that only 10 of these reference genes had been validated in two or more of the studies evaluated. An enrichment analysis using Gene Ontology classifications demonstrated that genes belonging to the functional categories of DNA Replication (GO:0006260) and Transcription (GO:0006351) yielded a proportionally higher number of validated reference genes. Three genes in the former functional class were also among the top five most stable genes identified through an analysis of gene expression data obtained from the Pathosystems Resource Integration Center. These results may provide a guideline for the initial selection of candidate reference genes for RT-qPCR studies in several different bacterial species.
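
The geNorm stability measure mentioned above can be sketched in a few lines. For each candidate gene j, take the standard deviation across samples of log2(expression_j / expression_k) for every other candidate k, and average those values to obtain M_j; a lower M means a more stable gene. This sketch computes a single round only (geNorm itself also excludes the least stable gene iteratively) and assumes relative expression quantities as input; the gene names and values are toy data.

```python
import math
from statistics import mean, stdev

def genorm_m(quantities):
    """quantities: gene -> list of relative expression quantities, with the
    samples in the same order for every gene. Returns gene -> M value
    (mean pairwise variation; lower means more stably expressed)."""
    genes = list(quantities)
    m = {}
    for j in genes:
        pairwise = []
        for k in genes:
            if k == j:
                continue
            # Variation of the log2 expression ratio j/k across samples.
            log_ratios = [math.log2(a / b)
                          for a, b in zip(quantities[j], quantities[k])]
            pairwise.append(stdev(log_ratios))
        m[j] = mean(pairwise)
    return m
```

Two genes whose ratio is constant across samples contribute zero pairwise variation to each other, which is why co-regulated pairs tend to receive the lowest M values.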

  18. Identification of cancer genes that are independent of dominant proliferation and lineage programs

    PubMed Central

    Selfors, Laura M.; Stover, Daniel G.; Harris, Isaac S.; Brugge, Joan S.; Coloff, Jonathan L.

    2017-01-01

    Large, multidimensional cancer datasets provide a resource that can be mined to identify candidate therapeutic targets for specific subgroups of tumors. Here, we analyzed human breast cancer data to identify transcriptional programs associated with tumors bearing specific genetic driver alterations. Using an unbiased approach, we identified thousands of genes whose expression was enriched in tumors with specific genetic alterations. However, expression of the vast majority of these genes was not enriched if associations were analyzed within individual breast tumor molecular subtypes, across multiple tumor types, or after gene expression was normalized to account for differences in proliferation or tumor lineage. Together with linear modeling results, these findings suggest that most transcriptional programs associated with specific genetic alterations in oncogenes and tumor suppressors are highly context-dependent and are predominantly linked to differences in proliferation programs between distinct breast cancer subtypes. We demonstrate that such proliferation-dependent gene expression dominates tumor transcriptional programs relative to matched normal tissues. However, we also identified a relatively small group of cancer-associated genes that are both proliferation- and lineage-independent. A subset of these genes are attractive candidate targets for combination therapy because they are essential in breast cancer cell lines, druggable, enriched in stem-like breast cancer cells, and resistant to chemotherapy-induced down-regulation. PMID:29229826
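
Normalizing expression "to account for differences in proliferation" can be illustrated by regressing each gene on a proliferation score and keeping the residuals. This is a minimal sketch, assuming ordinary least squares with a single per-tumor proliferation score (for example, the mean of proliferation-marker genes); the paper's actual linear models may include more terms, and the names and data here are hypothetical.

```python
def residualize(expression, proliferation):
    """Residuals of expression ~ intercept + slope * proliferation (OLS)."""
    n = len(expression)
    mx = sum(proliferation) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in proliferation)
    sxy = sum((x - mx) * (y - my) for x, y in zip(proliferation, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(proliferation, expression)]

def mean_difference(values, in_group):
    """Mean of values in the group minus mean of values outside it."""
    a = [v for v, g in zip(values, in_group) if g]
    b = [v for v, g in zip(values, in_group) if not g]
    return sum(a) / len(a) - sum(b) / len(b)
```

In the toy case where altered tumors are simply more proliferative and a gene tracks proliferation exactly, the raw group difference is large but the residual difference vanishes; this is the pattern the abstract describes for most alteration-associated genes.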

  19. Putative carotenoid genes expressed under the regulation of Shine-Dalgarno regions in Escherichia coli for efficient lycopene production.

    PubMed

    Jin, Weiyue; Xu, Xian; Jiang, Ling; Zhang, Zhidong; Li, Shuang; Huang, He

    2015-11-01

    Putative genes crtE, crtB, and crtI from Deinococcus wulumiqiensis R12, a novel species, were identified by genome mining and co-expressed using optimized Shine-Dalgarno (SD) regions to improve lycopene yield. A lycopene biosynthesis pathway was constructed by co-expressing these three genes in Escherichia coli. After optimization of the upstream SD regions and the culture medium, the recombinant strain EDW11 produced 88 mg lycopene g⁻¹ dry cell wt (780 mg lycopene l⁻¹) after 40 h of fermentation without IPTG induction, whereas strain EDW, lacking the optimized SD regions, produced only 49 mg lycopene g⁻¹ dry cell wt (417 mg lycopene l⁻¹). With the optimized upstream SD regions and culture medium, the yield of strain EDW11 reached a high level compared with microbial lycopene production reported to date.

  20. Differential expression and molecular characterisation of Lmo7, Myo1e, Sash1, and Mcoln2 genes in Btk-defective B-cells.

    PubMed

    Lindvall, Jessica M; Blomberg, K Emelie M; Wennborg, Anders; Smith, C I Edvard

    2005-05-01

    Bruton's tyrosine kinase (Btk) is crucial for B-lymphocyte development. Using gene expression profiling, we have identified four expressed sequence tags among 38 potential Btk target genes, which have now been characterised. Bioinformatics tools, including data mining of additional unpublished gene expression profiles, sequence verification of PCR products and qualitative RT-PCR, were used. Stimulations targeting the B-cell receptor and protein kinase C were used to activate whole B-cell splenocytes. The target genes were characterised as Lim domain only 7 (Lmo7), Myosin1e (Myo1e), SAM and SH3 domain containing 1 (Sash1), and Mucolipin2 (Mcoln2). Expression was found in cell lines of different origins and developmental stages, as well as in whole B-cell splenocytes and transitional type 1 (T1) splenic B-cells from both wild-type and Btk-defective mice. Semi-quantitative RT-PCR showed that Sash1 was not expressed in the investigated haematopoietic cell lines, although its transcripts were found in whole splenic B-cells from both wild-type and Btk-defective mice; Lmo7, Myo1e, and Mcoln2, by contrast, were expressed in both B-cell lines and primary B-lymphocytes. Except for Lmo7, the transcript levels were similarly affected by stimulation in control and Btk-defective cells.

  1. Methods for the isolation of genes encoding novel PHB cycle enzymes from complex microbial communities.

    PubMed

    Nordeste, Ricardo F; Trainer, Maria A; Charles, Trevor C

    2010-01-01

    Development of different PHAs as alternatives to petrochemically derived plastics can be facilitated by mining metagenomic libraries for diverse PHA cycle genes that might be useful for the synthesis of bioplastics. The specific phenotypes associated with mutations of the PHA synthesis pathway genes in Sinorhizobium meliloti allow the use of powerful selection and screening tools to identify complementing novel PHA synthesis genes. Identifying novel genes by their function rather than their sequence facilitates finding functional proteins that might otherwise be excluded by sequence-only screening methods. We present here methods that we have developed for the isolation of clones expressing novel PHA metabolism genes from metagenomic libraries.

  2. Methods for the Isolation of Genes Encoding Novel PHA Metabolism Enzymes from Complex Microbial Communities.

    PubMed

    Cheng, Jiujun; Nordeste, Ricardo; Trainer, Maria A; Charles, Trevor C

    2017-01-01

    Development of different PHAs as alternatives to petrochemically derived plastics can be facilitated by mining metagenomic libraries for diverse PHA cycle genes that might be useful for the synthesis of bio-plastics. The specific phenotypes associated with mutations of the PHA synthesis pathway genes in Sinorhizobium meliloti and Pseudomonas putida allow the use of powerful selection and screening tools to identify complementing novel PHA synthesis genes. Identifying novel genes by their function rather than their sequence facilitates finding functional proteins that might otherwise be excluded by sequence-only screening methods. We present here methods that we have developed for the isolation of clones expressing novel PHA metabolism genes from metagenomic libraries.

  3. HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease.

    PubMed

    Ward, Lucas D; Kellis, Manolis

    2016-01-04

    More than 90% of common variants associated with complex traits do not affect proteins directly, but instead the circuits that control gene expression. This has increased the urgency of understanding the regulatory genome as a key component for translating genetic results into mechanistic insights and ultimately therapeutics. To address this challenge, we developed HaploReg (http://compbio.mit.edu/HaploReg) to aid the functional dissection of genome-wide association study (GWAS) results, the prediction of putative causal variants in haplotype blocks, the prediction of likely cell types of action, and the prediction of candidate target genes by systematic mining of comparative, epigenomic and regulatory annotations. Since first launching the website in 2011, we have greatly expanded HaploReg, increasing the number of chromatin state maps to 127 reference epigenomes from ENCODE 2012 and Roadmap Epigenomics, incorporating regulator binding data, expanding regulatory motif disruption annotations, and integrating expression quantitative trait locus (eQTL) variants and their tissue-specific target genes from GTEx, Geuvadis, and other recent studies. We present these updates as HaploReg v4, and illustrate a use case of HaploReg for attention deficit hyperactivity disorder (ADHD)-associated SNPs with putative brain regulatory mechanisms. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks

    PubMed Central

    Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet is inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining of PubMed publication abstracts, we verify that, regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions but also has the potential to hypothesize new ones. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways. PMID:29049295
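
One way to picture a consensus network built from many Bayesian networks is to count how often each edge appears across networks learned under different random node orderings and keep edges above a frequency threshold. This sketch assumes "network resolution" behaves like such a threshold and compares edges without direction; both are assumptions for illustration rather than the authors' definitions, and the edges below are toy data.

```python
from collections import Counter

def consensus_edges(networks, resolution):
    """networks: list of edge collections, one per Bayesian network learned
    from a different random node ordering. Edges are compared without
    direction, since reordering nodes can flip arc directions.
    Keep edges found in at least `resolution` (a fraction) of networks."""
    counts = Counter()
    for edges in networks:
        # Normalize each edge to a sorted tuple and count it once per network.
        counts.update({tuple(sorted(edge)) for edge in edges})
    n = len(networks)
    return {edge for edge, c in counts.items() if c / n >= resolution}
```

Raising the threshold yields a sparser, higher-confidence consensus network; lowering it admits more tentative interactions.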

  5. The mining of pearl formation genes in pearl oyster Pinctada fucata by cDNA suppression subtractive hybridization.

    PubMed

    Wang, Ning; Kinoshita, Shigeharu; Nomura, Naoko; Riho, Chihiro; Maeyama, Kaoru; Nagai, Kiyohito; Watabe, Shugo

    2012-04-01

    Recent research revealed a regional preference in biomineralization gene transcription in the pearl oyster Pinctada fucata: the mantle pallial mainly transcribes the genes responsible for nacre secretion, whereas the genes regulating calcite shells are expressed in the mantle edge. This study made use of this characteristic to construct forward and reverse suppression subtractive hybridization (SSH) cDNA libraries. A total of 669 cDNA clones were sequenced, and 360 expressed sequence tags (ESTs) longer than 100 bp were generated. Functional annotation associated 95 ESTs with specific functions, and 79 of them were identified in P. fucata for the first time. The forward SSH cDNA library revealed a large number of nacre protein genes, biomineralization genes dominantly expressed in the mantle pallial, calcium-ion-binding genes, and other biomineralization-related genes important for pearl formation. Real-time PCR showed that all the examined genes were distributed in oyster mantle tissues consistently with the SSH design. The detection of their RNA transcripts in the pearl sac confirmed that the identified genes are involved in pearl formation. Therefore, the data from this work will initiate a new round of pearl formation gene studies and shed new light on molluscan biomineralization.

  6. Determining Cutoff Point of Ensemble Trees Based on Sample Size in Predicting Clinical Dose with DNA Microarray Data.

    PubMed

    Yılmaz Isıkhan, Selen; Karabulut, Erdem; Alpar, Celal Reha

    2016-01-01

    Background/Aim. Dose prediction based on genetic or clinical data has advanced substantially in recent years. The aim of this study is to predict various clinical dose values from DNA microarray gene expression datasets using data mining techniques. Materials and Methods. Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from the raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided by a gradient-boosting machine (GBM). Conclusion. Analysis of various dose values based on microarray gene expression data identified common genes found in our study and in the referenced studies. According to our findings, SVR and GBM can be good predictors for dose-gene datasets. The study also identified a sample size of n = 25 as the cutoff point for RT bagging to outperform a single RT.
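
The screening step can be illustrated with plain (non-iterative) sure independence screening: rank every gene by the absolute value of its marginal correlation with the dose and keep the top d. The study used the iterative variant, which repeats screening on residuals after fitting; this one-pass sketch is a simplification, and the function names and toy data are hypothetical.

```python
import math

def _abs_corr(x, y):
    """|Pearson correlation|, or 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else abs(sxy) / math.sqrt(sxx * syy)

def sure_independence_screening(X, y, d):
    """X: samples x genes (list of rows); y: dose per sample.
    Return indices of the d genes with the largest |marginal correlation| with y."""
    n_genes = len(X[0])
    scores = []
    for j in range(n_genes):
        column = [row[j] for row in X]
        scores.append((_abs_corr(column, y), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:d]]
```

The retained genes would then be the inputs to the regression trees, SVR and ensemble models compared in the study.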

  7. RNA-Seq analysis of yak ovary: improving yak gene structure information and mining reproduction-related genes.

    PubMed

    Lan, DaoLiang; Xiong, XianRong; Wei, YanLi; Xu, Tong; Zhong, JinCheng; Zhi, XiangDong; Wang, Yong; Li, Jian

    2014-09-01

    RNA-Seq, a high-throughput (HT) sequencing technique, has been used effectively in large-scale transcriptomic studies and is particularly useful for improving gene structure information and mining new genes. In this study, RNA-Seq HT technology was employed to analyze the transcriptome of the yak ovary. After Illumina-Solexa deep sequencing, 26826516 clean reads with a total of 4828772880 bp were obtained from the ovary library. Alignment analysis showed that 16992 yak genes mapped to the yak genome and that 3734 of these genes were involved in alternative splicing. Gene structure refinement analysis showed that 7340 genes annotated in the yak genome could be extended at the 5' or 3' ends based on the alignments between the transcripts and the genome sequence. Novel transcript prediction analysis identified 6321 new transcripts with lengths ranging from 180 to 14884 bp, and 2267 of them were predicted to encode proteins. BLAST analysis of the new transcripts showed that 1200 to 4933 of them mapped to the non-redundant (nr), nucleotide (nt) and/or SwissProt sequence databases. Comparative statistical analysis of the newly mapped transcripts showed that the majority of them were similar to genes in Bos taurus (41.4%), Bos grunniens mutus (33.0%), Ovis aries (6.3%), Homo sapiens (2.8%), Mus musculus (1.6%) and other species. Functional analysis showed that these expressed genes were involved in various Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes pathways. GO analysis of the new transcripts found that the largest proportion of them were associated with reproduction. The results of this study will provide a basis for describing the normal transcriptome map of the yak ovary and for future studies on yak breeding performance. Moreover, the results confirmed that RNA-Seq HT technology is highly advantageous for improving gene structure information and mining new genes, as well as for providing valuable data to expand yak genome information.

  8. Cocaine alters Homer1 natural antisense transcript in the nucleus accumbens.

    PubMed

    Sartor, Gregory C; Powell, Samuel K; Velmeshev, Dmitry; Lin, David Y; Magistri, Marco; Wiedner, Hannah J; Malvezzi, Andrea M; Andrade, Nadja S; Faghihi, Mohammad A; Wahlestedt, Claes

    2017-12-01

    Natural antisense transcripts (NATs) are an abundant class of long noncoding RNAs that have recently been shown to be key regulators of chromatin dynamics and gene expression in nervous system development and neurological disorders. However, it is currently unclear whether NAT-based mechanisms also play a role in drug-induced neuroadaptations. Aberrant regulation of gene expression is one critical factor underlying the long-lasting behavioral abnormalities that characterize substance use disorder, and it is possible that some drug-induced transcriptional responses are mediated, in part, by perturbations in NAT activity. To test this hypothesis, we used an automated algorithm that mines the NCBI AceView transcriptomics database to identify NATs overlapping genes linked to addiction. We found that 22% of the genes examined contain NATs and that expression of the Homer1 natural antisense transcript (Homer1-AS) was altered in the nucleus accumbens (NAc) of mice 2 h and 10 days after repeated cocaine administration. In in vitro studies, depletion of Homer1-AS led to an increase in the corresponding sense gene expression, indicating a potential regulatory mechanism of Homer1 expression by its corresponding antisense transcript. Future in vivo studies are needed to definitively determine a role for Homer1-AS in cocaine-induced behavioral and molecular adaptations. Copyright © 2017 Elsevier Inc. All rights reserved.
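
The core test behind mining a transcript database for NATs is interval overlap on opposite strands. This minimal sketch assumes a simple record with chromosome, half-open coordinates and strand; it does not reproduce AceView's data format or the authors' algorithm, and the transcript names and coordinates below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"

def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def find_antisense_pairs(genes, transcripts):
    """Return (gene, transcript) name pairs where the transcript overlaps the
    gene on the opposite strand, i.e. a candidate natural antisense transcript."""
    pairs = []
    for g in genes:
        for t in transcripts:
            if t.strand != g.strand and overlaps(g, t):
                pairs.append((g.name, t.name))
    return pairs
```

A figure such as "22% of the genes examined contain NATs" would then correspond to the fraction of query genes appearing in at least one returned pair.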

  9. Tissue Molecular Anatomy Project (TMAP): an expression database for comparative cancer proteomics.

    PubMed

    Medjahed, Djamel; Luke, Brian T; Tontesh, Tawady S; Smythers, Gary W; Munroe, David J; Lemkin, Peter F

    2003-08-01

    By mining publicly accessible databases, we have developed a collection of tissue-specific predictive protein expression maps as a function of cancer histological state. Data analysis is applied to the differential expression of gene products in pooled libraries from the normal to the altered state(s). We wish to report the initial results of our survey across different tissues and explore the extent to which this comparative approach may help uncover panels of potential biomarkers of tumorigenesis which would warrant further examination in the laboratory.

  10. Mining TCGA Data Using Boolean Implications

    PubMed Central

    Sinha, Subarna; Tsang, Emily K.; Zeng, Haoyang; Meister, Michela; Dill, David L.

    2014-01-01

    Boolean implications (if-then rules) provide a conceptually simple, uniform and highly scalable way to find associations between pairs of random variables. In this paper, we propose to use Boolean implications to find relationships between variables of different data types (mutation, copy number alteration, DNA methylation and gene expression) from the glioblastoma (GBM) and ovarian serous cystadenocarcinoma (OV) data sets from The Cancer Genome Atlas (TCGA). We find hundreds of thousands of Boolean implications from these data sets. A direct comparison of the relationships found by Boolean implications and those found by commonly used methods for mining associations shows that existing methods would miss relationships found by Boolean implications. Furthermore, many relationships exposed by Boolean implications reflect important aspects of cancer biology. Examples of our findings include cis relationships between copy number alteration, DNA methylation and expression of genes, a new hierarchy of mutations and recurrent copy number alterations, loss-of-heterozygosity of well-known tumor suppressors, and the hypermethylation phenotype associated with IDH1 mutations in GBM. The Boolean implication results used in the paper can be accessed at http://crookneck.stanford.edu/microarray/TCGANetworks/. PMID:25054200
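    Boolean implication mining reduces each pair of variables to a 2 × 2 table of high/low states and looks for a quadrant that is nearly empty: an empty (A high, B low) quadrant supports the rule "if A is high then B is high". The Python sketch below assumes each variable has already been binarized and uses a sparse-quadrant statistic and error rate with thresholds (3.0 and 0.1) chosen for illustration; these details are assumptions, not taken from the paper, and the data are hypothetical.

```python
import math


def quadrant_count(a, b, qa, qb):
    """Number of samples where a == qa and b == qb (a, b are 0/1 lists)."""
    return sum(1 for x, y in zip(a, b) if x == qa and y == qb)


def sparse_quadrant(a, b, qa, qb):
    """Return (statistic, error_rate) for the quadrant (a == qa, b == qb).

    statistic: (expected - observed) / sqrt(expected) under independence.
    error_rate: mean of the quadrant's share of its row and of its column.
    """
    n = len(a)
    row = sum(1 for x in a if x == qa)  # samples with a == qa
    col = sum(1 for y in b if y == qb)  # samples with b == qb
    observed = quadrant_count(a, b, qa, qb)
    expected = row * col / n
    stat = (expected - observed) / math.sqrt(expected) if expected > 0 else 0.0
    error = 0.5 * (observed / row + observed / col) if row and col else 1.0
    return stat, error


def a_high_implies_b_high(a, b, stat_min=3.0, error_max=0.1):
    """'a high => b high' holds when the (a high, b low) quadrant is nearly empty."""
    stat, error = sparse_quadrant(a, b, 1, 0)
    return stat >= stat_min and error <= error_max


# Hypothetical binarized data for 100 samples (1 = high, 0 = low)
a = [1] * 40 + [0] * 60
b = [1] * 40 + [1] * 30 + [0] * 30
print(a_high_implies_b_high(a, b), a_high_implies_b_high(b, a))
```

    The rule is directional: every sample with `a` high also has `b` high, but many samples with `b` high have `a` low, so the reverse implication is rejected.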

  11. Blazing Signature Filter: a library for fast pairwise similarity comparisons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Joon-Yong; Fujimoto, Grant M.; Wilson, Ryan

    Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts that would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computations will have broad impacts, as they allow scientists to explore the wealth of scientific data more completely. A significant practical drawback of large-scale data mining is that the vast majority of pairwise comparisons are unlikely to be relevant, meaning that they do not share a signature of interest. It is therefore essential to identify these unproductive comparisons as rapidly as possible and exclude them from more time-intensive similarity calculations. The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm that enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bit operators and provide a coarse measure of similarity. As a result, the BSF can scale to high dimensionality and rapidly filter out unproductive pairwise comparisons. Two bioinformatics applications of the tool are presented to demonstrate its ability to scale to billions of pairwise comparisons and the usefulness of this approach.
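    The filtering idea (binarize each profile, then use a bitwise AND plus a bit count as a cheap, coarse similarity) can be illustrated in a few lines of Python. This is a minimal sketch of the concept only, not the BSF library or its API; the threshold, the `min_shared` cut-off and the profiles are hypothetical.

```python
def to_bits(values, threshold):
    """Encode a numeric profile as an int bitmask: bit i is set if values[i] > threshold."""
    mask = 0
    for i, v in enumerate(values):
        if v > threshold:
            mask |= 1 << i
    return mask


def shared_bits(a, b):
    """Coarse similarity: number of positions set in both bitmasks."""
    return bin(a & b).count("1")


def candidate_pairs(signatures, min_shared):
    """Keep only pairs sharing at least `min_shared` set bits for costlier follow-up."""
    names = list(signatures)
    keep = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if shared_bits(signatures[names[i]], signatures[names[j]]) >= min_shared:
                keep.append((names[i], names[j]))
    return keep


# Hypothetical expression profiles over four conditions
profiles = {"g1": [2.5, 0.1, 3.0, 2.2],
            "g2": [2.7, 0.2, 2.9, 0.1],
            "g3": [0.1, 0.3, 0.2, 0.0]}
signatures = {name: to_bits(p, threshold=1.0) for name, p in profiles.items()}
print(candidate_pairs(signatures, min_shared=2))
```

    Only the surviving pairs would then be passed to a full similarity measure such as a correlation; the bit operations make the rejection step cheap.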

  12. Regulatory Snapshots: integrative mining of regulatory modules from expression time series and regulatory networks.

    PubMed

    Gonçalves, Joana P; Aires, Ricardo S; Francisco, Alexandre P; Madeira, Sara C

    2012-01-01

    Explaining regulatory mechanisms is crucial to understanding complex cellular responses leading to system perturbations. Some strategies reverse engineer regulatory interactions from experimental data, while others identify functional regulatory units (modules) under the assumption that biological systems yield a modular organization. Most modular studies focus on network structure and static properties, ignoring that gene regulation is largely driven by stimulus-response behavior. Expression time series are key to gaining insight into dynamics, but have been insufficiently explored by current methods, which often (1) apply generic algorithms unsuited for expression analysis over time, due to an inability to maintain the chronology of events or incorporate time dependency; (2) ignore local patterns, abundant in most interesting cases of transcriptional activity; (3) neglect physical binding or lack automatic association of regulators, focusing mainly on expression patterns; or (4) limit the discovery to a predefined number of modules. We propose Regulatory Snapshots, an integrative mining approach to identify regulatory modules over time by combining transcriptional control with response, while overcoming the above challenges. Temporal biclustering is first used to reveal transcriptional modules composed of genes showing coherent expression profiles over time. Personalized ranking is then applied to prioritize prominent regulators targeting the modules at each time point using a network of documented regulatory associations and the expression data. Finally, custom graphics depict the regulatory activity in a module at consecutive time points (snapshots). Regulatory Snapshots successfully unraveled modules underlying yeast response to heat shock and human epithelial-to-mesenchymal transition, based on regulations documented in the YEASTRACT and JASPAR databases, respectively, and available expression data. Regulatory players involved in functionally enriched processes related to these biological events were identified. Ranking scores further suggested an ability to discern the primary role of a gene (target or regulator). A prototype is available at: http://kdbio.inesc-id.pt/software/regulatorysnapshots.
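    The temporal biclustering step looks for groups of genes that behave coherently over a run of consecutive time points. The Python sketch below shows one simple way to express that idea: each profile is turned into a string of changes between consecutive points (U = up, D = down, N = no change), and genes sharing the same symbols over a contiguous window are grouped. It is an illustration under stated assumptions (the tolerance `tol`, the minimum window and group sizes, and the profiles are hypothetical), not the Regulatory Snapshots implementation; it also skips the maximality filtering and the regulator ranking described above.

```python
from collections import defaultdict


def discretize(profile, tol=0.5):
    """Symbolize changes between consecutive time points: U (up), D (down), N (no change)."""
    symbols = []
    for prev, curr in zip(profile, profile[1:]):
        delta = curr - prev
        symbols.append("U" if delta > tol else "D" if delta < -tol else "N")
    return "".join(symbols)


def contiguous_modules(profiles, min_len=2, min_genes=2, tol=0.5):
    """Group genes sharing an identical symbol pattern over a contiguous window of transitions.

    Returns (start, end, pattern, genes) tuples; windows index transitions, not time points.
    Non-maximal groups are kept, so nested windows can report the same genes.
    """
    strings = {g: discretize(p, tol) for g, p in profiles.items()}
    n = len(next(iter(strings.values())))
    modules = []
    for start in range(n):
        for end in range(start + min_len, n + 1):
            groups = defaultdict(list)
            for g, s in strings.items():
                groups[s[start:end]].append(g)
            for pattern, genes in groups.items():
                if len(genes) >= min_genes:
                    modules.append((start, end, pattern, sorted(genes)))
    return modules


# Hypothetical expression time series (four time points)
profiles = {"g1": [0, 1, 2, 2], "g2": [1, 2, 3, 3], "g3": [3, 2, 1, 1]}
for module in contiguous_modules(profiles):
    print(module)
```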

  13. Regulatory Snapshots: Integrative Mining of Regulatory Modules from Expression Time Series and Regulatory Networks

    PubMed Central

    Gonçalves, Joana P.; Aires, Ricardo S.; Francisco, Alexandre P.; Madeira, Sara C.

    2012-01-01

    Explaining regulatory mechanisms is crucial to understanding complex cellular responses leading to system perturbations. Some strategies reverse engineer regulatory interactions from experimental data, while others identify functional regulatory units (modules) under the assumption that biological systems yield a modular organization. Most modular studies focus on network structure and static properties, ignoring that gene regulation is largely driven by stimulus-response behavior. Expression time series are key to gaining insight into dynamics, but have been insufficiently explored by current methods, which often (1) apply generic algorithms unsuited for expression analysis over time, due to an inability to maintain the chronology of events or incorporate time dependency; (2) ignore local patterns, abundant in most interesting cases of transcriptional activity; (3) neglect physical binding or lack automatic association of regulators, focusing mainly on expression patterns; or (4) limit the discovery to a predefined number of modules. We propose Regulatory Snapshots, an integrative mining approach to identify regulatory modules over time by combining transcriptional control with response, while overcoming the above challenges. Temporal biclustering is first used to reveal transcriptional modules composed of genes showing coherent expression profiles over time. Personalized ranking is then applied to prioritize prominent regulators targeting the modules at each time point using a network of documented regulatory associations and the expression data. Finally, custom graphics depict the regulatory activity in a module at consecutive time points (snapshots). Regulatory Snapshots successfully unraveled modules underlying yeast response to heat shock and human epithelial-to-mesenchymal transition, based on regulations documented in the YEASTRACT and JASPAR databases, respectively, and available expression data. Regulatory players involved in functionally enriched processes related to these biological events were identified. Ranking scores further suggested an ability to discern the primary role of a gene (target or regulator). A prototype is available at: http://kdbio.inesc-id.pt/software/regulatorysnapshots. PMID:22563474

  14. An integrated -omics analysis of the epigenetic landscape of gene expression in human blood cells.

    PubMed

    Kennedy, Elizabeth M; Goehring, George N; Nichols, Michael H; Robins, Chloe; Mehta, Divya; Klengel, Torsten; Eskin, Eleazar; Smith, Alicia K; Conneely, Karen N

    2018-06-19

    Gene expression can be influenced by DNA methylation 1) distally, at regulatory elements such as enhancers, as well as 2) proximally, at promoters. Our current understanding of the influence of distal DNA methylation changes on gene expression patterns is incomplete. Here, we characterize genome-wide methylation and expression patterns for ~13k genes to explore how DNA methylation interacts with gene expression throughout the genome. We used a linear mixed model framework to assess the correlation of DNA methylation at ~400k CpGs with gene expression changes at ~13k transcripts in two independent datasets from human blood cells. Among CpGs at which methylation significantly associates with transcription (eCpGs), >50% are distal (>50 kb) or trans (different chromosome) to the correlated gene. Many eCpG-transcript pairs are consistent between studies, and within studies ~90% of neighboring eCpGs associate with the same gene. We find that enhancers (P < 5e-18) and microRNA genes (P = 9e-3) are overrepresented among trans eCpGs, and insulators and long intergenic non-coding RNAs are enriched among cis and distal eCpGs. Intragenic eCpG-transcript correlations are negative in 60-70% of occurrences and are enriched for annotated gene promoters and enhancers (P < 0.002), highlighting the importance of intragenic regulation. Gene Ontology analysis indicates that trans eCpGs are enriched for transcription factor genes and chromatin modifiers, suggesting that some trans eCpGs represent the influence of gene networks and higher-order transcriptional control. This work sheds new light on the interplay between epigenetic changes and gene expression, and provides useful data for mining biologically relevant results from epigenome-wide association studies.
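    Two ingredients of this analysis can be sketched compactly: a methylation-expression correlation for one CpG-transcript pair, and the cis/distal/trans labelling using the 50 kb cut-off given above. The Python sketch below swaps in a plain Pearson correlation for the study's linear mixed model (so it ignores covariates and sample structure) and assumes distance is measured from the CpG to the transcript's TSS; both choices, and the data, are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_pair(cpg_chrom, cpg_pos, gene_chrom, gene_tss, distal_bp=50_000):
    """Label a CpG-transcript pair as trans, distal (>50 kb) or cis.

    Assumption: distance is measured from the CpG to the transcript's TSS.
    """
    if cpg_chrom != gene_chrom:
        return "trans"
    return "distal" if abs(cpg_pos - gene_tss) > distal_bp else "cis"


# Hypothetical methylation (beta values) and expression for four samples
methylation = [0.8, 0.6, 0.4, 0.2]
expression = [1.0, 2.0, 3.0, 4.0]
print(round(pearson(methylation, expression), 3))
print(classify_pair("chr1", 1_000_000, "chr1", 1_020_000),
      classify_pair("chr1", 1_000_000, "chr1", 1_100_000),
      classify_pair("chr2", 1_000_000, "chr1", 1_020_000))
```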

  15. Protein interaction networks from literature mining

    NASA Astrophysics Data System (ADS)

    Ihara, Sigeo

    2005-03-01

    The ability to accurately predict and understand physiological changes in the biological network system in response to disease or drug therapeutics is of crucial importance in life science. The extensive gene expression data generated by even a single microarray experiment are often difficult to interpret fully in terms of their biological significance. However, the growing knowledge of protein interactions stored in the PubMed database, together with advances in natural language processing, makes it possible to construct, from gene expression information, the protein interaction networks that are essential for understanding its biological meaning. Using an in-house literature mining system we developed, we constructed the protein interaction network for humans. Graph-theoretical characterization of the total interaction network in the literature showed that the network is scale-free and that semantic long-ranged interactions (e.g. inhibit, induce) between proteins dominate the total interaction network, reducing the degree exponent. Interaction networks generated from scientific text in which the interaction event is ambiguously described are disconnected, whereas networks based on text in which the interaction events are clearly stated are strongly connected. Results of protein-protein interaction networks obtained in real applications to microarray experiments are discussed; for example, using our system to compare gene expression data indicative of either a good or a poor prognosis for acute lymphoblastic leukemia with MLL rearrangements revealed newly discovered signaling cross-talk.
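    Calling a network scale-free rests on its degree distribution following P(k) ∝ k^-γ, and the "degree exponent" mentioned above is γ. The Python sketch below computes node degrees from an undirected edge list and applies the standard discrete approximation to the maximum-likelihood estimate of γ. It ignores interaction types (such as inhibit or induce), the edge list is hypothetical, and an estimate alone does not establish a power law; a goodness-of-fit test would be needed for that.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Node degrees of an undirected graph; duplicate edges and self-loops are ignored."""
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for pair in unique:
        for node in pair:
            degree[node] += 1
    return degree


def powerlaw_exponent(degrees, k_min=1):
    """Approximate ML estimate of gamma for P(k) ~ k^-gamma, using degrees k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


# Hypothetical literature-derived interactions (a real network would be far larger)
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P1"), ("P5", "P5")]
deg = degree_counts(edges)
print(dict(deg), round(powerlaw_exponent(list(deg.values())), 2))
```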

  16. De Novo Assembled Wheat Transcriptomes Delineate Differentially Expressed Host Genes in Response to Leaf Rust Infection.

    PubMed

    Chandra, Saket; Singh, Dharmendra; Pathak, Jyoti; Kumari, Supriya; Kumar, Manish; Poddar, Raju; Balyan, Harindra Singh; Gupta, Puspendra Kumar; Prabhu, Kumble Vinod; Mukhopadhyay, Kunal

    2016-01-01

    Pathogens like Puccinia triticina, the causal organism of leaf rust, extensively damage wheat production. The interaction between wheat and the pathogen at the molecular level is complex and little explored. The pathogen-induced response was characterized using mock- or pathogen-inoculated near-isogenic wheat lines (with or without the seedling leaf rust resistance gene Lr28). Four Serial Analysis of Gene Expression libraries were prepared from mock- and pathogen-inoculated plants and subjected to Sequencing by Oligonucleotide Ligation and Detection, which generated a total of 165,767,777 reads, each 35 bases long. The reads were processed and multiple k-mers were attempted for de novo transcript assembly; 22 k-mers showed the best results. Altogether 21,345 contigs were generated and functionally characterized by gene ontology annotation and by mining for transcription factors and resistance genes. Expression analysis among the four libraries showed extensive alterations in the transcriptome in response to pathogen infection, reflecting reorganizations in major biological processes and metabolic pathways. The role of auxin in determining pathogenesis in susceptible and resistant lines appeared to be important. A qPCR expression study of four LRR-RLK (leucine-rich repeat receptor-like protein kinase) genes showed higher expression 24 h after inoculation with the pathogen. In summary, the conceptual model of induced resistance in wheat contributes insights into defense responses and knowledge of Puccinia triticina-induced defense transcripts in wheat plants.
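    De Bruijn graph assemblers start by decomposing reads into overlapping substrings of length k, and trying several k values trades sensitivity (short k) against repeat resolution (long k). The Python sketch below counts k-mers in a set of reads to show that first step; it is not the assembler used in the study, the reads are hypothetical, and it assumes base-space sequences (SOLiD reads are natively in colour space and would need conversion first). It also skips reverse-complement canonicalization, which real assemblers usually apply.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across reads, skipping k-mers containing N."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


# Hypothetical short base-space reads (real reads here were 35 bases long)
reads = ["ACGTAC", "CGTACG"]
for k in (3, 4):
    print(k, dict(count_kmers(reads, k)))
```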

  17. De Novo Assembled Wheat Transcriptomes Delineate Differentially Expressed Host Genes in Response to Leaf Rust Infection

    PubMed Central

    Pathak, Jyoti; Kumari, Supriya; Kumar, Manish; Poddar, Raju; Balyan, Harindra Singh; Gupta, Puspendra Kumar; Prabhu, Kumble Vinod; Mukhopadhyay, Kunal

    2016-01-01

    Pathogens like Puccinia triticina, the causal organism of leaf rust, extensively damage wheat production. The interaction between wheat and the pathogen at the molecular level is complex and little explored. The pathogen-induced response was characterized using mock- or pathogen-inoculated near-isogenic wheat lines (with or without the seedling leaf rust resistance gene Lr28). Four Serial Analysis of Gene Expression libraries were prepared from mock- and pathogen-inoculated plants and subjected to Sequencing by Oligonucleotide Ligation and Detection, which generated a total of 165,767,777 reads, each 35 bases long. The reads were processed and multiple k-mers were attempted for de novo transcript assembly; 22 k-mers showed the best results. Altogether 21,345 contigs were generated and functionally characterized by gene ontology annotation and by mining for transcription factors and resistance genes. Expression analysis among the four libraries showed extensive alterations in the transcriptome in response to pathogen infection, reflecting reorganizations in major biological processes and metabolic pathways. The role of auxin in determining pathogenesis in susceptible and resistant lines appeared to be important. A qPCR expression study of four LRR-RLK (leucine-rich repeat receptor-like protein kinase) genes showed higher expression 24 h after inoculation with the pathogen. In summary, the conceptual model of induced resistance in wheat contributes insights into defense responses and knowledge of Puccinia triticina-induced defense transcripts in wheat plants. PMID:26840746

  18. Mining the archives: a cross-platform analysis of gene ...

    EPA Pesticide Factsheets

    Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for genomic research into the molecular basis of disease. However, use of FFPE samples in gene expression studies has been limited by technical challenges resulting from degradation of nucleic acids. Here we evaluated gene expression profiles derived from fresh-frozen (FRO) and FFPE mouse liver tissues using two DNA microarray protocols and two whole transcriptome sequencing (RNA-seq) library preparation methodologies. The ribo-depletion protocol outperformed the other three methods by having the highest correlations of differentially expressed genes (DEGs) and best overlap of pathways between FRO and FFPE groups. We next tested the effect of sample time in formalin (18 hours or 3 weeks) on gene expression profiles. Hierarchical clustering of the datasets indicated that test article treatment, and not preservation method, was the main driver of gene expression profiles. Meta- and pathway analyses indicated that biological responses were generally consistent for 18-hour and 3-week FFPE samples compared to FRO samples. However, clear erosion of signal intensity with time in formalin was evident, and DEG numbers differed by platform and preservation method. Lastly, we investigated the effect of age in FFPE block on genomic profiles. RNA-seq analysis of 8-, 19-, and 26-year-old control blocks using the ribo-depletion protocol resulted in comparable quality metrics, inc

  19. Transcriptomic Analysis and the Expression of Disease-Resistant Genes in Oryza meyeriana under Native Condition

    PubMed Central

    He, Bin; Tao, Xiang; Gu, Yinghong; Wei, Changhe; Cheng, Xiaojie; Xiao, Suqin; Cheng, Zaiquan; Zhang, Yizheng

    2015-01-01

    Oryza meyeriana (O. meyeriana), with a GG genome type (2n = 24), has accumulated many valuable traits with respect to resistance to diseases such as rice shade and blast, and is even immune to bacterial blight. It is important to know whether disease-resistance genes exist and are expressed in this wild rice under native conditions. However, few genomic or transcriptomic data for O. meyeriana are currently available. In this study, we present the first comprehensive characterization of the O. meyeriana transcriptome using RNA-seq, obtaining 185,323 contigs with an average length of 1,692 bp and an N50 of 2,391 bp. Differential expression analysis showed that roots had the most tissue-specifically expressed genes, followed by stems and leaves. In a similarity search against protein databases, 146,450 contigs had at least one significant alignment to existing gene models. Comparison with the Oryza sativa (japonica-type Nipponbare and indica-type 93–11) genomes revealed that 13% of the O. meyeriana contigs had not been detected in O. sativa. Many disease-resistance genes, such as bacterial blight, blast, rust, fusarium, cyst nematode and downy mildew resistance genes, were mined from the transcriptomic database. Two kinds of rice bacterial blight resistance genes (Xa1 and Xa26) were differentially or specifically expressed in O. meyeriana. All four Xa1 contigs were expressed only in roots, whereas three Xa26 contigs had their highest expression in leaves, two had their highest expression in stems, and one was expressed predominantly in roots. The transcriptomic database of O. meyeriana has been constructed and many disease-resistance genes were found to be expressed under native conditions, which provides a foundation for the future discovery of novel genes and a basis for studying the molecular mechanisms associated with disease resistance in O. meyeriana. PMID:26640944
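    N50, reported above alongside the mean contig length, is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The Python sketch below computes it from a list of contig lengths; the lengths are hypothetical and chosen so the result is easy to check by hand.

```python
def n50(lengths):
    """Length L such that contigs >= L together hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical contig lengths in bp (total 5,200 bp, so half is 2,600 bp)
contigs = [2000, 1500, 1000, 500, 200]
print("N50:", n50(contigs), "bp; mean:", sum(contigs) / len(contigs), "bp")
```

    Because N50 weights long contigs more heavily than the mean does, it usually exceeds the mean length, as it does in the figures reported for O. meyeriana (2,391 bp vs. 1,692 bp).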

  20. Prognostic significance of DSG3 in rectal adenocarcinoma treated with preoperative chemoradiotherapy.

    PubMed

    Chao, Tung-Bo; Li, Chien-Feng; Lin, Ching-Yih; Tian, Yu-Feng; Chang, I-Wei; Sheu, Ming-Jen; Lee, Ying-En; Chan, Ti-Chun; He, Hong-Lin

    2016-06-01

    This study aimed to investigate the prognostic significance of DSG3 and its association with response to neoadjuvant concurrent chemoradiotherapy (CCRT) in rectal cancer. Data mining of a publicly available dataset was performed to find genes associated with CCRT response. Immunohistochemistry was applied to evaluate DSG3 expression. The relationships between DSG3 expression and various clinicopathological parameters and survival were analyzed. The DSG3 gene was significantly associated with CCRT response. The expression of DSG3 negatively correlated with poorer tumor regression (p < 0.001) and had an independent negative impact on disease-specific survival (p = 0.011), local recurrence-free survival (p = 0.031) and metastasis-free survival (p = 0.029). DSG3 was a key prognostic factor and predictor for CCRT response in rectal cancer patients.

  1. Re-analysis of RNA-seq transcriptome data reveals new aspects of gene activity in Arabidopsis root hairs

    PubMed Central

    Li, Wenfeng; Lan, Ping

    2015-01-01

    Root hairs, tubular-shaped outgrowths from root epidermal cells, play important roles in the acquisition of nutrients and water, interaction with microbes, and plant anchorage. As a specialized cell type, root hairs, especially in Arabidopsis, provide a pragmatic research system for many kinds of studies. Here, we re-analyzed the RNA-seq transcriptome profile of Arabidopsis root hair cells with the TopHat software and used the Cufflinks program to mine the differentially expressed genes. Results showed that ERD14, RIN4 and AT5G64401 were among the most abundant genes in root hair cells, while ATGSTU2, AT5G54940 and AT4G30530 were highly expressed in non-root hair tissues. In total, 5409 genes with a fold change greater than two (FDR-adjusted P < 0.05) showed differential expression between root hair cells and non-root hair tissues; of these, 61 were expressed only in root hair cells. One hundred and thirty-six of the 5409 genes have been reported to be “core” root epidermal genes, which could be grouped into nine clusters according to expression pattern. Gene Ontology (GO) analysis of the 5409 genes showed that the processes “response to salt stress,” “ribosome biogenesis,” “protein phosphorylation,” and “response to water deprivation” were enriched, whereas only the process “intracellular signal transduction” was enriched in the subset of 61 genes expressed only in root hair cells. One hundred and twenty-one unannotated transcripts were identified, 14 of which were differentially expressed between root hair cells and non-root hair tissues, with transcripts XLOC_000763, XLOC_031361, and XLOC_005665 highly expressed in root hair cells. This comprehensive transcriptomic analysis provides new information on root hair gene activity and sets the stage for follow-up experiments to verify the biological functions of the newly identified genes and novel transcripts in root hair cell morphogenesis. PMID:26106402
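    The selection criteria above (fold change greater than two, FDR-adjusted P < 0.05) can be made concrete in a short filter. The Python sketch below assumes raw P values are adjusted with the Benjamini-Hochberg procedure and that "greater than two-fold" applies in either direction (|log2 fold change| > 1); Cufflinks computes its own test statistics, so this illustrates the thresholds rather than reproducing that pipeline, and the genes and values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, fold_changes, pvalues, min_fold=2.0, alpha=0.05):
    """Genes with FDR-adjusted P < alpha and a fold change beyond min_fold either way."""
    q = benjamini_hochberg(pvalues)
    cutoff = math.log2(min_fold)
    return [g for g, fc, qi in zip(genes, fold_changes, q)
            if qi < alpha and abs(math.log2(fc)) > cutoff]


# Hypothetical results: fold change = root hair / non-root hair expression
genes = ["g1", "g2", "g3", "g4"]
fold_changes = [4.0, 0.25, 1.5, 3.0]
pvalues = [0.001, 0.004, 0.0001, 0.2]
print(select_degs(genes, fold_changes, pvalues))
```

    Here `g3` is highly significant but changes by less than two-fold, and `g4` passes the fold-change cut-off but not the FDR threshold, so only `g1` (up) and `g2` (down) are kept.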

  2. Microarray and comparative genomics-based identification of genes and gene regulatory regions of the mouse immune system

    PubMed Central

    Hutton, John J; Jegga, Anil G; Kong, Sue; Gupta, Ashima; Ebert, Catherine; Williams, Sarah; Katz, Jonathan D; Aronow, Bruce J

    2004-01-01

    Background: In this study we have built and mined a gene expression database composed of 65 diverse mouse tissues for genes preferentially expressed in immune tissues and cell types. Using expression pattern criteria, we identified 360 genes with preferential expression in thymus, spleen, peripheral blood mononuclear cells, lymph nodes (unstimulated or stimulated), or in vitro activated T-cells. Results: Gene clusters, formed based on similarity of expression pattern across either all tissues or the immune tissues only, had highly significant associations both with immunological processes such as chemokine-mediated response, antigen processing, receptor-related signal transduction, and transcriptional regulation, and also with more general processes such as replication and cell cycle control. Within-cluster gene correlations implicated known associations of known genes, as well as immune process-related roles for poorly described genes. To characterize regulatory mechanisms and cis-elements of genes with similar patterns of expression, we used a new version of a comparative genomics-based cis-element analysis tool to identify clusters of cis-elements with compositional similarity among multiple genes. Several clusters contained genes that shared 5–6 cis-elements that included ETS and zinc-finger binding sites. cis-Elements AP2 EGRF ETSF MAZF SP1F ZF5F and AREB ETSF MZF1 PAX5 STAT were shared in a thymus-expressed set; AP4R E2FF EBOX ETSF MAZF SP1F ZF5F and CREB E2FF MAZF PCAT SP1F STAT cis-clusters occurred in activated T-cells; CEBP CREB NFKB SORY and GATA NKXH OCT1 RBIT occurred in stimulated lymph nodes. Conclusion: This study demonstrates a series of analytic approaches that have allowed the implication of genes and regulatory elements that participate in the differentiation, maintenance, and function of the immune system. Polymorphism or mutation of these could adversely impact immune system functions. PMID:15504237

  3. Functional Analyses of NSF1 in Wine Yeast Using Interconnected Correlation Clustering and Molecular Analyses

    PubMed Central

    Bessonov, Kyrylo; Walkey, Christopher J.; Shelp, Barry J.; van Vuuren, Hennie J. J.; Chiu, David; van der Merwe, George

    2013-01-01

    Analyzing time-course expression data captured in microarray datasets is a complex undertaking as the vast and complex data space is represented by a relatively low number of samples as compared to thousands of available genes. Here, we developed the Interdependent Correlation Clustering (ICC) method to analyze relationships that exist among genes conditioned on the expression of a specific target gene in microarray data. Based on Correlation Clustering, the ICC method analyzes a large set of correlation values related to gene expression profiles extracted from given microarray datasets. ICC can be applied to any microarray dataset and any target gene. We applied this method to microarray data generated from wine fermentations and selected NSF1, which encodes a C2H2 zinc finger-type transcription factor, as the target gene. The validity of the method was verified by accurate identifications of the previously known functional roles of NSF1. In addition, we identified and verified potential new functions for this gene; specifically, NSF1 is a negative regulator for the expression of sulfur metabolism genes, the nuclear localization of Nsf1 protein (Nsf1p) is controlled in a sulfur-dependent manner, and the transcription of NSF1 is regulated by Met4p, an important transcriptional activator of sulfur metabolism genes. The inter-disciplinary approach adopted here highlighted the accuracy and relevancy of the ICC method in mining for novel gene functions using complex microarray datasets with a limited number of samples. PMID:24130853
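    ICC starts from correlation values between gene expression profiles and a chosen target gene. The Python sketch below shows only that first ingredient, ranking all other genes by Pearson correlation with the target's profile; it is not the ICC clustering procedure itself, and the gene names other than NSF1 and all profile values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a flat profile (a choice made for this sketch)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return sxy / denom if denom else 0.0


def rank_by_target(profiles, target):
    """Other genes sorted from most positively to most negatively correlated with target."""
    ref = profiles[target]
    scores = {g: pearson(p, ref) for g, p in profiles.items() if g != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical time-course profiles (four sampling points during fermentation)
profiles = {"NSF1": [1, 2, 3, 4],
            "geneA": [2, 4, 6, 8],
            "geneB": [4, 3, 2, 1],
            "geneC": [1, 3, 2, 4]}
for gene, r in rank_by_target(profiles, "NSF1"):
    print(f"{gene}: r = {r:.2f}")
```

    Strongly negative correlations (such as `geneB` here) can be as informative as positive ones; a gene repressed by the target would be expected to show that pattern.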

  4. Selection of reference genes for quantitative gene expression normalization in flax (Linum usitatissimum L.).

    PubMed

    Huis, Rudy; Hawkins, Simon; Neutelings, Godfrey

    2010-04-19

    Quantitative real-time PCR (qRT-PCR) is currently the most accurate method for detecting differential gene expression. Such an approach depends on the identification of uniformly expressed 'housekeeping genes' (HKGs). Extensive transcriptomic data mining and experimental validation in different model plants have shown that the reliability of these endogenous controls can be influenced by the plant species, growth conditions and organs/tissues examined. It is therefore important to identify the best reference genes for each biological system before using qRT-PCR to investigate differential gene expression. In this paper we evaluate different candidate HKGs for developmental transcriptomic studies in the economically important flax fiber and oil crop (Linum usitatissimum L.). Specific primers were designed to quantify the expression levels of 20 potential housekeeping genes in flax roots, internal and external stem tissues, leaves and flowers at different developmental stages. After PCR efficiencies were calculated, 13 HKGs were retained and their expression stabilities evaluated with the algorithms geNorm and NormFinder. According to geNorm, 2 Transcriptional Elongation Factors (TEFs) and 1 Ubiquitin gene are necessary for normalizing gene expression when all studied samples are considered. However, only 2 TEFs are required for normalizing expression in stem tissues. In contrast, NormFinder identified glyceraldehyde-3-phosphate dehydrogenase (GADPH) as the most stably expressed gene both when all samples were grouped together and when samples were classed into different sub-groups. qRT-PCR was then used to investigate the relative expression levels of two splice variants of the flax LuMYB1 gene (homologue of AtMYB59). LuMYB1-1 and LuMYB1-2 were highly expressed in internal stem tissues compared with outer stem tissues and other samples. This result was confirmed with both the geNorm-designated and the NormFinder-designated reference genes. The use of 2 different statistical algorithms resulted in the identification of different combinations of flax HKGs for expression data normalization. Despite such differences, the use of geNorm-designated and NormFinder-designated reference genes enabled us to accurately compare the expression levels of a flax MYB gene in different organs and tissues. Our identification and validation of suitable flax HKGs will facilitate future developmental transcriptomic studies in this economically important plant.
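    geNorm ranks candidate reference genes by a stability measure M: for gene j, take the standard deviation across samples of log2(a_j / a_k) for every other candidate k, then average those values; a lower M means more stable expression. The Python sketch below reimplements that published definition for illustration; it omits geNorm's stepwise exclusion of the least stable gene and its pairwise-variation analysis for choosing how many reference genes to use, and the gene names and relative quantities are hypothetical.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """geNorm-style stability M per gene; expr maps gene -> relative quantities per sample."""
    genes = list(expr)
    m_values = {}
    for j in genes:
        variations = []
        for k in genes:
            if k == j:
                continue
            # pairwise variation: spread of the log2 expression ratio across samples
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            variations.append(stdev(log_ratios))
        m_values[j] = mean(variations)
    return m_values


# Hypothetical relative quantities for four candidates across four samples
expr = {"hkg1": [1.0, 2.0, 1.5, 1.2],
        "hkg2": [1.1, 2.1, 1.6, 1.3],
        "hkg3": [1.0, 1.8, 1.7, 1.1],
        "hkg4": [1.0, 0.5, 3.0, 2.0]}
for gene, m in sorted(genorm_m(expr).items(), key=lambda kv: kv[1]):
    print(f"{gene}: M = {m:.3f}")
```

    Because M is built from ratios between candidates, two co-regulated genes can look stable together; this is one reason the abstract's comparison with NormFinder, which uses a different model, is useful.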

  5. Mining featured biomarkers associated with prostatic carcinoma based on bioinformatics.

    PubMed

    Piao, Guanying; Wu, Jiarui

    2013-11-01

    To analyze the differentially expressed genes and identify featured biomarkers of prostatic carcinoma. The software "Significance Analysis of Microarray" (SAM) was used to identify differentially coexpressed genes (DCGs). The DCGs found in the two datasets were analyzed by GO (Gene Ontology) functional annotation. A total of 389 DCGs were obtained. GO analysis showed that these DCGs were closely related to acinus development, TGF-β receptor and signal transduction pathways. Furthermore, five featured biomarkers were discovered by interaction analysis. These important signaling pathways and oncogenes may provide potential therapeutic targets for prostatic carcinoma.
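    SAM scores each gene with a relative difference d = (mean1 - mean2) / (s + s0), where s is the pooled standard error of the difference between groups and s0 is a small positive constant that keeps genes with tiny variance from dominating. The Python sketch below computes d for one gene; here s0 is fixed by hand rather than estimated from the data, and the permutation-based FDR that SAM uses to pick significant genes is not shown. The values are hypothetical.

```python
import math
from statistics import mean


def sam_relative_difference(group1, group2, s0=0.1):
    """SAM-style relative difference d = (mean1 - mean2) / (s + s0) for one gene.

    s is the pooled standard error of the difference; s0 is fixed here for illustration.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * ss)
    return (m1 - m2) / (s + s0)


# Hypothetical log-expression of one gene in tumour vs. normal samples
tumour = [5.0, 5.2, 4.8]
normal = [3.0, 3.1, 2.9]
print(round(sam_relative_difference(tumour, normal), 2))
```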

  6. EgoNet: identification of human disease ego-network modules

    PubMed Central

    2014-01-01

    Background: Mining novel biomarkers from gene expression profiles for accurate disease classification is challenging due to small sample size and high noise in gene expression measurements. Several studies have proposed integrated analyses of microarray data and protein-protein interaction (PPI) networks to find diagnostic subnetwork markers. However, the neighborhood relationship among network member genes has not been fully considered by those methods, leaving many potential gene markers unidentified. The main idea of this study is to take full advantage of the biological observation that genes associated with the same or similar diseases commonly reside in the same neighborhood of molecular networks. Results: We present EgoNet, a novel method based on egocentric network-analysis techniques, to exhaustively search and prioritize disease subnetworks and gene markers from a large-scale biological network. When applied to a triple-negative breast cancer (TNBC) microarray dataset, the top selected modules contain both known gene markers in TNBC and novel candidates, such as RAD51 and DOK1, which play a central role in their respective ego-networks by connecting many differentially expressed genes. Conclusions: Our results suggest that EgoNet, which is based on the ego network concept, allows the identification of novel biomarkers and provides a deeper understanding of their roles in complex diseases. PMID:24773628
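    An ego-network is a node (the ego) together with the nodes within a fixed number of hops of it. The Python sketch below extracts 1-hop ego-networks by breadth-first search and ranks egos by how many differentially expressed genes they connect, mirroring the idea of a central gene "connecting many differentially expressed genes". The scoring rule, the radius and the small network are hypothetical; the abstract does not describe EgoNet's actual scoring, and this is not its implementation.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def ego_network(adj, ego, radius=1):
    """Nodes within `radius` hops of `ego`, found by breadth-first search."""
    seen, frontier = {ego}, [ego]
    for _ in range(radius):
        next_frontier = []
        for node in frontier:
            for neighbour in adj.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen


def rank_egos(adj, de_genes, radius=1):
    """Hypothetical score: number of differentially expressed genes in each ego-network."""
    scores = {ego: len((ego_network(adj, ego, radius) - {ego}) & de_genes) for ego in adj}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical PPI edges and differentially expressed (DE) genes
adj = build_adjacency([("hub", "g1"), ("hub", "g2"), ("hub", "g3"), ("g1", "g4")])
ranked = rank_egos(adj, de_genes={"g1", "g2", "g3"})
print(ranked)
```

    Note that `hub` ranks first even though it is not itself differentially expressed, which is the kind of candidate a purely expression-based ranking would miss.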

  7. Functional genomic responses to cystic fibrosis transmembrane conductance regulator (CFTR) and CFTR(delta508) in the lung.

    PubMed

    Xu, Yan; Liu, Cong; Clark, Jean C; Whitsett, Jeffrey A

    2006-04-21

    Cystic fibrosis (CF), a common lethal pulmonary disorder in Caucasians, is caused by mutations in the cystic fibrosis transmembrane conductance regulator gene (CFTR) that disturb fluid homeostasis and host defense in target organs. The effects of CFTR and delta508-CFTR were assessed in transgenic mice that 1) lack CFTR expression (Cftr-/-); 2) express the human delta508 CFTR (CFTR(delta508)); or 3) overexpress the normal human CFTR (CFTR(tg)) in respiratory epithelial cells. Genes selected from Affymetrix Murine GeneChip analyses were subjected to functional classification, k-means clustering, searches for promoter cis-elements and modules, literature mining, and pathway exploration. Genomic responses to Cftr-/- were not corrected by expression of CFTR(delta508). Genes regulating host defense, inflammation, and fluid and electrolyte transport were similarly altered in Cftr-/- and CFTR(delta508) mice. CFTR(delta508) induced a primary disturbance in the expression of genes regulating redox and antioxidant systems. Genomic responses to CFTR(tg) were modest and were not associated with lung pathology. CFTR(tg) and CFTR(delta508) induced genes encoding heat shock proteins and other chaperones but did not activate the endoplasmic reticulum-associated degradation pathway. RNAs encoding proteins that directly interact with CFTR were identified in each of the CFTR mouse models, supporting the hypothesis that CFTR functions within a multiprotein complex whose members interact at the level of protein-protein interactions and gene expression. Promoters of genes influenced by CFTR shared common regulatory elements, suggesting that their co-expression may be mediated by shared regulatory mechanisms. Genes and pathways involved in the response to CFTR may be of interest as modifiers of CF.
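    k-means clustering is one of the steps listed above for grouping genes with similar responses. The abstract does not give the distance measure, the number of clusters, or the initialization used, so the following is only a generic sketch of Lloyd's k-means on hypothetical gene profiles, assuming Euclidean distance and random initial centroids drawn from the data.

```python
import math
import random


def kmeans(profiles, k, n_iter=100, seed=0):
    """Lloyd's k-means: returns (cluster label per profile, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct profiles chosen at random
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical log-ratio profiles for six genes across two conditions
profiles = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, centroids = kmeans(profiles, k=2)
```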

  8. Mining disease state converters for medical intervention of diseases.

    PubMed

    Dong, Guozhu; Duan, Lei; Tang, Changjie

    2010-02-01

    In applications such as gene therapy and drug design, a key goal is to convert the disease state of diseased objects from an undesirable state into a desirable one. Such conversions may be achieved by changing the values of some attributes of the objects. For example, in gene therapy one may convert cancerous cells to normal ones by changing some genes' expression levels from low to high or from high to low. In this paper, we define the disease state conversion problem as the discovery of disease state converters; a disease state converter is a small set of attribute value changes that may change an object's disease state from undesirable to desirable. We consider two variants of this problem: personalized disease state converter mining finds converters for an individual patient with a given disease, and universal disease state converter mining finds converters for all samples with a given disease. We propose the DSCMiner algorithm to discover small and highly effective disease state converters. Since real-life medical experiments on living diseased instances are expensive and time-consuming, we use classifiers trained on datasets of the given diseases to evaluate the quality of discovered converter sets. The effectiveness of a disease state converter is measured by the percentage of objects that state-of-the-art classifiers deem successfully converted from the undesirable state to the desirable state. We evaluate the effectiveness of our algorithm experimentally. We also discuss possible research directions for extensions and improvements. We note that the disease state conversion problem also has applications in customer retention, criminal rehabilitation, and company turnaround, where the goal is to move objects out of an undesirable class.
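    To make the definition concrete, the sketch below searches greedily for a small set of attribute value changes that pushes a classifier's score for the desirable class over a threshold, then measures effectiveness as the fraction of objects converted. This greedy search is an assumption for illustration and is not the DSCMiner algorithm; the toy scoring function stands in for a trained classifier, and all gene names and values are hypothetical.

```python
def greedy_converter(sample, options, score, max_changes=3, threshold=0.5):
    """Greedily pick attribute changes until score(sample) >= threshold.

    sample:  {attribute: value} for one diseased object
    options: {attribute: [allowed values]} candidate changes
    score:   callable returning the classifier's score for the desirable state
    Returns a list of (attribute, new_value), or None if no converter is found.
    """
    current = dict(sample)
    changes = []
    for _ in range(max_changes):
        if score(current) >= threshold:
            return changes
        changed = {attr for attr, _ in changes}
        best, best_score = None, score(current)
        for attr, values in options.items():
            if attr in changed:
                continue  # change each attribute at most once
            for value in values:
                if value == current[attr]:
                    continue
                trial = dict(current, **{attr: value})
                if score(trial) > best_score:
                    best, best_score = (attr, value), score(trial)
        if best is None:
            return None  # no single change improves the score
        current[best[0]] = best[1]
        changes.append(best)
    return changes if score(current) >= threshold else None


def conversion_rate(samples, changes, score, threshold=0.5):
    """Fraction of samples deemed converted after applying the same changes."""
    converted = sum(score(dict(s, **dict(changes))) >= threshold for s in samples)
    return converted / len(samples)


# Hypothetical stand-in for a trained classifier (score of the "normal" class)
def toy_score(x):
    return (0.4 * (x["g1"] == "high") + 0.4 * (x["g2"] == "low")
            + 0.1 * (x["g3"] == "high"))


patient = {"g1": "low", "g2": "high", "g3": "low"}
options = {"g1": ["low", "high"], "g2": ["low", "high"], "g3": ["low", "high"]}
converter = greedy_converter(patient, options, toy_score)
```

    A personalized search runs greedy_converter on one patient; a universal converter would instead be judged by conversion_rate over all diseased samples.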

  9. Release of (and lessons learned from mining) a pioneering large toxicogenomics database.

    PubMed

    Sandhu, Komal S; Veeramachaneni, Vamsi; Yao, Xiang; Nie, Alex; Lord, Peter; Amaratunga, Dhammika; McMillian, Michael K; Verheyen, Geert R

    2015-07-01

    We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays and has been used over the past years within Janssen to derive signatures for multiple end points and to classify proprietary compounds. The release consists of gene-expression responses to 124 compounds, selected to give broad coverage of liver-active compounds. A selection of the compounds was also analyzed on Affymetrix microarrays. The release includes the results of an in-house pipeline that reannotates probes to Entrez gene annotations and classifies them into different confidence classes. High-confidence, unambiguously annotated probes were used to create gene-level data, which served as the starting point for cross-platform comparisons. Connectivity map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R package containing the gene-level data and show how it can be used for expression-based similarity searches. When the same biological samples run on the Affymetrix and Codelink platforms are compared, good correspondence is observed using connectivity mapping approaches. As expected, this correspondence is weaker when the data are compared with an independent dataset such as TG-GATE. We hope that this collection of gene-expression profiles will be incorporated into users' toxicogenomics pipelines.
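    The R package mentioned above is not shown here. As an illustration of an expression-based similarity search, the sketch below ranks reference compound profiles by Spearman rank correlation with a query signature over shared genes. Rank correlation is a simpler stand-in chosen for this example, not the connectivity-map scoring used in the study; profile names and values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def similarity_search(query, references, min_shared=3):
    """Rank reference profiles ({gene: log-ratio}) by similarity to the query."""
    hits = []
    for name, profile in references.items():
        shared = sorted(set(query) & set(profile))
        if len(shared) < min_shared:
            continue  # too few common genes for a meaningful correlation
        rho = spearman([query[g] for g in shared], [profile[g] for g in shared])
        hits.append((name, rho))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


query = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0}
references = {
    "compound_opposite": {"g1": 4.0, "g2": 3.0, "g3": 2.0, "g4": 1.0},
    "compound_similar": {"g1": 0.1, "g2": 0.2, "g3": 0.3, "g4": 0.5},
}
hits = similarity_search(query, references)
```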

  10. Transcriptomic Analysis of the Claudin Interactome in Malignant Pleural Mesothelioma: Evaluation of the Effect of Disease Phenotype, Asbestos Exposure, and CDKN2A Deletion Status

    PubMed Central

    Rouka, Erasmia; Vavougios, Georgios D.; Solenov, Evgeniy I.; Gourgoulianis, Konstantinos I.; Hatzoglou, Chrissi; Zarogiannis, Sotirios G.

    2017-01-01

    Malignant pleural mesothelioma (MPM) is a highly aggressive tumor primarily associated with asbestos exposure. Early detection of MPM is restricted by the long latency period until clinical presentation, the ineffectiveness of imaging techniques in early-stage detection, and the lack of non-invasive biomarkers with high sensitivity and specificity. In this study we used transcriptome data mining to determine which CLAUDIN (CLDN) genes are differentially expressed in MPM compared with controls. Using the same approach, we identified the interactome of the differentially expressed CLDN genes and assessed its expression profile. Subsequently, we evaluated the effect of tumor histology, asbestos exposure, CDKN2A deletion status, and gender on the gene expression level of the claudin interactome. We found that 5 out of 15 studied CLDNs (4, 5, 8, 10, 15) and 4 out of 27 available interactors (S100B, SHBG, CDH5, CXCL8) were differentially expressed in MPM specimens vs. healthy tissues. The genes encoding the CLDN-15 and S100B proteins differ in their expression profiles between the three histological subtypes of MPM. Moreover, CLDN-15 is significantly under-expressed in the cohort of patients with a previous history of asbestos exposure. CLDN-15 was also significantly under-expressed in patients lacking the CDKN2A gene. These results warrant detailed in vitro investigation of the role of CLDN-15 in the pathobiology of MPM. PMID:28377727

  12. Identification and expression profiles of fifteen delta-class glutathione S-transferase genes from a stored-product pest, Liposcelis entomophila (Enderlein) (Psocoptera: Liposcelididae).

    PubMed

    Jing, Tian-Xing; Wu, Yu-Xian; Li, Ting; Wei, Dan-Dan; Smagghe, Guy; Wang, Jin-Jun

    2017-04-01

    Glutathione S-transferases (GSTs) comprise a diverse family of enzymes found ubiquitously in aerobic organisms, and they play important roles in insecticide resistance. In this study, we tested the sensitivity of Liposcelis entomophila collected from four different field populations to three insecticides. Insects from the Tongliang population had a relatively higher tolerance to malathion and propoxur than insects from the other field populations. The differences in insecticide sensitivity among psocid populations may be due to different control practices. Through sequence mining and phylogenetic analyses, we identified 15 delta-class GST genes that contained the conserved motifs of the GSTs. Quantitative real-time PCR (Q-PCR) analysis indicated that all 15 GST genes were expressed at all tested developmental stages, and 12 GST genes had significantly higher expression levels in adults than in eggs. Across field populations, 9 of the 15 GST genes were expressed at significantly higher levels in the Tongliang population than in the other populations. Furthermore, Q-PCR confirmed that the expression of several delta-class GSTs was upregulated at different times after exposure to the LC50 concentration of malathion, propoxur, and deltamethrin. Taken together, these findings show that delta-class GST genes have varying expression levels in different developmental stages and field populations, and that they are upregulated in response to insecticide exposure, which suggests that these GSTs may be associated with insecticide metabolism in psocids. Copyright © 2017 Elsevier Inc. All rights reserved.
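    The abstract does not state how relative expression was calculated from the Q-PCR data. A widely used option is the 2^(-ΔΔCt) method, which assumes near-100% amplification efficiency for both the target and the reference gene; the sketch below applies it to hypothetical Ct values, and both the reference gene and the numbers are assumptions.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    Each argument is a list of replicate Ct values; replicates are averaged.
    Assumes ~100% PCR efficiency for target and reference genes.
    """
    d_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_control = mean(ct_target_control) - mean(ct_ref_control)
    return 2 ** -(d_treated - d_control)


# Hypothetical Ct values: a GST gene vs a reference gene, exposed vs control
fold = relative_expression([22.0, 22.0], [18.0, 18.0], [25.0, 25.0], [18.0, 18.0])
```

    Here the target reaches threshold three cycles earlier (relative to the reference gene) in the exposed sample than in the control, which the method reads as an 8-fold increase.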

  13. The Eucalyptus terpene synthase gene family.

    PubMed

    Külheim, Carsten; Padovan, Amanda; Hefer, Charles; Krause, Sandra T; Köllner, Tobias G; Myburg, Alexander A; Degenhardt, Jörg; Foley, William J

    2015-06-11

    Terpenoids are abundant in the foliage of Eucalyptus; they provide its characteristic smell, are economically valuable, and influence ecological interactions. Quantitative and qualitative inter- and intraspecific variation of terpenes is common in eucalypts. The genome sequences of Eucalyptus grandis and E. globulus were mined for terpene synthase (TPS) genes, which were compared with those of other plant species. We investigated the relative expression of TPS genes in seven plant tissues and functionally characterized five TPS genes from E. grandis. Eucalyptus grandis has the largest number of putative functional TPS genes of any sequenced plant genome. We discovered 113 and 106 putative functional TPS genes in E. grandis and E. globulus, respectively. All but one TPS gene from E. grandis was expressed in at least one of the seven plant tissues examined. Genomic clusters of up to 20 genes were identified. Many TPS genes are expressed in tissues other than leaves, which invites a re-evaluation of the function of terpenes in Eucalyptus. Our data indicate that terpenes in Eucalyptus may play a wider role in biotic and abiotic interactions than previously thought. Tissue-specific expression is common, and the possibility of stress induction needs further investigation. Phylogenetic comparison of the two investigated Eucalyptus species gives insight into the recent evolution of different clades within the TPS gene family. While the majority of TPS genes occur in orthologous pairs, some clades show evidence of recent gene duplication as well as loss of function.

  14. Learning the Structure of Biomedical Relationships from Unstructured Text

    PubMed Central

    Percha, Bethany; Altman, Russ B.

    2015-01-01

    The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. PMID:26219079

  15. Transcriptome-wide mining suggests conglomerate of genes associated with tuberous root growth and development in Aconitum heterophyllum Wall.

    PubMed

    Malhotra, Nikhil; Sood, Hemant; Chauhan, Rajinder Singh

    2016-12-01

    Tuberous roots of Aconitum heterophyllum constitute the storage organ for secondary metabolites; however, the molecular components contributing to their formation are not known. The transcriptomes of A. heterophyllum were analyzed to identify genes possibly associated with tuberous root development, taking clues from genes implicated in other plant species. Of 18 genes examined, eight showed higher transcript abundance in roots than in shoots (13- to 171-fold): those encoding GDP-mannose pyrophosphorylase (GMPase), SHAGGY, Expansin, RING-box protein 1 (RBX1), SRF receptor kinase (SRF), β-amylase, ADP-glucose pyrophosphorylase (AGPase), and Auxin responsive factor 2 (ARF2). Comparative expression analysis of these genes across tuberous root developmental stages showed 11- to 97-fold increases in transcripts in fully developed roots compared with young rootlets, implying their involvement in the biosynthesis, accumulation, and storage of primary metabolites contributing to root biomass. Cluster analysis revealed a positive correlation with the gene expression data for different stages of tuberous root formation in A. heterophyllum. The outcome of this study can be useful for the genetic improvement of A. heterophyllum for root biomass yield.

  16. NvERTx: a gene expression database to compare embryogenesis and regeneration in the sea anemone Nematostella vectensis.

    PubMed

    Warner, Jacob F; Guerlais, Vincent; Amiel, Aldine R; Johnston, Hereroa; Nedoncelle, Karine; Röttinger, Eric

    2018-05-17

    For over a century, researchers have been comparing embryogenesis and regeneration, hoping that lessons learned from embryonic development will unlock hidden regenerative potential. This problem has historically been difficult to investigate because the best regenerative model systems are poor embryonic models and vice versa. Recently, however, there has been renewed interest in this question, as emerging models have allowed researchers to investigate these processes in the same organism. This interest has been further fueled by the advent of high-throughput transcriptomic analyses that provide virtual mountains of data. Here, we present Nematostella vectensis Embryogenesis and Regeneration Transcriptomics (NvERTx), a platform for comparing gene expression during embryogenesis and regeneration. NvERTx consists of close to 50 transcriptomic data sets spanning embryogenesis and regeneration in Nematostella. These data were used to perform a robust de novo transcriptome assembly, with which users can search, conduct BLAST analyses, and plot the expression of multiple genes during these two developmental processes. The site is also home to the results of gene clustering analyses, which further mine the data to identify groups of co-expressed genes. The site can be accessed at http://nvertx.kahikai.org. © 2018. Published by The Company of Biologists Ltd.

  17. Directed module detection in a large-scale expression compendium.

    PubMed

    Fu, Qiang; Lemmens, Karen; Sanchez-Rodriguez, Aminael; Thijs, Inge M; Meysman, Pieter; Sun, Hong; Fierro, Ana Carolina; Engelen, Kristof; Marchal, Kathleen

    2012-01-01

    Public online microarray databases contain tremendous amounts of expression data. Mining these data sources can provide a wealth of information on the underlying transcriptional networks. In this chapter, we illustrate how the web services COLOMBOS and DISTILLER can be used to identify condition-dependent coexpression modules by exploring compendia of public expression data. COLOMBOS is designed for user-specified query-driven analysis, whereas DISTILLER generates a global regulatory network overview. The user is guided through both web services by means of a case study in which condition-dependent coexpression modules comprising a gene of interest (i.e., "directed") are identified.
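    COLOMBOS and DISTILLER are web services, and their internal algorithms are not described here. Purely to illustrate the idea of a condition-dependent, directed module, the sketch below selects the conditions in which a hypothetical gene of interest responds strongly and then collects genes whose profiles correlate with it over those conditions only. The thresholds, data layout, and gene names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def select_conditions(expr, seed_gene, min_abs=1.0):
    """Conditions in which the gene of interest responds strongly."""
    return [c for c, v in expr[seed_gene].items() if abs(v) >= min_abs]


def directed_module(expr, seed_gene, min_abs=1.0, min_r=0.8):
    """Genes co-expressed with seed_gene over its selected conditions only."""
    conditions = select_conditions(expr, seed_gene, min_abs)
    seed = [expr[seed_gene][c] for c in conditions]
    module = [seed_gene]
    for gene, profile in expr.items():
        if gene == seed_gene:
            continue
        if pearson(seed, [profile[c] for c in conditions]) >= min_r:
            module.append(gene)
    return conditions, module


# Hypothetical log-ratios: {gene: {condition: value}}
expr = {
    "seed": {"c1": 2.0, "c2": -2.0, "c3": 1.5, "c4": 0.1},
    "g1": {"c1": 1.8, "c2": -1.9, "c3": 1.2, "c4": 5.0},
    "g2": {"c1": -2.0, "c2": 2.0, "c3": -1.0, "c4": 0.0},
}
conditions, module = directed_module(expr, "seed")
```

    Restricting the correlation to the selected conditions is what makes the module condition-dependent: g1's outlying value in c4 does not affect its membership.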

  18. Transcriptional Analysis of Drought-Induced Genes in the Roots of a Tolerant Genotype of the Common Bean (Phaseolus vulgaris L.)

    PubMed Central

    Recchia, Gustavo Henrique; Caldas, Danielle Gregorio Gomes; Beraldo, Ana Luiza Ahern; da Silva, Márcio José; Tsai, Siu Mui

    2013-01-01

    In Brazil, common bean (Phaseolus vulgaris L.) productivity is severely affected by drought stress owing to low-technology cultivation systems. Our purpose was to identify differentially expressed genes in the roots of a genotype tolerant to water deficit (BAT 477) when irrigation was interrupted during its development. A suppressive subtractive hybridization (SSH) library was constructed using the drought-susceptible genotype Carioca 80SH as the “driver”. After clustering and data mining, 1572 valid reads were obtained, resulting in 1120 ESTs (expressed sequence tags). We found sequences for transcription factors, carbohydrate metabolism, proline-rich proteins, aquaporins, chaperones, and ubiquitins, all of which were organized according to their biological processes. The SSH library was validated by an RT-qPCR experiment assessing the expression patterns of 10 selected genes in both genotypes under stressed and control conditions. Finally, the expression patterns of 31 ESTs putatively related to drought responses were analyzed in a time-course experiment. Our results confirmed that these genes are more highly expressed in the tolerant genotype during stress; however, they are not exclusive to it, since different levels of these transcripts were also detected in the susceptible genotype. In addition, we observed fluctuations in gene regulation over time in both genotypes, which appear to adopt different strategies to develop tolerance to this stress. PMID:23538843

  19. Identifying candidate drivers of drug response in heterogeneous cancer by mining high throughput genomics data.

    PubMed

    Nabavi, Sheida

    2016-08-15

    With advances in technologies, huge amounts of multiple types of high-throughput genomics data are available. These data have tremendous potential to identify new and clinically valuable biomarkers to guide the diagnosis, assessment of prognosis, and treatment of complex diseases such as cancer. Integrating, analyzing, and interpreting big and noisy genomics data to obtain biologically meaningful results, however, remains highly challenging. Mining genomics datasets with advanced computational methods can help to address these issues. To facilitate the identification, from an enormous amount of heterogeneous data, of a short list of biologically meaningful genes as candidate drivers of anti-cancer drug resistance, we employed statistical machine-learning techniques and integrated genomics datasets. We developed a computational method that integrates gene expression, somatic mutation, and copy number aberration data of sensitive and resistant tumors. In this method, an integrative approach based on module network analysis is applied to identify potential driver genes. This is followed by cross-validation and a comparison of the results for the sensitive and resistant groups to obtain the final list of candidate biomarkers. We applied this method to the ovarian cancer data from The Cancer Genome Atlas. The final result contains biologically relevant genes such as COL11A1, which several recent studies have reported as a cis-platinum resistance biomarker for epithelial ovarian carcinoma. The described method yields a short list of aberrant genes that also control the expression of their co-regulated genes. The results suggest that this unbiased, data-driven computational method can identify biologically relevant candidate biomarkers. It can be used in a wide range of applications that compare two conditions with highly heterogeneous datasets.

  20. Microarray labeling extension values: laboratory signatures for Affymetrix GeneChips

    PubMed Central

    Lee, Yun-Shien; Chen, Chun-Houh; Tsai, Chi-Neu; Tsai, Chia-Lung; Chao, Angel; Wang, Tzu-Hao

    2009-01-01

    Interlaboratory comparison of microarray data, even when the same platform is used, poses several challenges for scientists. RNA quality, RNA labeling efficiency, hybridization procedures, and data-mining tools can all contribute to variation between laboratories. In Affymetrix GeneChips, about 11–20 different 25-mer oligonucleotides are used to measure the level of each transcript. Here, we report that ‘labeling extension values (LEVs)’, which are correlation coefficients between probe intensities and probe positions, are highly correlated with gene expression levels (GEVs) in eukaryotic Affymetrix microarray data. By analyzing LEVs and GEVs in 2414 publicly available CEL files of 20 Affymetrix microarray types covering 13 species, we found that correlations between LEVs and GEVs exist only in eukaryotic RNAs, not in prokaryotic ones. Surprisingly, Affymetrix results of the same specimens analyzed in different laboratories could be clearly differentiated by LEVs alone, leading to the identification of ‘laboratory signatures’. In the examined dataset, GSE10797, filtering out high-LEV genes did not compromise the discovery of biological processes constructed from differentially expressed genes. In conclusion, LEVs provide a new filtering parameter for microarray analysis of gene expression and may improve the inter- and intralaboratory comparability of Affymetrix GeneChip data. PMID:19295132
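    Following the definition above, an LEV can be computed per probe set as the correlation coefficient between probe positions and probe intensities. The sketch assumes Pearson correlation on raw intensities and a hypothetical cut-off of 0.8 for filtering out high-LEV probe sets; the abstract specifies neither choice. The probe set names are invented, and real probe sets contain about 11 to 20 probes rather than the four used here.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def lev(positions, intensities):
    """Labeling extension value: correlation of probe intensity with position."""
    return pearson(positions, intensities)


def filter_high_lev(probe_sets, max_lev=0.8):
    """Keep probe sets whose LEV does not exceed max_lev."""
    return {name: data for name, data in probe_sets.items() if lev(*data) <= max_lev}


# Hypothetical probe sets: (probe positions, probe intensities)
probe_sets = {
    "probeset_1": ([1, 2, 3, 4], [10.0, 20.0, 30.0, 40.0]),  # strong positional trend
    "probeset_2": ([1, 2, 3, 4], [30.0, 10.0, 40.0, 20.0]),  # no trend
}
kept = filter_high_lev(probe_sets)
```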

  1. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis.

    PubMed

    Mallik, Saurav; Zhao, Zhongming

    2017-12-28

    For transcriptomic analysis, numerous microarray-based genomic datasets are available, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample group and a matched control group for each transcript or gene. Association rule mining discovers interesting item sets through a rule-based methodology, which makes it advantageous for finding cause-effect relationships between transcripts. In this work, we introduce two new rule-based similarity measures (weighted rank-based Jaccard and cosine measures) and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through an association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers, which includes both singular and complex markers, depends on the condensed gene sets in either the antecedent or the consequent of the rules in the resulting modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, and Gene Ontology annotations. Specifically, we first identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then used to determine association rules from these genes. Based on these rules, we computed integrated similarity scores from the rule-based similarity measures for each rule pair, and the resulting scores were used for clustering to identify co-expressed rule modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
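    The weighted rank-based measures are not defined in the abstract, so the sketch below uses their unweighted counterparts as an assumption. Each association rule is treated as the set of genes in its antecedent and consequent, Jaccard and cosine similarities are averaged into one score per rule pair, and rules whose score reaches a hypothetical threshold are linked into modules (connected components). This is not the ConGEMs framework itself, and the rules are invented.

```python
import math
from itertools import combinations


def rule_items(rule):
    """Genes appearing in a rule's antecedent or consequent."""
    antecedent, consequent = rule
    return set(antecedent) | set(consequent)


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(a, b):
    # Cosine similarity of the two sets viewed as binary vectors
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def rule_modules(rules, threshold=0.5):
    """Group rule indices whose averaged similarity reaches the threshold."""
    n = len(rules)
    parent = list(range(n))  # union-find forest over rule indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    items = [rule_items(r) for r in rules]
    for i, j in combinations(range(n), 2):
        score = (jaccard(items[i], items[j]) + cosine(items[i], items[j])) / 2
        if score >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical rules: (antecedent genes, consequent genes)
rules = [({"G1", "G2"}, {"G3"}), ({"G1", "G2"}, {"G4"}), ({"G7"}, {"G8"})]
modules = rule_modules(rules)
```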

  2. NCBI GEO: archive for high-throughput functional genomic data.

    PubMed

    Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Edgar, Ron

    2009-01-01

    The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as 'Minimum Information About a Microarray Experiment' (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

  3. Biomining active cellulases from a mining bioremediation system.

    PubMed

    Mewis, Keith; Armstrong, Zachary; Song, Young C; Baldwin, Susan A; Withers, Stephen G; Hallam, Steven J

    2013-09-20

    Functional metagenomics has emerged as a powerful method for gene model validation and enzyme discovery from natural and human-engineered ecosystems. Here we report the development of a high-throughput functional metagenomic screen incorporating bioinformatic and biochemical analyses. A fosmid library containing 6144 clones sourced from a mining bioremediation system was screened for cellulase activity using 2,4-dinitrophenyl β-cellobioside, a previously proven cellulose model substrate. Fifteen active clones were recovered and fully sequenced, revealing 9 unique clones able to hydrolyse 1,4-β-D-glucosidic linkages. Transposon mutagenesis identified genes belonging to glycoside hydrolase (GH) families 1, 3, or 5 as necessary for mediating this activity. Reference trees for GH families 1, 3, and 5 were generated from sequences in the CAZy database for automated phylogenetic analysis of fosmid end and active clone sequences, revealing known and novel cellulase-encoding genes. Active cellulase genes recovered in functional screens were subcloned into inducible high-copy plasmids, expressed, and purified to determine enzymatic properties including thermostability, pH optima, and substrate specificity. The workflow described here provides a general paradigm for recovery and characterization of microbially derived genes and gene products based on genetic logic and contemporary screening technologies developed for model organismal systems. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

  4. Integration of zebrafish fin regeneration genes with expression data of human tumors in silico uncovers potential novel melanoma markers.

    PubMed

    Hagedorn, Martin; Siegfried, Géraldine; Hooks, Katarzyna B; Khatib, Abdel-Majid

    2016-11-01

    Tissue regeneration requires the expression of a large, unknown number of genes to initiate and maintain cellular processes such as proliferation, extracellular matrix synthesis, differentiation, and migration. A unique model for simulating this process in a controlled manner is the regrowth of the zebrafish caudal fin after amputation. Within this tissue, stem cells differentiate into fibroblasts, epithelial and endothelial cells, and melanocytes. Many genes implicated in the regeneration process are deregulated in cancer. We therefore undertook a systematic gene expression study to identify genes upregulated during the regrowth of caudal fin tissue. Applying a high-stringency cut-off of 4-fold change, we identified 54 annotated genes significantly overexpressed in regenerating blastema. Further bioinformatic data-mining studies showed that 22 of the 54 regeneration genes were overexpressed in melanoma compared with normal skin or other cancers. Whereas the role of TNC (tenascin C) and FN1 (fibronectin 1) in melanoma development is well documented, the involvement of MARCKS, RCN3, BAMBI, PEA3/ETV4, and the FK506 family members FKBP7, FKBP10, and FKBP11 in melanoma progression is unclear. The corresponding proteins were detected in melanoma tissue but not in normal skin. High expression of FKBP7, DPYSL5, and MDK was significantly associated with poor survival. We discuss a potential role of these novel melanoma genes, which have promising potential as new therapeutic targets or diagnostic markers.
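    The 4-fold cut-off can be expressed as a simple filter on the ratio of mean expression in regenerating versus control tissue. The sketch below shows only that filter; the pseudocount, gene names, and values are assumptions, and the significance test that the study also applied is not reproduced.

```python
from statistics import mean


def fold_change(treated, control, pseudocount=1.0):
    """Ratio of mean expression; the pseudocount avoids division by zero."""
    return (mean(treated) + pseudocount) / (mean(control) + pseudocount)


def upregulated(expr, min_fold=4.0):
    """Genes whose treated/control fold change reaches min_fold."""
    return sorted(gene for gene, (treated, control) in expr.items()
                  if fold_change(treated, control) >= min_fold)


# Hypothetical normalized intensities: (blastema replicates, control replicates)
expr = {
    "geneA": ([90.0, 110.0], [9.0, 11.0]),  # about 9-fold up
    "geneB": ([20.0, 22.0], [10.0, 12.0]),  # under 2-fold
}
hits = upregulated(expr)
```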

  5. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    PubMed

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Accurate estimation of missing values has therefore been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. We therefore propose a novel imputation algorithm specially suited to estimating missing values in gene expression time series data. The algorithm uses Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles and then selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three DTW-based imputation (DTWimpute) algorithms were considered: position-wise, neighborhood-wise, and two-pass imputation. These were initially prototyped in Perl, and their accuracy was evaluated on yeast expression time series data under several parameter settings. The experiments showed that the two-pass algorithm consistently outperforms the neighborhood-wise and position-wise algorithms, particularly for datasets with a higher proportion of missing entries. The two-pass DTWimpute algorithm was further benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community, and proved superior. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also offers a choice of three initial rough imputation methods.
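
Dynamic time warping aligns two profiles by letting time points stretch or compress before differences are summed. The following is a minimal sketch of DTW-based nearest-profile imputation under stated assumptions (absolute-difference cost, distances computed only over the target's observed time points, plain mean of the k nearest complete profiles). It is a simplified illustration, not the authors' position-wise, neighborhood-wise or two-pass DTWimpute algorithm, and the function names are hypothetical:

```python
def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with an absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dtw_impute(target, candidates, k=2):
    """Fill None entries of `target` from its k DTW-nearest complete profiles.

    Hypothetical simplification: distances use only the time points that
    are observed in `target`; each missing point receives the mean of the
    neighbours' values at that point.
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    obs_target = [target[i] for i in observed]
    ranked = sorted(
        candidates,
        key=lambda c: dtw_distance(obs_target, [c[i] for i in observed]),
    )
    neighbours = ranked[:k]
    return [
        v if v is not None else sum(c[i] for c in neighbours) / len(neighbours)
        for i, v in enumerate(target)
    ]


# Hypothetical four-point time series; position 2 is missing in the target.
target = [1.0, 2.0, None, 4.0]
candidates = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.2, 4.1], [9.0, 0.0, 9.0, 0.0]]
print(dtw_impute(target, candidates, k=2))  # missing value filled with about 3.1
```

The third candidate is far from the target under DTW and is ignored, so the gap is filled from the two similarly shaped profiles.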

  6. Temporal, spatial, and phenotypical changes of PDGFRα expressing fibroblasts during late lung development.

    PubMed

    Endale, Mehari; Ahlfeld, Shawn; Bao, Erik; Chen, Xiaoting; Green, Jenna; Bess, Zach; Weirauch, Matthew T; Xu, Yan; Perl, Anne Karina

    2017-05-15

    Many studies have investigated the source and role of epithelial progenitors during lung development, but such information is limited for fibroblast populations and their complex role in the developing lung. In this study, we characterized the spatial location, mRNA expression and immunophenotype of PDGFRα+ fibroblasts during sacculation and alveolarization. Confocal microscopy identified spatial association of PDGFRα-expressing fibroblasts with proximal epithelial cells of the branching bronchioles and the dilating acinar tubules at E16.5, with distal terminal saccules at E18.5, and with alveolar epithelial cells at PN7 and PN28. Immunohistochemistry for alpha smooth muscle actin revealed that PDGFRα+ fibroblasts contribute to proximal peribronchiolar smooth muscle at E16.5 and to transient distal alveolar myofibroblasts at PN7. Time series RNA-Seq analyses of PDGFRα+ fibroblasts identified differentially expressed genes that, based on gene expression similarity, were clustered into 7 major gene expression profile patterns. The presence of myofibroblast and smooth muscle precursors at E16.5 and PN7 was reflected by a two-peak gene expression profile on these days and by gene ontology enrichment in muscle contraction. Additional molecular and functional differences between peribronchiolar smooth muscle cells at E16.5 and transient intraseptal myofibroblasts at PN7 were suggested by a single peak in gene expression at PN7 with functional enrichment in cell projection and muscle cell differentiation. Immunophenotyping of subsets of PDGFRα+ fibroblasts by flow cytometry confirmed the predicted increase in proliferation at E16.5 and PN7, and identified subsets of CD29+ myofibroblasts and CD34+ lipofibroblasts. These data can be further mined to develop novel hypotheses and gain valuable understanding of the molecular and cellular basis of alveolarization. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
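
Grouping time-series profiles by shape is often done by z-scoring each gene across time points and then clustering. A minimal sketch, assuming k-means on z-scored profiles with a naive deterministic initialization (the first k profiles); the abstract does not say that k-means was used, and the time points and values below are hypothetical:

```python
import statistics


def zscore(profile):
    """Center and scale one gene's profile so clustering compares shape, not level."""
    mu = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in profile]


def kmeans_profiles(profiles, k, iters=50):
    """Plain k-means on z-scored profiles; returns one cluster label per profile."""
    data = [zscore(p) for p in profiles]
    centroids = [list(p) for p in data[:k]]  # naive deterministic initialization
    labels = [0] * len(data)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            for x in data
        ]
        # update step: centroid = per-time-point mean of its members
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


# Hypothetical values at E16.5, E18.5, PN7, PN28: two "two-peak" genes
# (high at E16.5 and PN7) and two "single-peak at PN7" genes.
profiles = [[5, 1, 5, 1], [1, 1, 6, 1], [10, 2, 10, 2], [2, 2, 12, 2]]
print(kmeans_profiles(profiles, k=2))  # [0, 1, 0, 1]
```

With real data one would set `k` to the desired number of patterns (7 in the study) and use a more robust initialization.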

  7. Temporal, spatial, and phenotypical changes of PDGFRα expressing fibroblasts during late lung development

    PubMed Central

    Endale, Mehari; Ahlfeld, Shawn; Bao, Erik; Chen, Xiaoting; Green, Jenna; Bess, Zach; Weirauch, Matthew T.; Xu, Yan; Perl, Anne Karina

    2017-01-01

    Many studies have investigated the source and role of epithelial progenitors during lung development, but such information is limited for fibroblast populations and their complex role in the developing lung. In this study, we characterized the spatial location, mRNA expression and immunophenotype of PDGFRα+ fibroblasts during sacculation and alveolarization. Confocal microscopy identified spatial association of PDGFRα-expressing fibroblasts with proximal epithelial cells of the branching bronchioles and the dilating acinar tubules at E16.5, with distal terminal saccules at E18.5, and with alveolar epithelial cells at PN7 and PN28. Immunohistochemistry for alpha smooth muscle actin revealed that PDGFRα+ fibroblasts contribute to proximal peribronchiolar smooth muscle at E16.5 and to transient distal alveolar myofibroblasts at PN7. Time series RNA-Seq analyses of PDGFRα+ fibroblasts identified differentially expressed genes that, based on gene expression similarity, were clustered into 7 major gene expression profile patterns. The presence of myofibroblast and smooth muscle precursors at E16.5 and PN7 was reflected by a two-peak gene expression profile on these days and by gene ontology enrichment in muscle contraction. Additional molecular and functional differences between peribronchiolar smooth muscle cells at E16.5 and transient intraseptal myofibroblasts at PN7 were suggested by a single peak in gene expression at PN7 with functional enrichment in cell projection and muscle cell differentiation. Immunophenotyping of subsets of PDGFRα+ fibroblasts by flow cytometry confirmed the predicted increase in proliferation at E16.5 and PN7, and identified subsets of CD29+ myofibroblasts and CD34+ lipofibroblasts. These data can be further mined to develop novel hypotheses and gain valuable understanding of the molecular and cellular basis of alveolarization. PMID:28408205

  8. Heterologous Production of a Novel Cyclic Peptide Compound, KK-1, in Aspergillus oryzae.

    PubMed

    Yoshimi, Akira; Yamaguchi, Sigenari; Fujioka, Tomonori; Kawai, Kiyoshi; Gomi, Katsuya; Machida, Masayuki; Abe, Keietsu

    2018-01-01

    A novel cyclic peptide compound, KK-1, was originally isolated from the plant-pathogenic fungus Curvularia clavata. It consists of 10 amino acid residues, including five N-methylated amino acid residues, and has potent antifungal activity. Recently, genome sequencing of C. clavata was completed, and the biosynthetic genes involved in KK-1 production were predicted using a novel gene cluster mining tool, MIDDAS-M. These genes form a cluster of approximately 75 kb that includes nine open reading frames, among them a non-ribosomal peptide synthetase (NRPS) gene. To determine whether the predicted genes were responsible for KK-1 biosynthesis, we performed heterologous production of KK-1 in Aspergillus oryzae by introducing the cluster genes into the A. oryzae genome. Because the NRPS gene is very large (approximately 40 kb), it was split into two fragments and then reconstructed in the A. oryzae genome. The remaining seven genes in the cluster, excluding the regulatory gene kkR, were simultaneously introduced into the A. oryzae strain in which the NRPS gene had already been incorporated. To evaluate heterologous production of KK-1 in A. oryzae, gene expression was analyzed by RT-PCR and KK-1 productivity was quantified by HPLC. KK-1 was produced in variable quantities by a number of transformed strains, along with expression of the cluster genes. The amount of KK-1 produced by the strain with the greatest expression of all genes was lower than that produced by the original producer, C. clavata. Therefore, expression of the cluster genes is necessary and sufficient for heterologous production of KK-1 in A. oryzae, although unknown factors may limit productivity in this species.

  9. Heterologous Production of a Novel Cyclic Peptide Compound, KK-1, in Aspergillus oryzae

    PubMed Central

    Yoshimi, Akira; Yamaguchi, Sigenari; Fujioka, Tomonori; Kawai, Kiyoshi; Gomi, Katsuya; Machida, Masayuki; Abe, Keietsu

    2018-01-01

    A novel cyclic peptide compound, KK-1, was originally isolated from the plant-pathogenic fungus Curvularia clavata. It consists of 10 amino acid residues, including five N-methylated amino acid residues, and has potent antifungal activity. Recently, genome sequencing of C. clavata was completed, and the biosynthetic genes involved in KK-1 production were predicted using a novel gene cluster mining tool, MIDDAS-M. These genes form a cluster of approximately 75 kb that includes nine open reading frames, among them a non-ribosomal peptide synthetase (NRPS) gene. To determine whether the predicted genes were responsible for KK-1 biosynthesis, we performed heterologous production of KK-1 in Aspergillus oryzae by introducing the cluster genes into the A. oryzae genome. Because the NRPS gene is very large (approximately 40 kb), it was split into two fragments and then reconstructed in the A. oryzae genome. The remaining seven genes in the cluster, excluding the regulatory gene kkR, were simultaneously introduced into the A. oryzae strain in which the NRPS gene had already been incorporated. To evaluate heterologous production of KK-1 in A. oryzae, gene expression was analyzed by RT-PCR and KK-1 productivity was quantified by HPLC. KK-1 was produced in variable quantities by a number of transformed strains, along with expression of the cluster genes. The amount of KK-1 produced by the strain with the greatest expression of all genes was lower than that produced by the original producer, C. clavata. Therefore, expression of the cluster genes is necessary and sufficient for heterologous production of KK-1 in A. oryzae, although unknown factors may limit productivity in this species. PMID:29686660

  10. CrosstalkNet: A Visualization Tool for Differential Co-expression Networks and Communities.

    PubMed

    Manem, Venkata; Adam, George Alexandru; Gruosso, Tina; Gigoux, Mathieu; Bertos, Nicholas; Park, Morag; Haibe-Kains, Benjamin

    2018-04-15

    Variations in physiological conditions can rewire molecular interactions between biological compartments, and characterizing this rewiring can yield novel insights into the gain or loss of interactions specific to perturbations of interest. Networks are a promising tool for elucidating intercellular interactions, yet exploring these large-scale networks remains challenging because of their high dimensionality. To retrieve and mine interactions, we developed CrosstalkNet, a user-friendly, web-based network visualization tool that provides a statistical framework to infer condition-specific interactions, coupled with a community detection algorithm for bipartite graphs to identify significantly dense subnetworks. As a case study, we used CrosstalkNet to mine 54 and 22 gene expression profiles from breast tumor and normal samples, respectively, with epithelial and stromal compartments extracted via laser microdissection. We show how CrosstalkNet can be used to explore large-scale co-expression networks and to obtain insights into the biological processes that govern cross-talk between different tumor compartments. Significance: This web application enables researchers to mine complex networks and to decipher novel biological processes in tumor epithelial-stroma cross-talk, as well as in other studies of intercompartmental interactions. Cancer Res; 78(8); 2140-3. ©2018 AACR. ©2018 American Association for Cancer Research.
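
One common way to flag condition-specific interactions between two compartments is to compare the Pearson correlation of each epithelial-stromal gene pair in tumor versus normal samples with a Fisher z-test. The sketch below assumes that approach for illustration only; it is not necessarily the statistical framework implemented in CrosstalkNet, and the gene names and values are hypothetical. Correlations are clipped away from ±1 so that `math.atanh` stays finite:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def diff_coexpression_z(r1, n1, r2, n2, eps=1e-6):
    """Fisher z statistic for H0: rho1 == rho2 in two independent sample sets."""
    r1 = max(min(r1, 1 - eps), -1 + eps)
    r2 = max(min(r2, 1 - eps), -1 + eps)
    return (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


def differential_edges(epi_t, stro_t, epi_n, stro_n, z_cut=1.96):
    """Bipartite (epithelial gene, stromal gene, z) edges whose correlation differs."""
    n_t = len(next(iter(epi_t.values())))  # number of tumor samples
    n_n = len(next(iter(epi_n.values())))  # number of normal samples
    edges = []
    for ge in epi_t:
        for gs in stro_t:
            z = diff_coexpression_z(
                pearson(epi_t[ge], stro_t[gs]), n_t,
                pearson(epi_n[ge], stro_n[gs]), n_n,
            )
            if abs(z) >= z_cut:
                edges.append((ge, gs, z))
    return edges


# Hypothetical paired compartment profiles (one value per patient sample).
epi_tumor = {"EPI1": [1, 2, 3, 4, 5, 6]}
stroma_tumor = {"STR1": [2, 1, 4, 3, 6, 5], "STR2": [1, 2, 3, 4, 5, 6]}
epi_normal = {"EPI1": [1, 2, 3, 4, 5, 6]}
stroma_normal = {"STR1": [5, 6, 3, 4, 1, 2], "STR2": [1, 2, 3, 4, 5, 6]}
print(differential_edges(epi_tumor, stroma_tumor, epi_normal, stroma_normal))
```

EPI1-STR1 flips from positive to negative correlation and is reported; EPI1-STR2 is equally correlated in both conditions and is not. In practice, many pairs are tested, so the threshold should be adjusted for multiple testing.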

  11. Use of Walnut Shell Powder to Inhibit Expression of Fe2+-Oxidizing Genes of Acidithiobacillus ferrooxidans

    PubMed Central

    Li, Yuhui; Liu, Yehao; Tan, Huifang; Zhang, Yifeng; Yue, Mei

    2016-01-01

    Acidithiobacillus ferrooxidans is a Gram-negative bacterium that obtains energy by oxidizing Fe2+ or reduced sulfur compounds. This bacterium contributes to the formation of acid mine drainage (AMD). This study determined whether walnut shell powder inhibits the growth of A. ferrooxidans. First, the effects of walnut shell powder on Fe2+ oxidation and H+ production were evaluated. Second, the chemical constituents of walnut shell were isolated to determine the active ingredient(s). Third, the expression of Fe2+-oxidizing genes and rus operon genes was investigated using real-time polymerase chain reaction. Finally, growth curves were plotted, and a bioleaching experiment was performed to confirm the active ingredient(s) in walnut shells. The results indicated that both walnut shell powder and the phenolic fraction exert strong inhibitory effects on Fe2+ oxidation and H+ production by A. ferrooxidans cultured in standard 9K medium. The phenolic components exert their inhibitory effects by down-regulating the expression of Fe2+-oxidizing genes and rus operon genes, which significantly decreased the growth of A. ferrooxidans. This study revealed walnut shell powder to be a promising substance for controlling AMD. PMID:27144574

  12. Arabidopsis Gene Family Profiler (aGFP)--user-oriented transcriptomic database with easy-to-use graphic interface.

    PubMed

    Dupl'áková, Nikoleta; Renák, David; Hovanec, Patrik; Honysová, Barbora; Twell, David; Honys, David

    2007-07-23

    Microarray technologies now belong to the standard functional genomics toolbox and have undergone massive development, leading to increased genome coverage, accuracy and reliability. The number of experiments exploiting microarray technology has increased markedly in recent years. In parallel with the rapid accumulation of transcriptomic data, online analysis tools are being introduced to simplify their use. Global statistical data analysis methods contribute to the development of overall concepts about gene expression patterns and help to query the data and compose working hypotheses. More recently, these applications are being supplemented with more specialized products offering visualization and specific data mining tools. We present a curated, gene family-oriented gene expression database, Arabidopsis Gene Family Profiler (aGFP; http://agfp.ueb.cas.cz), which gives the user access to a large collection of normalized Affymetrix ATH1 microarray datasets. The database currently contains NASC Array and AtGenExpress transcriptomic datasets for various tissues at different developmental stages of wild-type plants, gathered from nearly 350 gene chips. The Arabidopsis GFP database has been designed as an easy-to-use, easily accessible resource for expression data of single genes, pre-defined gene families or custom gene sets, with the further possibility of keyword search. Arabidopsis Gene Family Profiler presents a user-friendly web interface with both graphic and text output. Data are stored on a MySQL server, and individual queries are generated by PHP scripts. The most distinctive features of the Arabidopsis Gene Family Profiler database are: 1) the presentation of normalized datasets (Affymetrix MAS algorithm and calculation of model-based gene-expression values based on the Perfect Match-only model); 2) the choice between two different normalization algorithms (Affymetrix MAS4 or MAS5 algorithms); 3) an intuitive interface; 4) an interactive "virtual plant" visualizing the spatial and developmental expression profiles of both gene families and individual genes. Arabidopsis GFP lets users analyze current Arabidopsis developmental transcriptomic data, starting with simple global queries that can be expanded and further refined to visualize comparative and highly selective gene expression profiles.
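
Global scaling is the step that makes arrays comparable: each chip is multiplied by a factor that brings its trimmed mean signal to a common target. A minimal sketch of MAS5-style scaling, assuming the commonly described 2% trimming at each end and a target intensity of 500 (aGFP's actual settings are not stated here); it does not reproduce the MAS4 or MAS5 signal calculations themselves:

```python
def global_scale(signals, target=500.0, trim=0.02):
    """Scale one array so that its trimmed mean equals `target`.

    The highest and lowest `trim` fraction of signal values are excluded
    before averaging; `target=500.0` is an assumed value for illustration.
    Returns the scaled signals and the scaling factor.
    """
    ordered = sorted(signals)
    cut = int(len(ordered) * trim)  # number of values dropped at each end
    kept = ordered[cut:len(ordered) - cut]
    trimmed_mean = sum(kept) / len(kept)
    factor = target / trimmed_mean
    return [x * factor for x in signals], factor


# Hypothetical chip with 100 probe-set signals 1.0 .. 100.0.
signals = [float(i) for i in range(1, 101)]
scaled, factor = global_scale(signals)
print(round(factor, 4))  # 500 / 50.5
```

Trimming makes the factor insensitive to a few saturated or near-background probe sets, which is why chips scaled this way can be compared gene by gene.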

  13. Interactive knowledge discovery and data mining on genomic expression data with numeric formal concept analysis.

    PubMed

    González-Calabozo, Jose M; Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen

    2016-09-15

    Gene Expression Data (GED) analysis poses a great challenge to the scientific community that can be framed within the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice for this task, but its unsupervised nature makes result assessment problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA). We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process, providing support for continuous human interaction with the data to improve hypothesis abduction and assessment. We focus on adapting the interpretation and visualization of EDA output to human cognition. First, we give a proper theoretical background to biclustering using Lattice Theory and provide a set of analysis tools revolving around [Formula: see text]-Formal Concept Analysis ([Formula: see text]-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression, we obtain different sequences of hierarchical biclusterings for gene under- and over-expression using thresholds. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of the persistence and robustness of biclusters to assess them. Second, the resulting biclusters are used to index external omics databases (for instance, Gene Ontology (GO)), thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example, confirming previously published results. The GED analysis problem is thus transformed into the exploration of a sequence of lattices, enabling visualization of the hierarchical structure of the biclusters with a certain degree of granularity. The ability of FCA-based biclustering methods to index external databases such as GO allows us to obtain a quality measure for the biclusters, to follow a gene through the different biclusters in which it appears, and to look for relevant biclusters (by observing their genes and their persistence) in order to infer, for instance, hypotheses about their function.
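
In ordinary (binary) formal concept analysis, a bicluster is a formal concept: a maximal set of genes paired with the maximal set of conditions in which all of those genes pass a threshold. A minimal sketch, assuming a single over-expression threshold and naive closure by intersection; this is plain FCA, not the real-valued, multi-threshold variant used by the authors, and the gene names and values are hypothetical:

```python
def binarize(matrix, threshold):
    """Map each gene to the frozenset of condition indices where it is >= threshold."""
    return {
        gene: frozenset(j for j, v in enumerate(row) if v >= threshold)
        for gene, row in matrix.items()
    }


def formal_concepts(context, n_attrs):
    """All (extent, intent) pairs of a binary context.

    Concept intents are exactly the intersections of gene rows (plus the
    full attribute set), so they are built by repeated intersection.
    """
    intents = {frozenset(range(n_attrs))}
    for row in context.values():
        intents |= {row & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(g for g, row in context.items() if intent <= row)
        concepts.append((extent, intent))
    return concepts


# Hypothetical 3-gene x 3-condition expression matrix, threshold 4.
matrix = {"g1": [5, 5, 0], "g2": [5, 5, 5], "g3": [0, 5, 5]}
context = binarize(matrix, threshold=4)
for extent, intent in formal_concepts(context, n_attrs=3):
    print(sorted(extent), sorted(intent))
```

The concepts form a lattice: ({g1, g2, g3}, {1}) is the most general bicluster, while ({g2}, {0, 1, 2}) is the most specific. Repeating this for several thresholds gives the kind of sequence of lattices the abstract describes.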

  14. Novel insights into freshwater hydrocarbon-rich sediments using metatranscriptomics: Opening the black box.

    PubMed

    Reid, Thomas; Chaganti, Subba Rao; Droppo, Ian G; Weisener, Christopher G

    2018-06-01

    Baseline biogeochemical surveys of natural environments are an often overlooked area of environmental study. Too often, research begins once contamination has occurred, leaving a knowledge gap as to how the affected area behaved prior to outside (often anthropogenic) influences. Such baseline characterizations can inform bioremediation strategies that are crucial for cleaning up chemical spill sites or heavily mined regions. This study was therefore conducted to survey in-situ microbial activity within freshwater hydrocarbon-rich environments that cut through the McMurray formation, the geologic strata constituting the oil sands. We are the first to report in-situ functional variations among these freshwater microbial ecosystems using metatranscriptomics, providing insight into in-situ gene expression within these naturally hydrocarbon-rich sites. Key genes involved in energy metabolism (nitrogen, sulfur and methane) and hydrocarbon degradation are reported, including transcripts related to methane oxidation. This information provides better linkages between hydrocarbon-impacted environments, closing knowledge gaps for optimizing not only oil sands mine reclamation but also microbial reclamation strategies in various freshwater environments. These findings can also be applied to existing contaminated environments in need of efficient reclamation efforts. Copyright © 2018 Elsevier Ltd. All rights reserved.

  15. Mining genes involved in insecticide resistance of Liposcelis bostrychophila Badonnel by transcriptome and expression profile analysis.

    PubMed

    Dou, Wei; Shen, Guang-Mao; Niu, Jin-Zhi; Ding, Tian-Bo; Wei, Dan-Dan; Wang, Jin-Jun

    2013-01-01

    Recent studies indicate that infestations of psocids pose a new risk to global food security. Among psocid species, Liposcelis bostrychophila Badonnel has gained recognition because of its parthenogenic reproduction, rapid adaptation, and increased worldwide distribution. To date, the molecular data available for L. bostrychophila are largely limited to genes identified through homology, and no transcriptome data relevant to psocid infestation are available. In this study, we generated a de novo assembly of the L. bostrychophila transcriptome using short-read sequencing technology (Illumina). In a single run, we obtained more than 51 million sequencing reads that were assembled into 60,012 unigenes (mean size = 711 bp) by Trinity. The transcriptome sequences from different developmental stages of L. bostrychophila, including egg, nymph and adult, were annotated with the non-redundant (Nr) protein database, gene ontology (GO), cluster of orthologous groups of proteins (COG), and KEGG orthology (KO). The analysis revealed three major enzyme families involved in insecticide metabolism as differentially expressed in the L. bostrychophila transcriptome. A total of 49 P450-, 31 GST- and 21 CES-specific genes representing the three enzyme families were identified. In addition, 16 transcripts were identified as containing target site sequences of resistance genes. Furthermore, we profiled gene expression patterns upon insecticide (malathion and deltamethrin) exposure using the tag-based digital gene expression (DGE) method. The L. bostrychophila transcriptome and DGE data provide gene expression data that will further our understanding of molecular mechanisms in psocids. In particular, the findings of this investigation will facilitate the identification of genes involved in insecticide resistance and the design of new compounds for psocid control.

  16. Mining Genes Involved in Insecticide Resistance of Liposcelis bostrychophila Badonnel by Transcriptome and Expression Profile Analysis

    PubMed Central

    Dou, Wei; Shen, Guang-Mao; Niu, Jin-Zhi; Ding, Tian-Bo; Wei, Dan-Dan; Wang, Jin-Jun

    2013-01-01

    Background: Recent studies indicate that infestations of psocids pose a new risk to global food security. Among psocid species, Liposcelis bostrychophila Badonnel has gained recognition because of its parthenogenic reproduction, rapid adaptation, and increased worldwide distribution. To date, the molecular data available for L. bostrychophila are largely limited to genes identified through homology, and no transcriptome data relevant to psocid infestation are available. Methodology and Principal Findings: In this study, we generated a de novo assembly of the L. bostrychophila transcriptome using short-read sequencing technology (Illumina). In a single run, we obtained more than 51 million sequencing reads that were assembled into 60,012 unigenes (mean size = 711 bp) by Trinity. The transcriptome sequences from different developmental stages of L. bostrychophila, including egg, nymph and adult, were annotated with the non-redundant (Nr) protein database, gene ontology (GO), cluster of orthologous groups of proteins (COG), and KEGG orthology (KO). The analysis revealed three major enzyme families involved in insecticide metabolism as differentially expressed in the L. bostrychophila transcriptome. A total of 49 P450-, 31 GST- and 21 CES-specific genes representing the three enzyme families were identified. In addition, 16 transcripts were identified as containing target site sequences of resistance genes. Furthermore, we profiled gene expression patterns upon insecticide (malathion and deltamethrin) exposure using the tag-based digital gene expression (DGE) method. Conclusion: The L. bostrychophila transcriptome and DGE data provide gene expression data that will further our understanding of molecular mechanisms in psocids. In particular, the findings of this investigation will facilitate the identification of genes involved in insecticide resistance and the design of new compounds for psocid control. PMID:24278202

  17. Literature-based condition-specific miRNA-mRNA target prediction.

    PubMed

    Oh, Minsik; Rhee, Sungmin; Moon, Ji Hwan; Chae, Heejoon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun

    2017-01-01

    miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTR of target genes. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target prediction and produce many false positives; expression-based target prediction algorithms have therefore been developed for condition-specific prediction. A typical strategy for using expression data is to leverage the negative regulatory effect of miRNAs on their target genes. To control false positives, a stringent cutoff is typically set, but these methods then tend to reject many true target relationships (false negatives). Overcoming these limitations requires additional information, and the literature is probably the best resource available. Recent literature mining systems compile millions of articles describing experiments designed for specific biological questions, and these systems provide functions for searching for specific information. To use the literature information, we employed BEST, a literature mining system that automatically extracts information from PubMed articles and allows the user to search the literature with any English words. By integrating omics data analysis methods with BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines expression data analysis results with literature information extracted according to a user-specified context. In a pathway enrichment analysis using genes included in the top 200 miRNA targets, Context-MMIA outperformed the four existing target prediction methods that we tested. In a second test of whether prediction methods can reproduce experimentally validated target relationships, Context-MMIA again outperformed the four existing methods. In summary, Context-MMIA allows the user to specify the context of the experimental data when predicting miRNA targets, and we believe it is very useful for predicting condition-specific miRNA targets.
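
The expression-based strategy described above reduces to keeping candidate pairs whose miRNA and mRNA levels are anti-correlated across samples. A minimal sketch, assuming Pearson correlation and a hypothetical cutoff of -0.7; it covers only the expression step, not Context-MMIA's literature integration through BEST, and the transcript names and values are made up:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def anticorrelated_targets(mirna, mrnas, candidates, r_cut=-0.7):
    """Keep sequence-predicted candidates whose mRNA is anti-correlated with the miRNA.

    A stricter (more negative) `r_cut` removes more false positives but,
    as the abstract notes, also rejects more true targets.
    """
    hits = []
    for gene in candidates:
        r = pearson(mirna, mrnas[gene])
        if r <= r_cut:
            hits.append((gene, r))
    return sorted(hits, key=lambda pair: pair[1])  # most negative first


# Hypothetical expression across five samples.
mirna = [1, 2, 3, 4, 5]
mrnas = {"T1": [10, 8, 6, 4, 2], "T2": [2, 4, 6, 8, 10]}
print(anticorrelated_targets(mirna, mrnas, candidates=["T1", "T2"]))  # [('T1', -1.0)]
```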

  18. MetaRanker 2.0: a web server for prioritization of genetic variation data

    PubMed Central

    Pers, Tune H.; Dworzyński, Piotr; Thomas, Cecilia Engel; Lage, Kasper; Brunak, Søren

    2013-01-01

    MetaRanker 2.0 is a web server for prioritization of common and rare frequency genetic variation data. Based on heterogeneous data sets including genetic association data, protein–protein interactions, large-scale text-mining data, copy number variation data and gene expression experiments, MetaRanker 2.0 prioritizes the protein-coding part of the human genome to shortlist candidate genes for targeted follow-up studies. MetaRanker 2.0 is made freely available at www.cbs.dtu.dk/services/MetaRanker-2.0. PMID:23703204

  19. MetaRanker 2.0: a web server for prioritization of genetic variation data.

    PubMed

    Pers, Tune H; Dworzyński, Piotr; Thomas, Cecilia Engel; Lage, Kasper; Brunak, Søren

    2013-07-01

    MetaRanker 2.0 is a web server for prioritization of common and rare frequency genetic variation data. Based on heterogeneous data sets including genetic association data, protein-protein interactions, large-scale text-mining data, copy number variation data and gene expression experiments, MetaRanker 2.0 prioritizes the protein-coding part of the human genome to shortlist candidate genes for targeted follow-up studies. MetaRanker 2.0 is made freely available at www.cbs.dtu.dk/services/MetaRanker-2.0.

  20. The Arabidopsis translatome cell-specific mRNA atlas: Mining suberin and cutin lipid monomer biosynthesis genes as an example for data application.

    PubMed

    Mustroph, Angelika; Bailey-Serres, Julia

    2010-03-01

    Plants consist of distinct cell types distinguished by position, morphological features and metabolic activities. We recently developed a method to extract cell-type-specific mRNA populations by immunopurification of ribosome-associated mRNAs. Microarray profiles of 21 cell-specific mRNA populations from seedling roots and shoots comprise the Arabidopsis Translatome dataset. This gene expression atlas provides a new tool for the study of cell-specific processes. Here we provide an example of how genes involved in a pathway limited to one or a few cell types can be further characterized and new candidate genes can be predicted. Cells of the root endodermis produce suberin as an inner barrier between the cortex and stele, whereas the shoot epidermal cells form cutin as a barrier to the external environment. Both polymers consist of fatty acid derivatives and share biosynthetic origins. We use the Arabidopsis Translatome dataset to demonstrate the significant cell-specific expression patterns of genes involved in those biosynthetic processes and suggest new candidate genes in the biosynthesis of suberin and cutin.

  1. Discovering Protein-Coding Genes from the Environment: Time for the Eukaryotes?

    PubMed

    Marmeisse, Roland; Kellner, Harald; Fraissinet-Tachet, Laurence; Luis, Patricia

    2017-09-01

    Eukaryotic microorganisms from diverse environments encompass a large number of taxa, many of them still unknown to science. One strategy to mine these organisms for genes of biotechnological relevance is to use a pool of eukaryotic mRNA directly extracted from environmental samples. Recent reports demonstrate that the resulting metatranscriptomic cDNA libraries can be screened by expression in yeast for a wide range of genes and functions from many of the different eukaryotic taxa. In combination with novel emerging high-throughput technologies, we anticipate that this approach should contribute to exploring the functional diversity of the eukaryotic microbiota. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Mining a differential sialotranscriptome of Rhipicephalus microplus guides antigen discovery to formulate a vaccine that reduces tick infestations.

    PubMed

    Maruyama, Sandra R; Garcia, Gustavo R; Teixeira, Felipe R; Brandão, Lucinda G; Anderson, Jennifer M; Ribeiro, José M C; Valenzuela, Jesus G; Horackova, Jana; Veríssimo, Cecília J; Katiki, Luciana M; Banin, Tamy M; Zangirolamo, Amanda F; Gardinassi, Luiz G; Ferreira, Beatriz R; de Miranda-Santos, Isabel K F

    2017-04-26

    Ticks cause massive damage to livestock, and vaccines are one sustainable substitute for the acaricides currently used heavily to control infestations. To guide antigen discovery for a vaccine that targets the gamut of parasitic strategies mediated by tick saliva and enables immunological memory, we exploited a transcriptome constructed from salivary glands from all stages of Rhipicephalus microplus ticks feeding on genetically tick-resistant and susceptible bovines. Different levels of host anti-tick immunity affected gene expression in tick salivary glands; we therefore selected four proteins encoded by genes weakly expressed in ticks attempting to feed on resistant hosts or otherwise abundantly expressed in ticks fed on susceptible hosts. These sialoproteins mediate four parasitic functions deployed by male ticks and do not induce antibodies in naturally infected, susceptible bovines. We then evaluated, in tick-susceptible heifers, an alum-adjuvanted vaccine formulated with the recombinant proteins. Parasite performance (i.e. weight and numbers of females finishing their parasitic cycle) was significantly reduced, and titres of antigen-specific antibodies were significantly increased, in vaccinated versus control heifers, conferring an efficacy of 73.2%; two of the antigens were strong immunogens, rich in predicted T-cell epitopes, and challenge infestations boosted antibody responses against them. Mining sialotranscriptomes guided by the immunity of tick-resistant hosts selected important targets, and infestations boosted immune memory against salivary antigens.

  3. Using ZFIN: Data Types, Organization, and Retrieval.

    PubMed

    Van Slyke, Ceri E; Bradford, Yvonne M; Howe, Douglas G; Fashena, David S; Ramachandran, Sridhar; Ruzicka, Leyla

    2018-01-01

    The Zebrafish Model Organism Database (ZFIN; zfin.org) was established in 1994 as the primary genetic and genomic resource for the zebrafish research community. Some of the earliest records in ZFIN were for people and laboratories. Since that time, services and data types provided by ZFIN have grown considerably. Today, ZFIN provides the official nomenclature for zebrafish genes, mutants, and transgenics and curates many data types including gene expression, phenotypes, Gene Ontology, models of human disease, orthology, knockdown reagents, transgenic constructs, and antibodies. Ontologies are used throughout ZFIN to structure these expertly curated data. An integrated genome browser provides genomic context for genes, transgenics, mutants, and knockdown reagents. ZFIN also supports a community wiki where the research community can post new antibody records and research protocols. Data in ZFIN are accessible via web pages, download files, and the ZebrafishMine (zebrafishmine.org), an installation of the InterMine data warehousing software. Searching for data at ZFIN utilizes both parameterized search forms and a single box search for searching or browsing data quickly. This chapter aims to describe the primary ZFIN data and services, and provide insight into how to use and interpret ZFIN searches, data, and web pages.

  4. [Weighted gene co-expression network analysis in biomedicine research].

    PubMed

    Liu, Wei; Li, Li; Ye, Hua; Tu, Wei

    2017-11-25

    High-throughput biological technologies are now widely applied in biology and medicine, allowing scientists to monitor thousands of parameters simultaneously in a given sample. However, mining useful information from high-throughput data remains an enormous challenge. Network biology provides deeper insight into complex biological systems and reveals the modularity of tissue and cellular networks. Correlation networks are increasingly used in bioinformatics applications, and the weighted gene co-expression network analysis (WGCNA) tool can detect clusters (modules) of highly correlated genes. We therefore systematically reviewed the application of WGCNA in studies of disease diagnosis, pathogenesis and related fields. First, we introduced the principles, workflow, advantages and disadvantages of WGCNA. Second, we presented applications of WGCNA in disease, physiology, drug research, evolution and genome annotation. Third, we discussed the application of WGCNA to newly developed high-throughput methods. We hope this review will help promote the application of WGCNA in biomedical research.
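    The central step of WGCNA can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the WGCNA R package: it builds an unsigned weighted network by raising the absolute Pearson correlation between each pair of genes to a soft-thresholding power β (β = 6 is an assumed illustrative value), so strong correlations are kept while moderate ones shrink toward zero. The gene names and expression values are hypothetical.

```python
import math
from itertools import permutations

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta for i != j."""
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g, h in permutations(expr, 2)
    }

def connectivity(adj, gene):
    """Whole-network connectivity: sum of the gene's edge weights to all other genes."""
    return sum(w for (g, _), w in adj.items() if g == gene)

# Hypothetical toy data: four genes measured in the same four samples.
expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],   # perfectly correlated with A
    "C": [4, 3, 2, 1],   # perfectly anti-correlated with A
    "D": [1, 3, 2, 4],   # moderately correlated with A (r = 0.8)
}
adj = soft_threshold_adjacency(expr)
print(round(adj[("A", "D")], 4), round(connectivity(adj, "A"), 4))  # 0.2621 2.2621
```

    With these toy profiles a correlation of 0.8 shrinks to 0.8^6 ≈ 0.26, while the anti-correlated gene C keeps weight 1 because the network is unsigned. Module detection (hierarchical clustering of a topological overlap matrix) is a further step not shown here.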

  5. Microarray-based screening of differentially expressed genes in glucocorticoid-induced avascular necrosis.

    PubMed

    Huang, Gangyong; Wei, Yibing; Zhao, Guanglei; Xia, Jun; Wang, Siqun; Wu, Jianguo; Chen, Feiyan; Chen, Jie; Shi, Jingshen

    2017-06-01

    The underlying mechanisms of glucocorticoid (GC)-induced avascular necrosis of the femoral head (ANFH), in particular those associated with changes in gene expression patterns, have yet to be fully understood. The present study aimed to identify key genes with a differential expression pattern in GC-induced ANFH. E-MEXP-2751 microarray data were downloaded from the ArrayExpress database. Differentially expressed genes (DEGs) were identified in 5 femoral head samples from rats with steroid-induced ANFH compared with 5 samples from placebo-treated rats. Gene Ontology (GO) and pathway enrichment analyses were performed on these DEGs. A total of 93 DEGs (46 upregulated and 47 downregulated) were identified in GC-induced ANFH samples. These DEGs were enriched in various GO terms and pathways, including chondrocyte differentiation and detection of chemical stimuli. The enrichment map revealed that skeletal system development was interconnected with several other GO terms through gene overlap. Literature-mined network analysis revealed that 5 upregulated genes were associated with femoral necrosis: parathyroid hormone receptor 1 (PTHR1), vitamin D (1,25-dihydroxyvitamin D3) receptor (VDR), collagen type II α1, proprotein convertase subtilisin/kexin type 6, and zinc finger protein 354C (ZFP354C). In addition, ZFP354C and VDR were identified as transcription factors. Furthermore, in the protein-protein interaction (PPI) network, PTHR1 interacted with VDR, and α-2-macroglobulin (A2M) interacted with fibronectin 1 (FN1). PTHR1 may be involved in GC-induced ANFH via its interaction with VDR, and A2M may contribute to the development of GC-induced ANFH through its interaction with FN1. An improved understanding of the molecular mechanisms underlying GC-induced ANFH may provide novel targets for diagnostics and therapeutic treatment.
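    The abstract does not say which statistical test was used to call DEGs, so the following is only a simplified stand-in, not the authors' pipeline: for each gene it computes the log2 fold change (difference of group means on log2-scale data) and Welch's t-statistic for 5 treated versus 5 control samples, then applies hypothetical cutoffs (|log2FC| ≥ 1, |t| ≥ 3). A real analysis would typically use a moderated test (e.g. limma) and correct p-values for multiple testing; the gene names and values are invented.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for group a vs group b, allowing unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def screen_degs(expr, treated, control, min_abs_lfc=1.0, min_abs_t=3.0):
    """Split genes into up- and downregulated lists; expr values are log2 intensities.

    The cutoffs are hypothetical defaults chosen for illustration.
    """
    up, down = [], []
    for gene, values in expr.items():
        t_vals = [values[i] for i in treated]
        c_vals = [values[i] for i in control]
        lfc = mean(t_vals) - mean(c_vals)  # difference of log2 means = log2 fold change
        t = welch_t(t_vals, c_vals)
        if abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t:
            (up if lfc > 0 else down).append(gene)
    return up, down

# Hypothetical log2 expression: columns 0-4 treated, columns 5-9 placebo controls.
expr = {
    "gene_1": [8.1, 8.3, 7.9, 8.2, 8.0, 6.0, 6.2, 5.9, 6.1, 6.0],
    "gene_2": [5.0, 5.1, 4.9, 5.2, 5.0, 5.1, 4.9, 5.0, 5.1, 5.0],
    "gene_3": [4.0, 4.1, 3.9, 4.2, 4.0, 6.0, 6.1, 5.9, 6.2, 6.1],
}
up, down = screen_degs(expr, treated=range(0, 5), control=range(5, 10))
print(up, down)  # ['gene_1'] ['gene_3']
```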

  6. Microarray-based screening of differentially expressed genes in glucocorticoid-induced avascular necrosis

    PubMed Central

    Huang, Gangyong; Wei, Yibing; Zhao, Guanglei; Xia, Jun; Wang, Siqun; Wu, Jianguo; Chen, Feiyan; Chen, Jie; Shi, Jingshen

    2017-01-01

    The underlying mechanisms of glucocorticoid (GC)-induced avascular necrosis of the femoral head (ANFH), in particular those associated with changes in gene expression patterns, have yet to be fully understood. The present study aimed to identify key genes with a differential expression pattern in GC-induced ANFH. E-MEXP-2751 microarray data were downloaded from the ArrayExpress database. Differentially expressed genes (DEGs) were identified in 5 femoral head samples from rats with steroid-induced ANFH compared with 5 samples from placebo-treated rats. Gene Ontology (GO) and pathway enrichment analyses were performed on these DEGs. A total of 93 DEGs (46 upregulated and 47 downregulated) were identified in GC-induced ANFH samples. These DEGs were enriched in various GO terms and pathways, including chondrocyte differentiation and detection of chemical stimuli. The enrichment map revealed that skeletal system development was interconnected with several other GO terms through gene overlap. Literature-mined network analysis revealed that 5 upregulated genes were associated with femoral necrosis: parathyroid hormone receptor 1 (PTHR1), vitamin D (1,25-dihydroxyvitamin D3) receptor (VDR), collagen type II α1, proprotein convertase subtilisin/kexin type 6, and zinc finger protein 354C (ZFP354C). In addition, ZFP354C and VDR were identified as transcription factors. Furthermore, in the protein-protein interaction (PPI) network, PTHR1 interacted with VDR, and α-2-macroglobulin (A2M) interacted with fibronectin 1 (FN1). PTHR1 may be involved in GC-induced ANFH via its interaction with VDR, and A2M may contribute to the development of GC-induced ANFH through its interaction with FN1. An improved understanding of the molecular mechanisms underlying GC-induced ANFH may provide novel targets for diagnostics and therapeutic treatment. PMID:28393228

  7. Discovery and Characterization of a Group of Fungal Polycyclic Polyketide Prenyltransferases

    PubMed Central

    Chooi, Yit-Heng; Wang, Peng; Fang, Jinxu; Li, Yanran; Wu, Katherine; Wang, Pin; Tang, Yi

    2014-01-01

    The prenyltransferase (PTase) gene vrtC was proposed to be involved in viridicatumtoxin (1) biosynthesis in Penicillium aethiopicum. Targeted gene deletion and in vitro reconstitution of recombinant VrtC activity established that VrtC is a geranyl transferase that catalyzes a regiospecific Friedel-Crafts alkylation of the naphthacenedione carboxamide intermediate 2 at carbon 6 with geranyl diphosphate (GPP). VrtC can function in the absence of divalent ions and can utilize similar naphthacenedione substrates, such as the acetyl-primed TAN-1612 (4). Genome mining using the VrtC protein sequence led to the identification of a homologous group of PTase genes in the genomes of human- and animal-associated fungi. Three enzymes encoded by this new subgroup of PTase genes, from Neosartorya fischeri, Microsporum canis and Trichophyton tonsurans, were shown to catalyze the transfer of a dimethylallyl group to several tetracyclic naphthacenedione substrates in vitro. In total, seven C5- or C10-prenylated naphthacenedione compounds were generated. The regioselectivity of these new polycyclic PTases (pcPTases) was confirmed by characterization of product 9, obtained from biotransformation of 4 in Escherichia coli expressing the N. fischeri pcPTase gene. The discovery of this new subgroup of PTases extends our enzymatic tools for modifying polycyclic compounds and enables genome mining of new prenylated polyketides. PMID:22590971

  8. NCBI GEO: mining tens of millions of expression profiles--database and tools update.

    PubMed

    Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Edgar, Ron

    2007-01-01

    The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely disseminates microarray and other forms of high-throughput data generated by the scientific community. The database has a minimum information about a microarray experiment (MIAME)-compliant infrastructure that captures fully annotated raw and processed data. Several data deposit options and formats are supported, including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). In addition to data storage, a collection of user-friendly web-based interfaces and applications are available to help users effectively explore, visualize and download the thousands of experiments and tens of millions of gene expression patterns stored in GEO. This paper provides a summary of the GEO database structure and user facilities, and describes recent enhancements to database design, performance, submission format options, data query and retrieval utilities. GEO is accessible at http://www.ncbi.nlm.nih.gov/geo/
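    Because SOFT is a line-oriented text format, its basic layout is easy to illustrate. The sketch below is a minimal reader written under assumptions, not an NCBI tool: it treats lines beginning with "^" as entity indicators, "!" as attributes, "#" as data-column descriptions, and the rows between a "!..._table_begin" and a "!..._table_end" line as a tab-separated data table. Real GEO files carry many more fields, so established parsers such as Bioconductor's GEOquery are the safer choice for actual work; the accession and values below are placeholders.

```python
def parse_soft(text):
    """Split GEO SOFT text into entity records (minimal sketch, not a full parser)."""
    entities, current, header, in_table = [], None, None, False
    for line in text.splitlines():
        if not line.strip():
            continue
        low = line.strip().lower()
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SAMPLE = GSM_EXAMPLE"
            kind, _, acc = line[1:].partition("=")
            current = {"type": kind.strip(), "accession": acc.strip(),
                       "attributes": {}, "columns": {}, "table": []}
            entities.append(current)
        elif current is None:
            continue  # skip anything that precedes the first entity line
        elif low.startswith("!") and low.endswith("_table_begin"):
            in_table, header = True, None
        elif low.startswith("!") and low.endswith("_table_end"):
            in_table = False
        elif in_table:
            fields = line.split("\t")
            if header is None:
                header = fields  # first row inside the table holds column names
            else:
                current["table"].append(dict(zip(header, fields)))
        elif line.startswith("!"):
            # Attribute line; keys may repeat, so values are collected in lists
            key, _, value = line[1:].partition("=")
            current["attributes"].setdefault(key.strip(), []).append(value.strip())
        elif line.startswith("#"):
            # Description of a column in the data table
            key, _, value = line[1:].partition("=")
            current["columns"][key.strip()] = value.strip()
    return entities

# Placeholder record; the accession and values are invented for illustration.
example = (
    "^SAMPLE = GSM_EXAMPLE\n"
    "!Sample_title = control replicate 1\n"
    "#ID_REF = probe identifier\n"
    "#VALUE = normalized signal\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t7.25\n"
    "probe_2\t5.10\n"
    "!sample_table_end\n"
)
record = parse_soft(example)[0]
print(record["accession"], record["table"][0])
```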

  9. Health effects of uranium: new research findings.

    PubMed

    Brugge, Doug; Buchner, Virginia

    2011-01-01

    Recent plans for a nuclear renaissance in both established and emerging economies have prompted increased interest in uranium mining. With the potential for more uranium mining worldwide and a growth in the literature on the toxicology and epidemiology of uranium and uranium mining, we found it timely to review the current state of knowledge. Here, we present a review of the health effects of uranium mining, with an emphasis on newer findings (2005-2011). Uranium mining can contaminate air, water, and soil. The chemical toxicity of the metal constitutes the primary environmental health hazard, with the radioactivity of uranium a secondary concern. The update of the toxicologic evidence on uranium adds to the established findings regarding nephrotoxicity, genotoxicity, and developmental defects. Additional novel toxicologic findings, including some at the molecular level, are now emerging that raise the biological plausibility of adverse effects on the brain, on reproduction, including estrogenic effects, on gene expression, and on uranium metabolism. Historically, most epidemiology on uranium mining has focused on mine workers and radon exposure. Although that situation is still overwhelmingly true, a smaller emerging literature has begun to form around environmental exposure in residential areas near uranium mining and processing facilities. We present and critique such studies. Clearly, more epidemiologic research is needed to contribute to causal inference. As much damage is irreversible, and possibly cumulative, present efforts must be vigorous to limit environmental uranium contamination and exposure.

  10. Evidence for host genetic regulation of altered lipid metabolism in experimental toxoplasmosis supported with gene data mining results

    PubMed Central

    2017-01-01

    Toxoplasma gondii is one of the most successful parasites on Earth, infecting a wide array of mammals, including one third of the global human population. The obligate intracellular protozoan cannot synthesize cholesterol (Chl) and thus depends on the uptake of host Chl for its own development. To explore the genetic regulation of previously observed alterations in lipid metabolism during acute murine T. gondii infection, we assessed total Chl and its fractions in serum and selected tissues at the pathophysiological and molecular level, and integrated the observed expression of selected genes relevant to Chl metabolism, including its biosynthetic and export KEGG pathways, with published transcriptomes obtained in similar murine models of T. gondii infection. Serum lipid status and transcript levels of relevant genes in the brain and liver were assessed in experimental models of acute and chronic toxoplasmosis in wild-type mice. Acute infection was associated with decreased Chl content in both the liver and the periphery (brain, peripheral lymphocytes) and with reduced reverse Chl transport. In contrast, Chl metabolism returned to normal levels in chronic infection. These changes were consistent with the brain and liver gene expression results and with the data obtained via mining. We propose that the observed changes in Chl metabolism are part of the host defense response. Further insight into lipid metabolism in T. gondii infection may provide novel targets for therapeutic agents. PMID:28459857

  11. Gene Expression Profiling of Benign and Malignant Pheochromocytoma

    PubMed Central

    Brouwers, Frederieke M.; Elkahloun, Abdel G.; Munson, Peter J.; Eisenhofer, Graeme; Barb, Jennifer; Linehan, W. Marston; Lenders, Jacques W.M.; de Krijger, Ronald; Mannelli, Massimo; Udelsman, Robert; Ocal, Idris T.; Shulkin, Barry L.; Bornstein, Stefan R.; Breza, Jan; Ksinantova, Lucia; Pacak, Karel

    2016-01-01

    There are currently no reliable diagnostic and prognostic markers or effective treatments for malignant pheochromocytoma. This study used oligonucleotide microarrays to examine gene expression profiles in pheochromocytomas from 90 patients, including 20 with malignant tumors; the latter included metastases and primary tumors from which metastases developed. Other tumor subgroups were defined by tissue norepinephrine versus epinephrine content (i.e., noradrenergic versus adrenergic phenotypes), adrenal versus extra-adrenal location, and the presence of germline mutations in tumor-predisposing genes. Correcting for the confounding influence of noradrenergic versus adrenergic catecholamine phenotype by analysis of variance revealed a larger and more accurate set of genes discriminating benign from malignant pheochromocytomas than when catecholamine phenotype was not considered. Seventy percent of these genes were underexpressed in malignant compared with benign tumors. Similarly, 89% of genes were underexpressed in malignant primary tumors compared with benign tumors, suggesting that malignant potential is largely characterized by a less-differentiated pattern of gene expression. The present database of differentially expressed genes provides a unique resource for mapping the pathways leading to malignancy and for establishing new treatment targets and diagnostic and prognostic markers of malignant disease. The database may also be useful for examining mechanisms of tumorigenesis and genotype–phenotype relationships. Further progress based on this database can come from confirmatory follow-up studies, bioinformatics approaches for data mining and pathway analysis, testing in pheochromocytoma cell culture and animal model systems, and retrospective and prospective studies of diagnostic markers. PMID:17102123

  12. Novel candidate genes of the PARK7 interactome as mediators of apoptosis and acetylation in multiple sclerosis: An in silico analysis.

    PubMed

    Vavougios, George D; Zarogiannis, Sotirios G; Krogfelt, Karen Angeliki; Gourgoulianis, Konstantinos; Mitsikostas, Dimos Dimitrios; Hadjigeorgiou, Georgios

    2018-01-01

    Currently, only four studies have explored the potential role of PARK7 dysregulation in multiple sclerosis (MS) pathophysiology, and no study has evaluated the potential role of the PARK7 interactome in MS. The aim of our study was to assess the differential expression of PARK7 mRNA in peripheral blood mononuclear cells (PBMCs) donated by MS patients versus healthy controls using data mining techniques. The PARK7 interactome data from the GDS3920 profile were scrutinized for differentially expressed genes (DEGs), and Gene Enrichment Analysis (GEA) was used to detect significantly enriched biological functions. Twenty-seven DEGs were detected in the MS dataset; 12 of these (NDUFA4, UBA2, TDP2, NPM1, NDUFS3, SUMO1, PIAS2, KIAA0101, RBBP4, NONO, RBBP7 and HSPA4) are reported for the first time in MS. Stepwise linear discriminant function analysis constructed a predictive model (Wilks' λ = 0.176, χ² = 45.204, p = 1.5275 × 10⁻¹⁰) with 2 variables (TDP2, RBBP4) that achieved 96.6% accuracy in discriminating between patients and controls. GEA revealed that induction and regulation of programmed/intrinsic cell death were the most salient Gene Ontology annotations. Cross-validation on systemic lupus erythematosus and ischemic stroke datasets revealed that these functions are unique to the MS dataset. Our results reveal novel potential target genes; these DEGs regulate epigenetic and apoptotic pathways that may further elucidate the mechanisms underlying autoreactivity in MS. Copyright © 2017 Elsevier B.V. All rights reserved.
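    The abstract does not name the statistic behind its Gene Enrichment Analysis. A common choice for over-representation analysis, shown here only as an assumed example, is the one-sided hypergeometric test: given N annotated genes, K of which carry a GO term, it asks how likely it is that n DEGs would include at least k genes with that term by chance. Apart from the 27 DEGs, the numbers in the usage lines are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K term genes, n DEGs drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical: 20,000 genes in the background, 300 annotated to one GO term,
# 27 DEGs of which 6 carry that term.
p = hypergeom_sf(6, 20_000, 300, 27)
print(f"enrichment p = {p:.2e}")
```

    When many GO terms are tested this way, the resulting p-values would normally be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before a term is called enriched.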

  13. A comparative analysis of biclustering algorithms for gene expression data

    PubMed Central

    Eren, Kemal; Deveci, Mehmet; Küçüktunç, Onur; Çatalyürek, Ümit V.

    2013-01-01

    The need to analyze high-dimensional biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibits similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared with only a small number of other algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they quickly become outdated. In this article we partially address the problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired bicluster model, whether that model allows overlapping biclusters, and the method's robustness to noise. In addition, we observe that biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters. PMID:22772837
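    As an example of what a bicluster "model" means in practice, the sketch below computes the mean squared residue H(I, J) introduced by Cheng and Church, one well-known scoring function for additive (shifted) biclusters. It is an illustration only and is not taken from BiBench or from this comparison. H is zero when every entry of the submatrix equals an overall mean plus a row effect plus a column effect, and it grows with noise or deviations from that pattern. The matrix is toy data.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue H(I, J) of the submatrix rows x cols.

    H is 0 when every entry equals overall mean + row effect + column effect
    (a perfect additive bicluster) and grows as entries depart from that pattern.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(sub), len(sub[0])
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    residues = (
        sub[i][j] - row_means[i] - col_means[j] + overall
        for i in range(n_r) for j in range(n_c)
    )
    return sum(r * r for r in residues) / (n_r * n_c)

# Toy matrix: rows 0-2 x columns 0-2 form an additive (shifted) pattern.
matrix = [
    [1, 2, 3, 9],
    [2, 3, 4, 0],
    [5, 6, 7, 4],
    [8, 1, 0, 3],
]
print(round(mean_squared_residue(matrix, [0, 1, 2], [0, 1, 2]), 6))  # 0.0
print(mean_squared_residue(matrix, range(4), range(4)) > 1)          # True
```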

  14. Short-term transcriptional response of microbial communities to N-fertilization in pine forest soil

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Albright, Michaeline Burr Nelson; Johansen, Renee; Lopez, Deanna

    Numerous studies have examined the long-term effect of experimental nitrogen (N) deposition in terrestrial ecosystems; however, N-specific mechanistic markers are difficult to disentangle from responses to other environmental changes. The clearest picture of N-responsive mechanistic markers is likely to arise from measurements over a short (hours to days) timescale immediately after inorganic N deposition. We therefore assessed the short-term (3-day) transcriptional response of microbial communities in two soil strata from a pine forest to a high dose of N fertilization (ca. 1 mg/g of soil material) in laboratory microcosms. We hypothesized that N fertilization would repress the expression of fungal and bacterial genes linked to N-mining from plant litter. However, despite N-suppression of microbial respiration, the most pronounced differences in functional gene expression were between strata rather than in response to N addition. Overall, ~4% of metabolic genes changed in expression with N addition, whereas three times as many (~12%) differed significantly between the soil strata in the microcosms. In particular, we found little evidence that N changed the expression of metabolic genes associated with complex carbohydrate degradation (CAZymes) or inorganic N utilization. This suggests that direct N repression of microbial functional gene expression is not the principal mechanism for reduced soil respiration immediately after N deposition. Instead, changes in expression with N addition occurred primarily in general cell maintenance functions, for example in ribosome-related transcripts. Transcriptional changes in functional gene abundance in response to N addition observed in longer-term field studies likely result from changes in microbial composition.

  15. Short-term transcriptional response of microbial communities to N-fertilization in pine forest soil

    DOE PAGES

    Albright, Michaeline Burr Nelson; Johansen, Renee; Lopez, Deanna; ...

    2018-05-25

    Numerous studies have examined the long-term effect of experimental nitrogen (N) deposition in terrestrial ecosystems; however, N-specific mechanistic markers are difficult to disentangle from responses to other environmental changes. The clearest picture of N-responsive mechanistic markers is likely to arise from measurements over a short (hours to days) timescale immediately after inorganic N deposition. We therefore assessed the short-term (3-day) transcriptional response of microbial communities in two soil strata from a pine forest to a high dose of N fertilization (ca. 1 mg/g of soil material) in laboratory microcosms. We hypothesized that N fertilization would repress the expression of fungal and bacterial genes linked to N-mining from plant litter. However, despite N-suppression of microbial respiration, the most pronounced differences in functional gene expression were between strata rather than in response to N addition. Overall, ~4% of metabolic genes changed in expression with N addition, whereas three times as many (~12%) differed significantly between the soil strata in the microcosms. In particular, we found little evidence that N changed the expression of metabolic genes associated with complex carbohydrate degradation (CAZymes) or inorganic N utilization. This suggests that direct N repression of microbial functional gene expression is not the principal mechanism for reduced soil respiration immediately after N deposition. Instead, changes in expression with N addition occurred primarily in general cell maintenance functions, for example in ribosome-related transcripts. Transcriptional changes in functional gene abundance in response to N addition observed in longer-term field studies likely result from changes in microbial composition.

  16. Mining meiosis and gametogenesis with DNA microarrays.

    PubMed

    Schlecht, Ulrich; Primig, Michael

    2003-04-01

    Gametogenesis is a key developmental process that involves complex transcriptional regulation of numerous genes including many that are conserved between unicellular eukaryotes and mammals. Recent expression-profiling experiments using microarrays have provided insight into the co-ordinated transcription of several hundred genes during mitotic growth and meiotic development in budding and fission yeast. Furthermore, microarray-based studies have identified numerous loci that are regulated during the cell cycle or expressed in a germ-cell specific manner in eukaryotic model systems like Caenorhabditis elegans, Mus musculus as well as Homo sapiens. The unprecedented amount of information produced by post-genome biology has spawned novel approaches to organizing biological knowledge using currently available information technology. This review outlines experiments that contribute to an emerging comprehensive picture of the molecular machinery governing sexual reproduction in eukaryotes.

  17. Transcriptome Profiling of Khat (Catha edulis) and Ephedra sinica Reveals Gene Candidates Potentially Involved in Amphetamine-Type Alkaloid Biosynthesis

    PubMed Central

    Groves, Ryan A.; Hagel, Jillian M.; Zhang, Ye; Kilpatrick, Korey; Levy, Asaf; Marsolais, Frédéric; Lewinsohn, Efraim; Sensen, Christoph W.; Facchini, Peter J.

    2015-01-01

    Amphetamine analogues are produced by plants in the genus Ephedra and by khat (Catha edulis), and include the widely used decongestants and appetite suppressants (1S,2S)-pseudoephedrine and (1R,2S)-ephedrine. The production of these metabolites, which derive from L-phenylalanine, involves a multi-step pathway partially mapped out at the biochemical level using knowledge of benzoic acid metabolism established in other plants, and direct evidence using khat and Ephedra species as model systems. Despite the commercial importance of amphetamine-type alkaloids, only a single step in their biosynthesis has been elucidated at the molecular level. We have employed Illumina next-generation sequencing technology, paired with Trinity and Velvet-Oases assembly platforms, to establish data-mining frameworks for Ephedra sinica and khat plants. Sequence libraries representing a combined 200,000 unigenes were subjected to an annotation pipeline involving direct searches against public databases. Annotations included the assignment of Gene Ontology (GO) terms used to allocate unigenes to functional categories. As part of our functional genomics program aimed at novel gene discovery, the databases were mined for enzyme candidates putatively involved in alkaloid biosynthesis. Queries used for mining included enzymes with established roles in benzoic acid metabolism, as well as enzymes catalyzing reactions similar to those predicted for amphetamine alkaloid metabolism. Gene candidates were evaluated based on phylogenetic relationships, FPKM-based expression data, and mechanistic considerations. Establishment of expansive sequence resources is a critical step toward pathway characterization, a goal with both academic and industrial implications. PMID:25806807
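    The FPKM values used to rank gene candidates follow a standard normalization: the number of fragments mapped to a transcript, scaled per kilobase of transcript length and per million mapped fragments. The abstract does not describe how FPKM was computed in this study, so this is only the textbook formula, with hypothetical counts.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Equivalent to (fragments / length in kb) / (total mapped fragments in millions).
    """
    return fragments * 10**9 / (length_bp * total_mapped_fragments)

# Hypothetical unigene: 500 fragments, 2,000 bp long, 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```

    Dividing by length makes long and short transcripts comparable within a library, and dividing by total mapped fragments makes values comparable across libraries of different sequencing depth.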

  18. Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study.

    PubMed

    Feltus, F Alex; Ficklin, Stephen P; Gibson, Scott M; Smith, Melissa C

    2013-06-05

    In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pairwise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases, networks may be constructed from input samples that measure gene expression under a variety of conditions, such as different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories, it is often unmanageable to assign samples to condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is also often applied, further limiting the size of the final network. We therefore propose pre-clustering input expression samples to approximate condition-specific sample groups, and constructing an individual network for each group as a means of dynamic significance thresholding. The net effect is increased sensitivity, maximizing the total number of co-expression relationships in the final co-expression network compendium. A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an unsupervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for each network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network.
Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using unsupervised methods. Because RMT ensures that only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high-quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired.
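    The pre-clustering idea can be sketched as: partition samples with k-means, then build a separate correlation network (a "layer") from the samples in each cluster. This is a simplified illustration under explicit assumptions, not the authors' pipeline: a fixed |r| ≥ 0.9 cutoff stands in for the Random Matrix Theory thresholding described above, k-means is seeded deterministically with the first k samples, and the gene names and expression values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def kmeans(points, k, iters=20):
    """Lloyd's k-means; the first k points seed the centroids so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def layered_networks(samples, genes, k, r_min=0.9, min_samples=3):
    """Cluster samples, then keep gene pairs with |r| >= r_min within each cluster.

    Returns {cluster_id: set of (gene_a, gene_b) edges}, one layer per cluster.
    """
    labels = kmeans(samples, k)
    layers = {}
    for c in range(k):
        group = [s for s, lab in zip(samples, labels) if lab == c]
        if len(group) < min_samples:
            continue  # too few samples for a meaningful correlation
        profiles = list(zip(*group))  # one tuple of values per gene
        layers[c] = {
            (genes[i], genes[j])
            for i, j in combinations(range(len(genes)), 2)
            if abs(pearson(profiles[i], profiles[j])) >= r_min
        }
    return layers

# Hypothetical samples (values for g1, g2, g3), alternating two conditions.
genes = ["g1", "g2", "g3"]
samples = [
    (1, 2, 5), (101, 105, 102),
    (2, 4, 1), (102, 101, 104),
    (3, 6, 4), (103, 104, 106),
    (4, 8, 2), (104, 102, 108),
]
print(layered_networks(samples, genes, k=2))
# {0: {('g1', 'g2')}, 1: {('g1', 'g3')}}
```

    In this toy data, g1 and g2 co-vary only in the first condition and g1 and g3 only in the second, so each relationship appears in its own layer.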

  19. Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study

    PubMed Central

    2013-01-01

    Background In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied also limiting the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increase sensitivity thus maximizing the total co-expression relationships in the final co-expression network compendium. Results A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships. in the global network. 
Conclusions Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using unsupervised methods. Because RMT ensures that only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high-quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired. PMID:23738693
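The pre-clustering idea above can be illustrated with a minimal Python sketch, assuming NumPy is available. It is not the authors' pipeline: the sample group labels are taken as given rather than produced by k-means, and a single fixed correlation cutoff stands in for the per-network Random Matrix Theory threshold. The hypothetical helpers `coexpression_edges` and `layered_network` keep gene pairs whose absolute Pearson correlation meets the cutoff, either over all samples pooled or separately within each sample group, with the per-group edge sets merged into one compendium.

```python
import numpy as np


def coexpression_edges(expr, threshold):
    """Gene pairs (i, j), i < j, whose absolute Pearson correlation >= threshold.

    expr: array of shape (n_genes, n_samples), one row per gene.
    """
    corr = np.corrcoef(expr)  # gene-by-gene correlation matrix
    i_idx, j_idx = np.triu_indices(corr.shape[0], k=1)  # each pair once, no diagonal
    keep = np.abs(corr[i_idx, j_idx]) >= threshold
    return set(zip(i_idx[keep].tolist(), j_idx[keep].tolist()))


def layered_network(expr, sample_labels, threshold):
    """Build one network per sample group and return the union of their edges.

    sample_labels: one group label per column of expr (assumed to come from a
    prior clustering step). A fixed threshold replaces RMT in this sketch.
    """
    labels = np.asarray(sample_labels)
    edges = set()
    for group in np.unique(labels):
        edges |= coexpression_edges(expr[:, labels == group], threshold)
    return edges


# Toy data: genes 0 and 1 co-vary only in group "A"; genes 1 and 2 only in "B".
expr = np.array([
    [1, 2, 3, 4, 1, -1, 1, -1],
    [1, 2, 3, 4, 1, 2, 3, 4],
    [1, -1, 1, -1, 2, 4, 6, 8],
], dtype=float)
labels = ["A"] * 4 + ["B"] * 4

print(coexpression_edges(expr, 0.9))       # all samples pooled
print(layered_network(expr, labels, 0.9))  # per-group networks merged
```

On this toy matrix the pooled call finds no pair above the 0.9 cutoff, while the per-group construction recovers the edges (0, 1) and (1, 2): a small-scale version of the sensitivity gain the abstract reports.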

  20. Mouse genotypes drive the liver and adrenal gland clocks

    NASA Astrophysics Data System (ADS)

    Košir, Rok; Prosenc Zmrzljak, Uršula; Korenčič, Anja; Juvan, Peter; Ačimovič, Jure; Rozman, Damjana

    2016-08-01

    Circadian rhythms regulate a plethora of physiological processes. Perturbations of the rhythm can result in pathologies, which are frequently studied in inbred mouse strains. We show that the genotype of mouse lines defines circadian gene expression patterns. Expression of the majority of core clock and output metabolic genes is phase delayed in the C57BL/6J line compared with 129S2 in the adrenal glands and the liver. Circadian amplitudes are generally higher in the 129S2 line. Experiments in dark-dark (DD) and light-dark (LD) conditions, exome sequencing, and data mining suggested that the mouse lines differ in single nucleotide variants in the binding regions of clock-related transcription factors in open chromatin regions. A possible mechanism of differential circadian expression could be the entrainment and transmission of the light signal to peripheral organs. This is supported by the genotype effect in the adrenal glands, which is largest under LD, and by the high number of single nucleotide variants in the Receptor, Kinase, and G-protein coupled receptor Panther molecular function categories. Different phenotypes of the two mouse lines and the changed amino acid sequence of the Period 2 protein possibly contribute further to the observed differences in circadian gene expression.

  1. Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications

    PubMed Central

    2011-01-01

    Background The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, the estimation methods should be compared with respect to the performance of the resulting statistical models. Results Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method, using the estimated enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied with estimation windows of various sizes. MARS model performance was assessed with the Generalized Cross-Validation (GCV) score. We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). Conclusion The performance of data mining and machine learning methods applied to histone modification ChIP-seq data can be improved by using data across the entire gene body and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved the accuracy of model predictions. 
PMID:21834981

  2. Open reading frames associated with cancer in the dark matter of the human genome.

    PubMed

    Delgado, Ana Paula; Brandao, Pamela; Chapado, Maria Julia; Hamid, Sheilin; Narayanan, Ramaswamy

    2014-01-01

    The uncharacterized proteins (open reading frames, ORFs) in the human genome offer an opportunity to discover novel targets for cancer. A systematic analysis of the dark matter of the human proteome for druggability and biomarker discovery is crucial to mining the genome. Numerous data mining tools are available for mining these ORFs to develop a comprehensive knowledge base for future target discovery and validation. Using the Genetic Association Database, the ORFs of the human dark matter proteome were screened for evidence of association with neoplasms. The Phenome-Genome Integrator tool was used to establish phenotypic association with disease traits, including cancer. Batch analysis with tools for protein expression, gene ontology, and motifs and domains was used to characterize the ORFs. Sixty-two ORFs were identified for neoplasm association. Expression quantitative trait loci (eQTL) analysis identified thirteen ORFs related to cancer traits. Protein expression, motif and domain analysis, and genome-wide association studies verified the relevance of these OncoORFs in diverse tumors. The OncoORFs are also associated with a wide variety of human diseases and disorders, which suggests a complex landscape of the uncharacterized proteome in human diseases. These results open the dark matter of the proteome to novel cancer target research. Copyright© 2014, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.

  3. Association of a cytarabine chemosensitivity related gene expression signature with survival in cytogenetically normal acute myeloid leukemia.

    PubMed

    Yan, Han; Wen, Lu; Tan, Dan; Xie, Pan; Pang, Feng-Mei; Zhou, Hong-Hao; Zhang, Wei; Liu, Zhao-Qian; Tang, Jie; Li, Xi; Chen, Xiao-Ping

    2017-01-03

    The prognosis of cytogenetically normal acute myeloid leukemia (CN-AML) varies greatly among patients. Achievement of complete remission (CR) after chemotherapy is indispensable for a better prognosis. To develop a gene signature predicting overall survival (OS) in CN-AML, we performed a data mining procedure on whole-genome expression data from both blood cancer cell lines and AML patients in open-access databases. A gene expression signature including 42 probes was derived. These probes were significantly associated with both cytarabine half-maximal inhibitory concentration values in blood cancer cell lines and OS in CN-AML patients. Using Cox regression analysis and linear regression analysis, an algorithm for calculating a chemosensitivity score from the mRNA expression levels of the 42 probes was established. The scores were associated with OS in both the training sample (p=5.13 × 10⁻⁴, HR=2.040, 95% CI: 1.364-3.051) and the validation sample (p=0.002, HR=2.528, 95% CI: 1.393-4.591) of the GSE12417 dataset from Gene Expression Omnibus. In The Cancer Genome Atlas (TCGA) CN-AML patients, higher scores were associated with both worse OS (p=0.013, HR=2.442, 95% CI: 1.205-4.950) and worse disease-free survival (DFS) (p=0.015, HR=2.376, 95% CI: 1.181-4.779). Gene ontology (GO) analysis showed that all the significant GO terms were related to the mitochondrion cellular component. In summary, a novel gene set that could predict the prognosis of CN-AML was identified in this study, which provides a new way to identify genes affecting AML chemosensitivity and prognosis.

  4. Association of a cytarabine chemosensitivity related gene expression signature with survival in cytogenetically normal acute myeloid leukemia

    PubMed Central

    Yan, Han; Wen, Lu; Tan, Dan; Xie, Pan; Pang, Feng-mei; Zhou, Hong-hao; Zhang, Wei; Liu, Zhao-qian; Tang, Jie; Li, Xi; Chen, Xiao-ping

    2017-01-01

    The prognosis of cytogenetically normal acute myeloid leukemia (CN-AML) varies greatly among patients. Achievement of complete remission (CR) after chemotherapy is indispensable for a better prognosis. To develop a gene signature predicting overall survival (OS) in CN-AML, we performed a data mining procedure on whole-genome expression data from both blood cancer cell lines and AML patients in open-access databases. A gene expression signature including 42 probes was derived. These probes were significantly associated with both cytarabine half-maximal inhibitory concentration values in blood cancer cell lines and OS in CN-AML patients. Using Cox regression analysis and linear regression analysis, an algorithm for calculating a chemosensitivity score from the mRNA expression levels of the 42 probes was established. The scores were associated with OS in both the training sample (p=5.13 × 10⁻⁴, HR=2.040, 95% CI: 1.364-3.051) and the validation sample (p=0.002, HR=2.528, 95% CI: 1.393-4.591) of the GSE12417 dataset from Gene Expression Omnibus. In The Cancer Genome Atlas (TCGA) CN-AML patients, higher scores were associated with both worse OS (p=0.013, HR=2.442, 95% CI: 1.205-4.950) and worse disease-free survival (DFS) (p=0.015, HR=2.376, 95% CI: 1.181-4.779). Gene ontology (GO) analysis showed that all the significant GO terms were related to the mitochondrion cellular component. In summary, a novel gene set that could predict the prognosis of CN-AML was identified in this study, which provides a new way to identify genes affecting AML chemosensitivity and prognosis. PMID:27903973

  5. Mutual information estimation reveals global associations between stimuli and biological processes

    PubMed Central

    Suzuki, Taiji; Sugiyama, Masashi; Kanamori, Takafumi; Sese, Jun

    2009-01-01

    Background Although microarray gene expression analysis has become popular, it remains difficult to interpret the biological changes caused by stimuli or variation of conditions. Clustering genes and associating each group with biological functions is a commonly used approach. However, such methods detect only partial changes within cell processes. Herein, we propose a method for discovering global changes within a cell by associating observed conditions of gene expression with gene functions. Results To elucidate the association, we introduce a novel feature selection method called Least-Squares Mutual Information (LSMI), which computes mutual information without density estimation; therefore, LSMI can detect nonlinear associations within a cell. We demonstrate the effectiveness of LSMI through comparison with existing methods. The results of the application to yeast microarray datasets reveal that non-natural stimuli affect various biological processes, whereas other stimuli show no significant relation to specific cell processes. Furthermore, we discover that biological processes can be categorized into four types according to their responses to various stimuli: DNA/RNA metabolism, gene expression, protein metabolism, and protein localization. Conclusion We proposed a novel feature selection method called LSMI and applied it to mining the association between conditions of yeast and biological processes through microarray datasets. LSMI allows us to elucidate the global organization of cellular process control. PMID:19208155

  6. Gene prioritization and clustering by multi-view text mining

    PubMed Central

    2010-01-01

    Background Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most promising candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effectiveness of disease-gene identification varies across text mining models. Thus, incorporating more text mining models may help obtain more refined and accurate knowledge. However, how to effectively combine these models remains a challenging question in machine learning. In particular, it is non-trivial to guarantee that the integrated model performs better than the best individual model. Results We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered a view, and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than the other methods compared. Conclusions In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification. PMID:20074336

  7. Markers of epithelial-to-mesenchymal transition reflect tumor biology according to patient age and Gleason score in prostate cancer

    PubMed Central

    Jędroszka, Dorota; Hamouz, Raneem; Górniak, Karolina; Bednarek, Andrzej K.

    2017-01-01

    Introduction Prostate carcinoma (PRAD) is one of the most frequently diagnosed malignancies among men worldwide. It is well known that the androgen receptor (AR) plays a pivotal role in the vast majority of prostate tumors. However, recent evidence suggests that estrogen receptors (ERs) may also contribute to prostate tumor development. Moreover, progression and aggressiveness of prostate cancer may be associated with differential expression of epithelial-to-mesenchymal transition (EMT) genes. Therefore, we aimed to assess the significance of receptor status as well as EMT marker gene expression among PRAD patients according to their age and Gleason score. Materials and methods We analyzed TCGA gene expression profiles of 497 prostate tumor samples for 43 genes involved in EMT and 3 hormone receptor genes (AR, ESR1, ESR2), as well as the clinical characteristics of the patients. Patients were then divided into four groups according to age and five groups according to Gleason score. Next, we evaluated PRAD samples according to the relationships between the set of variables in different combinations and compared differential expression in subsequent groups of patients. The analysis was performed using the R packages FactoMineR, gplots, RColorBrewer, and NMF. Results Multiple factor analysis (MFA) resulted in distinct grouping of PRAD patients into four age categories according to the expression levels of AR, ESR1, and ESR2, with the most distinct group aged less than 50 years. Further investigation indicated opposite expression profiles of EMT markers between different age groups, as well as a strong association of EMT gene expression with Gleason score. We found that, depending on patient age and Gleason score, the EMT genes with distinctly altered expression are KRT18, KRT19, MUC1 and COL4A1, CTNNB1, SNAI2, ZEB1 and MMP3. 
Conclusions Our major observation is that prostate cancer from patients under 50 years old, compared with older patients, has entirely different EMT gene expression profiles, showing a potentially more aggressive invasive phenotype, despite Gleason score classification. PMID:29206234

  8. Markers of epithelial-to-mesenchymal transition reflect tumor biology according to patient age and Gleason score in prostate cancer.

    PubMed

    Jędroszka, Dorota; Orzechowska, Magdalena; Hamouz, Raneem; Górniak, Karolina; Bednarek, Andrzej K

    2017-01-01

    Prostate carcinoma (PRAD) is one of the most frequently diagnosed malignancies among men worldwide. It is well known that the androgen receptor (AR) plays a pivotal role in the vast majority of prostate tumors. However, recent evidence suggests that estrogen receptors (ERs) may also contribute to prostate tumor development. Moreover, progression and aggressiveness of prostate cancer may be associated with differential expression of epithelial-to-mesenchymal transition (EMT) genes. Therefore, we aimed to assess the significance of receptor status as well as EMT marker gene expression among PRAD patients according to their age and Gleason score. We analyzed TCGA gene expression profiles of 497 prostate tumor samples for 43 genes involved in EMT and 3 hormone receptor genes (AR, ESR1, ESR2), as well as the clinical characteristics of the patients. Patients were then divided into four groups according to age and five groups according to Gleason score. Next, we evaluated PRAD samples according to the relationships between the set of variables in different combinations and compared differential expression in subsequent groups of patients. The analysis was performed using the R packages FactoMineR, gplots, RColorBrewer, and NMF. Multiple factor analysis (MFA) resulted in distinct grouping of PRAD patients into four age categories according to the expression levels of AR, ESR1, and ESR2, with the most distinct group aged less than 50 years. Further investigation indicated opposite expression profiles of EMT markers between different age groups, as well as a strong association of EMT gene expression with Gleason score. We found that, depending on patient age and Gleason score, the EMT genes with distinctly altered expression are KRT18, KRT19, MUC1 and COL4A1, CTNNB1, SNAI2, ZEB1 and MMP3. 
Our major observation is that prostate cancer from patients under 50 years old, compared with older patients, has entirely different EMT gene expression profiles, showing a potentially more aggressive invasive phenotype, despite Gleason score classification.

  9. Stress-Survival Gene Identification From an Acid Mine Drainage Algal Mat Community

    NASA Astrophysics Data System (ADS)

    Urbina-Navarrete, J.; Fujishima, K.; Paulino-Lima, I. G.; Rothschild-Mancinelli, B.; Rothschild, L. J.

    2014-12-01

    Microbial communities from acid mine drainage environments are exposed to multiple stressors, including low pH, high dissolved metal loads, seasonal freezing, and desiccation. The microbial and algal communities that inhabit these niche environments have evolved strategies that allow for their ecological success. Metagenomic analyses are useful in identifying species diversity; however, they do not elucidate the mechanisms that allow for the resilience of a community under these extreme conditions. Many known or predicted genes encode protein products that remain uncharacterized, and, similarly, many proteins cannot be traced to their gene of origin. This investigation seeks to identify genes that are active in an algal consortium under the stress of living in an acid mine drainage environment. Our approach uses the entire community transcriptome for a functional screen in an Escherichia coli host. This approach directly targets the genes involved in survival, without the need to characterize the members of the consortium. The consortium was harvested and stressed with conditions similar to the native environment from which it was collected. Exposure to low pH (< 3.2), high metal load, desiccation, and deep freeze resulted in the expression of stress-induced genes that were transcribed into messenger RNA (mRNA). These mRNA transcripts were harvested to build complementary DNA (cDNA) libraries in E. coli. The transformed E. coli were exposed to the same stressors as the original algal consortium to select for surviving cells. Successful cells incorporated the transcripts that encode survival mechanisms, thus allowing for selection and identification of the gene(s) involved. Initial selection screens for freeze and desiccation tolerance have yielded E. coli that are 1 order of magnitude more resistant to freezing (0.01% survival of control cells with no transcript vs. 0.2% survival of E. coli with transcript) and 3 orders of magnitude more resistant to desiccation (0.005% survival of control cells with no transcript vs. 5% survival of cells with transcript). This work is transformative because genetic functions can be selected without prior knowledge of the genes or of the organisms involved. Work continues to identify the genes responsible for tolerance to extreme conditions and the biomechanisms involved.

  10. Computational genomic analysis of PARK7 interactome reveals high BBS1 gene expression as a prognostic factor favoring survival in malignant pleural mesothelioma.

    PubMed

    Vavougios, Georgios D; Solenov, Evgeniy I; Hatzoglou, Chrissi; Baturina, Galina S; Katkova, Liubov E; Molyvdas, Paschalis Adam; Gourgoulianis, Konstantinos I; Zarogiannis, Sotirios G

    2015-10-01

    The aim of our study was to assess the differential gene expression of the Parkinson protein 7 (PARK7) interactome in malignant pleural mesothelioma (MPM) using data mining techniques, in order to identify novel candidate genes that may play a role in the pathogenicity of MPM. We constructed the PARK7 interactome using the ConsensusPathDB database. We then interrogated the Oncomine Cancer Microarray database, using the Gordon Mesothelioma Study, for differential gene expression of the PARK7 interactome. In ConsensusPathDB, 38 protein interactors of PARK7 were identified. In the Gordon Mesothelioma Study, 34 of them were assessed, of which SUMO1, UBC3, KIAA0101, HDAC2, DAXX, RBBP4, BBS1, NONO, RBBP7, HTRA2, and STUB1 were significantly overexpressed, whereas TRAF6 and MTA2 were significantly underexpressed in MPM patients (network 2). Furthermore, Kaplan-Meier analysis revealed that MPM patients with high BBS1 expression had a median overall survival of 16.5 mo vs. 8.7 mo for those with low expression. For validation purposes, we performed a meta-analysis in the Oncomine database using five sarcoma datasets. Eight network 2 genes (KIAA0101, HDAC2, SUMO1, RBBP4, NONO, RBBP7, HTRA2, and MTA2) were significantly differentially expressed across 18 different sarcoma types. Finally, Gene Ontology annotation enrichment analysis revealed significant roles of the PARK7 interactome in the NuRD, CHD, and SWI/SNF protein complexes. In conclusion, we identified 13 genes differentially expressed in MPM that had not been reported before. Among them, BBS1 emerged as a novel predictor of overall survival in MPM. Finally, we found that the PARK7 interactome is involved in novel pathways pertinent to MPM. Copyright © 2015 the American Physiological Society.
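The median overall survival figures above come from Kaplan-Meier analysis. As a rough illustration only, the product-limit estimator and a median read-off can be written in plain Python. `kaplan_meier` and `median_survival` are hypothetical helpers, not Oncomine functions, and the cohort below is invented toy data. A patient censored at the same time as a death is counted as still at risk at that time, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct death time.

    times:  follow-up time per patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, estimated survival probability) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    curve, surv = [], 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for x in times if x >= t)  # patients still followed at t
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def median_survival(curve):
    """Earliest time at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Invented toy cohort (months); not data from the study.
times = [2, 4, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
print(median_survival(curve))
```

Running the two functions separately on a high-expression group and a low-expression group would give the kind of paired medians quoted in the abstract; a log-rank test is the usual next step for assessing whether the curves differ.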

  11. Saponin determination, expression analysis and functional characterization of saponin biosynthetic genes in Chenopodium quinoa leaves.

    PubMed

    Fiallos-Jurado, Jennifer; Pollier, Jacob; Moses, Tessa; Arendt, Philipp; Barriga-Medina, Noelia; Morillo, Eduardo; Arahana, Venancio; de Lourdes Torres, Maria; Goossens, Alain; Leon-Reyes, Antonio

    2016-09-01

    Quinoa (Chenopodium quinoa Willd.) is a highly nutritious pseudocereal with outstanding protein, vitamin, mineral, and nutraceutical content. The leaves, flowers, and seed coat of quinoa contain triterpenoid saponins, which impart bitterness to the grain and make it unpalatable without postharvest removal of the saponins. In this study, we quantified saponin content in quinoa leaves from Ecuadorian sweet and bitter genotypes and assessed the expression of saponin biosynthetic genes in leaf samples elicited with methyl jasmonate (MeJA). We found saponin accumulation in leaves after MeJA treatment in both ecotypes tested. As no reference genes were available for qPCR in quinoa, we mined publicly available RNA-Seq data for orthologs of 22 genes known to be stably expressed in Arabidopsis thaliana and assessed their expression stability using the geNorm, NormFinder, and BestKeeper algorithms. The quinoa ortholog of At2g28390 (Monensin Sensitivity 1, MON1) was stably expressed and chosen as a suitable reference gene for qPCR analysis. Candidate saponin biosynthesis genes were screened in the quinoa RNA-Seq data, and subsequent functional characterization in yeast led to the identification of CqbAS1, CqCYP716A78, and CqCYP716A79. These genes were found to be induced by MeJA, suggesting that this phytohormone might also modulate saponin biosynthesis in quinoa leaves. Knowledge of saponin biosynthesis and its regulation in quinoa may aid the further development of sweet cultivars that do not require postharvest processing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
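geNorm, one of the algorithms named above, ranks candidate reference genes by a stability measure M: for each gene, the average over all other candidates of the standard deviation of the log2 expression ratios across samples, where a lower M means more stable expression. The sketch below computes only that single-pass M value (geNorm itself also eliminates the least stable gene stepwise) and assumes the sample standard deviation; `genorm_m` is a hypothetical name and the input values are toy data.

```python
import math
import statistics


def genorm_m(expr):
    """geNorm-style stability value M for each candidate reference gene.

    expr: dict mapping gene name -> relative expression values, one per sample,
    in the same sample order for every gene. Lower M means more stable.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_v = []
        for k in genes:
            if k == j:
                continue
            # Pairwise variation V_jk: sample SD of log2 ratios across samples.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_v.append(statistics.stdev(log_ratios))
        m_values[j] = sum(pairwise_v) / len(pairwise_v)
    return m_values


# Toy values: A and B keep a constant ratio; C does not track them.
expr = {
    "A": [1.0, 2.0, 4.0, 8.0],
    "B": [2.0, 4.0, 8.0, 16.0],
    "C": [1.0, 1.0, 1.0, 1.0],
}
print(genorm_m(expr))
```

Genes A and B share the lowest M because their ratio never changes, so either would be preferred over C as a normalizer in this toy case.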

  12. The Dynamics of Transcript Abundance during Cellularization of Developing Barley Endosperm

    PubMed Central

    Zhang, Runxuan; Burton, Rachel A; Shirley, Neil J.; Little, Alan; Morris, Jenny; Milne, Linda

    2016-01-01

    Within the cereal grain, the endosperm and its nutrient reserves are critical for successful germination and in the context of grain utilization. The identification of molecular determinants of early endosperm development, particularly regulators of cell division and cell wall deposition, would help predict end-use properties such as yield, quality, and nutritional value. Custom microarray data have been generated using RNA isolated from developing barley grain endosperm 3 d to 8 d after pollination (DAP). Comparisons of transcript abundance over time revealed 47 gene expression modules that can be clustered into 10 broad groups. Superimposing these modules upon cytological data allowed patterns of transcript abundance to be linked with key stages of early grain development. Here, attention was focused on how the datasets could be mined to explore and define the processes of cell wall biosynthesis, remodeling, and degradation. Using a combination of spatial molecular network and gene ontology enrichment analyses, it is shown that genes involved in cell wall metabolism are found in multiple modules, but cluster into two main groups that exhibit peak expression at 3 DAP to 4 DAP and 5 DAP to 8 DAP. The presence of transcription factor genes in these modules allowed candidate genes for the control of wall metabolism during early barley grain development to be identified. The data are publicly available through a dedicated web interface (https://ics.hutton.ac.uk/barseed/), where they can be used to interrogate co- and differential expression for any other genes, groups of genes, or transcription factors expressed during early endosperm development. PMID:26754666

  13. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions.

    PubMed

    Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied or used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from PSI Molecular Interactions (PSI-MI) and the Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented gene-gene interaction types within a specific area. This strategy was applied to study vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. The current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with the 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene pairs were associated with at least one INO term. Of the 78 INO interaction terms associated with at least five gene pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. 
The analysis of these interaction types and their associated gene-gene pairs uncovered many scientific insights. INO provides a novel approach for defining hierarchical interaction types and related keywords for literature mining. The ontology-based literature mining, in combination with an INO-based statistical interaction enrichment test, provides a new platform for efficient mining and analysis of topic-specific gene interaction networks.
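The record above relies on a modified Fisher's exact test whose modification is not described in the abstract. For orientation only, the standard one-sided Fisher's exact test for over- or under-representation of an interaction term among sub-network gene pairs can be computed from the hypergeometric distribution. The helper names and the 2×2 layout below are assumptions for illustration, not the INO authors' implementation.

```python
from math import comb


def _hypergeom_pmf(x, n_total, k, n):
    """P(X = x) when drawing n items from n_total, of which k carry the term."""
    return comb(k, x) * comb(n_total - k, n - x) / comb(n_total, n)


def fisher_one_sided(a, b, c, d, alternative="greater"):
    """Standard one-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: gene pairs inside / outside the sub-network.
    Columns: pairs annotated / not annotated with the interaction term.
    "greater" tests over-representation of the term in the sub-network;
    "less" tests under-representation.
    """
    n_total = a + b + c + d
    k = a + c  # pairs carrying the term overall
    n = a + b  # pairs in the sub-network
    lo = max(0, n - (n_total - k))  # smallest feasible count in cell a
    hi = min(k, n)                  # largest feasible count in cell a
    if alternative == "greater":
        support = range(a, hi + 1)
    elif alternative == "less":
        support = range(lo, a + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return sum(_hypergeom_pmf(x, n_total, k, n) for x in support)


print(fisher_one_sided(3, 0, 0, 3, "greater"))
print(fisher_one_sided(0, 3, 3, 0, "less"))
```

Summing the hypergeometric tail directly keeps the sketch dependency-free; in practice a library routine such as SciPy's `fisher_exact` with its `alternative` argument gives the same one-sided p-values for the standard test.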

  14. Microbial and geochemical assessment of bauxitic un-mined and post-mined chronosequence soils from Mocho Mountains, Jamaica.

    PubMed

    Lewis, Dawn E; Chauhan, Ashvini; White, John R; Overholt, Will; Green, Stefan J; Jasrotia, Puja; Wafula, Denis; Jagoe, Charles

    2012-10-01

    Microorganisms are very sensitive to environmental change and can be used to gauge anthropogenic impacts and even predict the restoration success of degraded environments. Here, we report an assessment of the effects of bauxite mining activities on soil biogeochemistry and microbial community structure using an un-mined site and three post-mined sites in Jamaica. The post-mined soils represent a chronosequence, undergoing restoration since 1987, 1997, and 2007. Soils were collected during dry and wet seasons and analyzed for pH, organic matter (OM), total carbon (TC), total nitrogen (TN), and phosphorus. The microbial community structure was assessed through quantitative PCR and massively parallel sequencing of bacterial ribosomal RNA (rRNA) genes. Edaphic factors and microbial community composition were analyzed using multivariate statistical approaches, which revealed a significant negative impact of mining on soil that persisted even after more than 20 years of restoration. Seasonal fluctuations contributed to variation in measured soil properties and community composition, but they were minor in comparison to the long-term effects of mining. In both seasons, post-mined soils had higher pH but lower OM, TC, and TN. Bacterial rRNA gene analyses demonstrated a general decrease in diversity in post-mined soils and up to a 3-log decrease in rRNA gene abundance. Community composition analyses demonstrated that bacteria from the Proteobacteria (α, β, γ, δ), Acidobacteria, and Firmicutes were abundant in all soils. The abundance of Firmicutes was elevated in newer post-mined soils relative to the un-mined soil, which contrasted with a decrease, relative to un-mined soils, in proteobacterial and acidobacterial rRNA gene abundances. Our study indicates long-lasting impacts of mining activities on soil biogeochemical and microbial properties, with an impending loss in soil productivity.

  15. SpinachDB: A Well-Characterized Genomic Database for Gene Family Classification and SNP Information of Spinach.

    PubMed

    Yang, Xue-Dong; Tan, Hua-Wei; Zhu, Wei-Min

    2016-01-01

    Spinach (Spinacia oleracea L.), which originated in central and western Asia, belongs to the family Amaranthaceae. Spinach is one of the most important leafy vegetables, with high nutritional value, and is also an excellent research material for plant sex chromosome models. Following the completion of genome assembly and gene prediction for spinach, we developed SpinachDB (http://222.73.98.124/spinachdb) to store, annotate, mine and analyze genomics and genetics datasets efficiently. In this study, all 21,702 spinach genes were annotated. A total of 15,741 spinach genes were catalogued into 4,351 families, including a substantial number of transcription factors. To construct a high-density genetic map, a total of 131,592 SSRs and 1,125,743 potential SNPs located in 548,801 loci of the spinach genome were identified in 11 cultivated and wild spinach cultivars. Expression profiles were also generated from RNA-seq data using the FPKM method, allowing expression levels to be compared across genes. Paralogs in spinach and orthologous genes in Arabidopsis, grape, sugar beet and rice were identified for comparative genome analysis. Finally, the SpinachDB website contains seven main sections, including the homepage; the GBrowse map that integrates genome, gene, SSR and SNP marker information; the Blast alignment service; the gene family classification search tool; the orthologous and paralogous gene pairs search tool; and the download and contact information pages. SpinachDB will be continually expanded to include newly generated robust genomics and genetics data sets along with the associated data mining and analysis tools.
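    FPKM (fragments per kilobase of transcript per million mapped fragments) normalizes raw counts for both gene length and sequencing depth. The abstract does not describe SpinachDB's exact pipeline, so the following is only a minimal Python sketch of the standard formula, FPKM = count × 10^9 / (gene length in bp × total mapped fragments); the function name and the counts and lengths in the usage line are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """Compute FPKM for each gene.

    counts: mapped fragment counts per gene
    lengths_bp: gene (or transcript) lengths in base pairs
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Simplification: total mapped fragments is summed over the listed genes;
    # a real pipeline would use the total for the whole library.
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped fragments")
    # 1e3 (per kilobase) * 1e6 (per million fragments) = 1e9 overall.
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


# Hypothetical example: three genes of different lengths.
print(fpkm([100, 300, 600], [1000, 2000, 3000]))  # → [100000.0, 150000.0, 200000.0]
```

    Dividing by length is what lets a long gene and a short gene with the same FPKM be read as similarly expressed.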

  16. Insights into significant pathways and gene interaction networks underlying breast cancer cell line MCF-7 treated with 17β-estradiol (E2).

    PubMed

    Huan, Jinliang; Wang, Lishan; Xing, Li; Qin, Xianju; Feng, Lingbin; Pan, Xiaofeng; Zhu, Ling

    2014-01-01

    Estrogens are known to regulate the proliferation of breast cancer cells and to alter their cytoarchitectural and phenotypic properties, but the gene networks and pathways by which estrogenic hormones regulate these events are only partially understood. We used global gene expression profiling by Affymetrix GeneChip microarray analysis, with KEGG pathway enrichment, PPI network construction, module analysis and text mining methods, to identify patterns and time courses of genes that are either stimulated or inhibited by estradiol (E2) in estrogen receptor (ER)-positive MCF-7 human breast cancer cells. Of the genes queried on the Affymetrix Human Genome U133 Plus 2.0 microarray, we identified 628 (12 h), 852 (24 h) and 880 (48 h) differentially expressed genes (DEGs) that showed a robust pattern of regulation by E2. Pathway enrichment analysis identified changes in the metabolic pathways of E2-treated samples at each time point. At 12 h, the changes were mainly focused on pathways in cancer, focal adhesion, and the chemokine signaling pathway. At 24 h, the changes were mainly enriched in neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction and the calcium signaling pathway. At 48 h, the significant pathways were pathways in cancer, regulation of actin cytoskeleton, cell adhesion molecules (CAMs), axon guidance and the ErbB signaling pathway. Of interest, our PPI network and module analyses found that E2 treatment enhanced PRSS23 at all three time points and that PRSS23 was in the central position of each module. Text mining results showed that key DEGs are related to signaling pathways, such as the ErbB pathway (AREG), Wnt pathway (NDP), MAPK pathway (NTRK3, TH) and IP3 pathway (TRA@), and to some transcription factors (TCF4, MAF). 
Our studies highlight the diverse gene networks and metabolic and cell regulatory pathways through which E2 operates to achieve its widespread effects on breast cancer cells. © 2013 Elsevier B.V. All rights reserved.

  17. Data Mining of Gene Arrays for Biomarkers of Survival in Ovarian Cancer

    PubMed Central

    Coveney, Clare; Boocock, David J.; Rees, Robert C.; Deen, Suha; Ball, Graham R.

    2015-01-01

    The expected five-year survival rate following a stage III ovarian cancer diagnosis is a mere 22%; this applies to the 7,000 new cases diagnosed yearly in the UK. Stratification of patients with this heterogeneous disease, based on active molecular pathways, would aid targeted treatment, improving the prognosis for many cases. While hundreds of genes have been associated with ovarian cancer, few have yet been verified by peer research for clinical significance. Here, a meta-analysis approach was applied to two carefully selected gene expression microarray datasets. Artificial neural networks, Cox univariate survival analyses and t-tests identified genes whose expression was consistently and significantly associated with patient survival. The rigor of this experimental design increases confidence in the genes found to be of interest. A list of 56 genes significantly related to survival in both datasets was distilled from a potential 37,000, with an FDR of 1.39859 × 10⁻¹¹; these genes both verify genes already implicated in this disease and provide novel genes and pathways to pursue. Further investigation and validation of these genes may lead to clinical insights, and they have the potential to predict a patient's response to treatment or to serve as novel targets for therapy. PMID:27600227

  18. Embryonic exposure to an aqueous coal dust extract results in gene expression alterations associated with the development and function of connective tissue and the hematological system, immunological and inflammatory disease, and cancer in zebrafish.

    PubMed

    Caballero-Gallardo, Karina; Wirbisky-Hershberger, Sara E; Olivero-Verbel, Jesus; de la Rosa, Jesus; Freeman, Jennifer L

    2018-03-01

    Coal mining is one of the economic activities with the greatest impact on environmental quality. At all stages, contaminants are released, including particulates such as coal dust. The first aim of this study was to obtain an aqueous coal dust extract and characterize its trace element composition by ICP-MS. In addition, the developmental toxicity of the aqueous coal dust extract was evaluated in zebrafish (Danio rerio) exposed to different concentrations (0–1000 ppm; μg mL⁻¹) to establish acute toxicity, morphology and transcriptome changes. Trace elements present in the extract at the highest concentrations (>10 ppb) included Sr, Zn, Ba, As, Cu and Se; Cd and Pb were found at lower concentrations. No significant difference in mortality was observed (p > 0.05), but delayed hatching was found at 0.1 and 1000 ppm (p < 0.05). No significant differences in morphological characteristics were observed in any of the treatment groups (p > 0.05). Transcriptomic results in zebrafish larvae revealed alterations in 77, 61 and 1376 genes in the 1, 10, and 100 ppm groups, respectively. Gene ontology analysis identified gene alterations associated with the development and function of connective tissue and the hematological system, as well as pathways associated with apoptosis, the cell cycle, transcription, and oxidative stress, including the MAPK signaling pathway. In addition, altered genes were associated with cancer; connective tissue, muscular, and skeletal disorders; and immunological and inflammatory diseases. Overall, this is the first study to characterize gene expression alterations in response to developmental exposure to aqueous coal dust residue from coal mining, with transcriptome results indicating functions and systems to target in future studies.

  19. PromoterCAD: data-driven design of plant regulatory DNA

    PubMed Central

    Cox, Robert Sidney; Nishikata, Koro; Shimoyama, Sayoko; Yoshida, Yuko; Matsui, Minami; Makita, Yuko; Toyoda, Tetsuro

    2013-01-01

    Synthetic promoters can control the timing, location and amount of gene expression for any organism. PromoterCAD is a web application for designing synthetic promoters with altered transcriptional regulation. We use a data-first approach, using published high-throughput expression and motif data from Arabidopsis thaliana to guide DNA design. We demonstrate data mining tools for finding motifs related to circadian oscillations and tissue-specific expression patterns. PromoterCAD is built on the LinkData open platform for data publication and rapid web application development, allowing new data to be easily added and the source code to be modified to add new functionality. PromoterCAD URL: http://promotercad.org. LinkData URL: http://linkdata.org. PMID:23766287

  20. Simple Analysis of Deposited Gene Expression Datasets for the Non-Bioinformatician: How to Use GEO for Fibrosis Research.

    PubMed

    Guo, Yang; Townsend, Richard; Tsoi, Lam C

    2017-01-01

    In the past decade, high-throughput techniques have facilitated "-omics" research. Transcriptomic studies, for instance, have advanced our understanding of the expression landscape of different human diseases and cellular mechanisms. The National Center for Biotechnology Information (NCBI) established the Gene Expression Omnibus (GEO) to promote the sharing of transcriptomic data and facilitate biomedical research. In this chapter, we illustrate how to use GEO to search and analyze publicly available transcriptomic data, and we provide an easy-to-follow protocol for researchers to mine the powerful resources in GEO for information that can be valuable for fibrosis research.

  1. Gene Expression in Accumbens GABA Neurons from Inbred Rats with Different Drug-Taking Behavior

    PubMed Central

    Sharp, B.M.; Chen, H.; Gong, S.; Wu, X.; Liu, Z.; Hiler, K.; Taylor, W.L.; Matta, S.G.

    2011-01-01

    Inbred Lewis and Fischer 344 rat strains differ greatly in drug self-administration; Lewis rats operantly self-administer drugs of abuse, including nicotine, whereas Fischer rats self-administer poorly. As shown herein, operant food self-administration is similar between the strains. Given their pivotal role in drug reward, we hypothesized that differences in basal gene expression in GABAergic neurons projecting from the nucleus accumbens (NAcc) to the ventral pallidum (VP) contribute to vulnerability to drug-taking behavior. The transcriptomes of NAcc shell-VP GABAergic neurons from these two strains were analyzed in adolescents using a multidisciplinary approach that combined stereotaxic iontophoretic brain microinjections, laser-capture microdissection (LCM) and microarray measurement of transcripts. LCM enriched the gene transcripts detected in GABA neurons compared with the residual NAcc tissue: a neuron/residual ratio > 1 and false discovery rate (FDR) < 5% yielded 6,623 transcripts, whereas a ratio > 3 yielded 3,514. Strain-dependent differences in gene expression within GABA neurons were identified; 322 vs. 60 transcripts showed 1.5-fold vs. 2-fold differences in expression, respectively (FDR < 5%). Classification by gene ontology showed that these 322 transcripts were widely distributed, without categorical enrichment, which is most consistent with a global change in GABA neuron function. Literature mining with Chilibot found 38 genes related to synaptic plasticity, signaling and gene transcription, all of which are implicated in drug abuse; 33 genes have no known association with addiction or nicotine. In Lewis rats, upregulation of Mint-1, Cask, CamkIIδ, Ncam1, Vsnl1, Hpcal1 and Car8 indicates that these transcripts likely contribute to altered signaling and synaptic function in NAcc GABA projection neurons to the VP. PMID:21745336
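    Thresholds such as "FDR < 5%" are applied to multiple-testing-adjusted p-values. The abstract does not name the adjustment procedure, so the sketch below assumes the Benjamini-Hochberg method, one common choice; the function names and p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the tests sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(pvalues, fdr=0.05):
    """Indices of tests whose BH-adjusted p-value is below the FDR threshold."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q < fdr]


# Hypothetical raw p-values for five transcripts.
p = [0.01, 0.02, 0.03, 0.005, 0.6]
print(bh_adjust(p))
print(significant_at_fdr(p))  # → [0, 1, 2, 3]
```

    Combining this with a fold-change cutoff (e.g. 1.5-fold or 2-fold) would mirror the two-criterion filtering the abstract reports, though the authors' exact implementation is not stated.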

  2. WGCNA: an R package for weighted correlation network analysis.

    PubMed

    Langfelder, Peter; Horvath, Steve

    2008-12-29

    Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network-based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. 
The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA.
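    In an unsigned weighted co-expression network, the adjacency between genes i and j is the absolute Pearson correlation of their expression profiles raised to a soft-thresholding power β, a_ij = |cor(x_i, x_j)|^β, and a gene's connectivity is the sum of its adjacencies. The following is a minimal pure-Python sketch of that idea, not the WGCNA package's API; the function names, the toy profiles, and β = 6 (a commonly used value) are assumptions, and the diagonal is left at zero here so row sums give connectivity directly.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor|**beta; expr is a list of per-gene profiles."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]  # diagonal stays 0 (self-links excluded)
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def connectivity(adj):
    """Whole-network connectivity k_i = sum of adjacencies of gene i."""
    return [sum(row) for row in adj]


# Hypothetical profiles: four genes measured across four samples.
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4]]
adj = soft_threshold_adjacency(expr)
print(connectivity(adj))
```

    Raising correlations to a power keeps strong links (|r| near 1) close to 1 while shrinking weak ones, e.g. r = 0.8 becomes about 0.26 at β = 6, which emphasizes the highly correlated modules that WGCNA then clusters.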

  3. WGCNA: an R package for weighted correlation network analysis

    PubMed Central

    Langfelder, Peter; Horvath, Steve

    2008-01-01

    Background Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network-based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. Results The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. Conclusion The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at . PMID:19114008

  4. Eucalyptus hairy roots, a fast, efficient and versatile tool to explore function and expression of genes involved in wood formation.

    PubMed

    Plasencia, Anna; Soler, Marçal; Dupas, Annabelle; Ladouce, Nathalie; Silva-Martins, Guilherme; Martinez, Yves; Lapierre, Catherine; Franche, Claudine; Truchet, Isabelle; Grima-Pettenati, Jacqueline

    2016-06-01

    Eucalypts are of tremendous economic importance, being the most planted hardwoods worldwide for pulp and paper, timber and bioenergy. The recent release of the Eucalyptus grandis genome sequence revealed many new candidate genes potentially involved in secondary growth, wood formation or lineage-specific biosynthetic pathways. Their functional characterization is, however, hindered by the tedious, time-consuming and inefficient transformation systems hitherto available for eucalypts. To overcome this limitation, we developed a fast, reliable and efficient protocol to obtain and easily detect co-transformed E. grandis hairy roots using fluorescent markers, with an average efficiency of 62%. We set up conditions both to cultivate excised roots in vitro and to harden composite plants, and verified that hairy root morphology and vascular system anatomy were similar to those of wild-type roots. We further demonstrated that co-transformed hairy roots are suitable for medium-throughput functional studies enabling, for instance, protein subcellular localization, analysis of gene expression patterns through RT-qPCR and promoter expression, as well as modulation of endogenous gene expression. Down-regulation of the Eucalyptus cinnamoyl-CoA reductase1 (EgCCR1) gene, encoding a key enzyme in lignin biosynthesis, led to transgenic roots with reduced lignin levels and thinner cell walls. This gene was used as a proof of concept to demonstrate that the function of genes involved in secondary cell wall biosynthesis and wood formation can be elucidated in transgenic hairy roots using histochemical, transcriptomic and biochemical approaches. The method described here is timely because it will accelerate gene mining of the genome for both basic research and industrial purposes. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  5. Genome mining-directed activation of a silent angucycline biosynthetic gene cluster in Streptomyces chattanoogensis.

    PubMed

    Zhou, Zhenxing; Xu, Qingqing; Bu, Qingting; Guo, Yuanyang; Liu, Shuiping; Liu, Yu; Du, Yiling; Li, Yongquan

    2015-02-09

    Genomic sequencing of actinomycetes has revealed the presence of numerous gene clusters seemingly capable of natural product biosynthesis, yet most clusters are cryptic under laboratory conditions. Bioinformatics analysis of the completely sequenced genome of Streptomyces chattanoogensis L10 (CGMCC 2644) revealed a silent angucycline biosynthetic gene cluster. The overexpression of a pathway-specific activator gene under the constitutive ermE* promoter successfully triggered the expression of the angucycline biosynthetic genes. Two novel members of the angucycline antibiotic family, chattamycins A and B, were further isolated and elucidated. Biological activity assays demonstrated that chattamycin B possesses good antitumor activities against human cancer cell lines and moderate antibacterial activities. The results presented here provide a feasible method to activate silent angucycline biosynthetic gene clusters to discover potential new drug leads. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. On the classification techniques in data mining for microarray data classification

    NASA Astrophysics Data System (ADS)

    Aydadenta, Husna; Adiwijaya

    2018-03-01

    Cancer is one of the deadliest diseases: according to WHO data, cancer caused 8.8 million deaths in 2015, and this number will increase every year if the problem is not addressed early. Microarray data have become widely used in cancer-identification studies, since microarrays measure gene expression levels in cell samples, allowing thousands of genes to be analyzed simultaneously. Using data mining techniques, microarray samples can be classified as cancerous or not. In this paper we discuss research applying several data mining techniques to microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulate the Random Forest algorithm with dimensionality reduction using Relief. We report a performance measure (accuracy) for each classification algorithm; Random Forest achieved higher accuracy than the other classification algorithms (SVM, ANN, Naive Bayes, kNN, and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost of each data mining classification technique applied to microarray data.

  7. Exploring the molecular mechanisms of Traditional Chinese Medicine components using gene expression signatures and connectivity map.

    PubMed

    Yoo, Minjae; Shin, Jimin; Kim, Hyunmin; Kim, Jihye; Kang, Jaewoo; Tan, Aik Choon

    2018-04-04

    Traditional Chinese Medicine (TCM) has been practiced for thousands of years in China and other Asian countries for treating various symptoms and diseases. However, the underlying molecular mechanisms of TCM are poorly understood, partly due to the "multi-component, multi-target" nature of TCM. To uncover the molecular mechanisms of TCM, we performed comprehensive gene expression analysis using the connectivity map. We interrogated gene expression signatures obtained from 102 TCM components using the next generation Connectivity Map (CMap) resource. We performed systematic data mining and analysis of the mechanism of action (MoA) of these TCM components based on the CMap results, and clustered the 102 TCM components into four groups based on their MoAs. We performed gene set enrichment analysis on these components to provide additional support for these molecular mechanisms. We also provided literature evidence to validate the MoAs identified through this bioinformatics analysis. Finally, we developed the Traditional Chinese Medicine Drug Repurposing Hub (TCM Hub), a connectivity map resource to facilitate the elucidation of TCM MoA for drug repurposing research. TCM Hub is freely available at http://tanlab.ucdenver.edu/TCMHub. Molecular mechanisms of TCM could be uncovered by using gene expression signatures and the connectivity map. Through this analysis, we found that many TCM components possess diverse MoAs, which may explain the applications of TCM in treating various symptoms and diseases. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  8. Characterization of Withania somnifera Leaf Transcriptome and Expression Analysis of Pathogenesis-Related Genes during Salicylic Acid Signaling

    PubMed Central

    Ghosh Dasgupta, Modhumita; George, Blessan Santhosh; Bhatia, Anil; Sidhu, Om Prakash

    2014-01-01

    Withania somnifera (L.) Dunal is a valued medicinal plant with pharmaceutical applications. The present study was undertaken to analyze the salicylic acid (SA)-induced leaf transcriptome of W. somnifera. A total of 45.6 million reads were generated, and de novo assembly yielded 73,523 transcript contigs with an average length of 1,620 bp. A total of 71,062 transcripts were annotated, and 53,424 of them were assigned GO terms. Mapping of transcript contigs to biological pathways revealed the presence of 182 pathways. Seventeen genes representing 12 pathogenesis-related (PR) families were mined from the transcriptome data, and their expression patterns 17 and 36 hours after salicylic acid treatment were documented. The analysis revealed significant up-regulation of all PR gene families by 36 hours post-treatment, except WsPR10. The relative fold expression of transcripts ranged from 1-fold to 6,532-fold. The two peroxidase families, lignin-forming anionic peroxidase (WsL-PRX) and suberization-associated anionic peroxidase (WsS-PRX), recorded maximum expression of 377-fold and 6,532-fold, respectively, while WsPR10 expression was down-regulated 14-fold. Additionally, the most stable reference gene for normalization of qRT-PCR data was identified. The effect of SA on the accumulation of major secondary metabolites of W. somnifera, including withanoside V, withaferin A and withanolide A, was also analyzed, and an increase in the content of all three metabolites was detected. This is the first report on expression patterns of PR genes during salicylic acid signaling in W. somnifera. PMID:24739900

  9. Isolation and expression analysis of EcbZIP17 from different finger millet genotypes shows conserved nature of the gene.

    PubMed

    Chopperla, Ramakrishna; Singh, Sonam; Mohanty, Sasmita; Reddy, Nanja; Padaria, Jasdeep C; Solanke, Amolkumar U

    2017-10-01

    Basic leucine zipper (bZIP) transcription factors comprise one of the largest gene families in plants. They play a key role in almost every aspect of plant growth and development and also in biotic and abiotic stress tolerance. In this study, we report the isolation and characterization of EcbZIP17, a group B bZIP transcription factor from a climate-smart cereal, finger millet (Eleusine coracana L.). The genomic sequence of EcbZIP17 is 2662 bp long, encompassing two exons and one intron, with an ORF of 1722 bp and a peptide length of 573 aa. This gene is homologous to AtbZIP17 (Arabidopsis), ZmbZIP17 (maize) and OsbZIP60 (rice), which play a key role in the endoplasmic reticulum (ER) stress pathway. In silico analysis confirmed the presence of basic leucine zipper (bZIP) and transmembrane (TM) domains in the EcbZIP17 protein. Allele mining of this gene in 16 different genotypes by Sanger sequencing revealed no variation in nucleotide sequence, including the 618 bp intron. Expression analysis of EcbZIP17 under heat stress showed a similar pattern of expression in all genotypes across time intervals, with the highest upregulation after 4 h. The present study established the conserved nature of EcbZIP17 at the nucleotide and expression levels.

  10. Micro-evolution of toxicant tolerance: from single genes to the genome's tangled bank.

    PubMed

    van Straalen, Nico M; Janssens, Thierry K S; Roelofs, Dick

    2011-05-01

    Two case studies published 55 years ago became textbook examples of evolution in action: DDT resistance in houseflies (Busvine) and the rise of melanic forms of the peppered moth (Kettlewell). Now, many years later, molecular studies have elucidated in detail the mechanisms conferring resistance. In this paper we focus on the case of metal tolerance in a soil-living arthropod, Orchesella cincta, and provide new evidence on the transcriptional regulation of a gene involved in stress tolerance, metallothionein. Evolution of resistance is often ascribed to cis-regulatory change of such stress-combatting genes. For example, DDT resistance in the housefly is due to insertion of a mobile element into the promoter of Cyp6g1, and overexpression of this gene allows rapid metabolism of DDT. The discovery of these mechanisms has promoted the idea that resistance to environmental toxicants can be brought about by relatively simple genetic changes, involving up-regulation, duplication or structural alteration of a single gene. Similarly, the work on O. cincta shows that populations from metal-polluted mining sites have a higher constitutive expression of the cadmium-induced metallothionein (Mt) gene. Moreover, its promoter appears to include a large degree of polymorphism; Mt promoter alleles conferring high expression in cell-based bioreporter assays were shown to occur at higher frequency in populations living at polluted sites. The case is consistent with classical examples of micro-evolution through altered cis-regulation of a key gene. However, new qPCR data on gene expression in homozygous genotypes with both reference and metal-tolerant genetic backgrounds show that Mt expression of the same pMt homozygotes depends on the origin of the population. This suggests that trans-acting factors are also important in the regulation of Mt expression and its evolution. 
So the idea that metal tolerance in Orchesella can be viewed as a single-gene adaptation must be abandoned. These data, together with a genome-wide gene expression profiling study reported earlier, show that evolution of tolerance takes place in a complicated molecular network, not unlike an internal tangled bank. © The Author(s) 2011. This article is published with open access at Springerlink.com

  11. Getting the most out of RNA-seq data analysis.

    PubMed

    Khang, Tsung Fei; Lau, Ching Yee

    2015-01-01

    Background. A common research goal in transcriptome projects is to find genes that are differentially expressed between phenotype classes. Biologists might wish to validate such gene candidates experimentally or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation, such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls, interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets, one representing a strong and the other a mild biological effect size, we simulated different replicate size scenarios and tested the performance of several commonly used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30% and 50% mean PPV, an increase of more than 30-fold compared with the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV, to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes for systems-level analysis, as their trade-off between PPV and sensitivity was superior to that of the other methods. Conclusion. 
When biological effect size is weak, systems-level investigation is not possible using RNA-seq data, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of gene candidates with good validation potential when triplicates or more are available. When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for qPCR validation. When triplicates or more are available, GFOLD is a sharp tool for identifying high-confidence differentially expressed genes for targeted qPCR validation; for downstream systems-level analysis, combined results from DESeq2 and edgeR are useful.
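    Positive predictive value (PPV) is the fraction of genes called differentially expressed that are true positives, PPV = TP / (TP + FP), while sensitivity is TP / (TP + FN). The sketch below computes both from a set of called genes and a reference set treated as ground truth; how the study defined its reference set is not given here, and the function name and gene IDs are hypothetical.

```python
def ppv_and_sensitivity(called, truth):
    """Return (PPV, sensitivity) of a set of DE calls against a reference set.

    called: gene IDs called differentially expressed by some method
    truth:  gene IDs treated as truly differentially expressed
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # true positives
    ppv = tp / len(called) if called else float("nan")       # TP / (TP + FP)
    sensitivity = tp / len(truth) if truth else float("nan")  # TP / (TP + FN)
    return ppv, sensitivity


# Hypothetical calls from an unreplicated experiment versus a reference set.
called = {"g1", "g2", "g3", "g4"}
truth = {"g1", "g2", "g5"}
print(ppv_and_sensitivity(called, truth))  # → (0.5, 0.6666666666666666)
```

    A method can raise PPV simply by calling fewer genes, which is why the abstract weighs PPV against sensitivity when recommending DESeq2 and edgeR for systems-level work.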

  12. Gene Prioritization of Resistant Rice Gene against Xanthomas oryzae pv. oryzae by Using Text Mining Technologies

    PubMed Central

    Xia, Jingbo; Zhang, Xing; Yuan, Daojun; Chen, Lingling; Webster, Jonathan; Fang, Alex Chengyu

    2013-01-01

    To effectively assess the likelihood that an unknown rice protein is involved in resistance to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed that enhances gene prioritization by combining text mining technologies with a sequence-based approach. The text mining technique of term frequency-inverse document frequency (TF-IDF) is used to measure the importance of distinguishing terms that reflect biomedical activity in rice, after which candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier based on the chaos game representation algorithm is used to select the best possible candidate gene. Our experimental results show that combining these two methods improves gene prioritization. PMID:24371834
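The abstract names its two techniques without detail. The sketch below is a generic Python illustration of each, not the authors' implementation: TF-IDF weighting over tokenized abstracts (tokenization, stop-word handling and the exact TF and IDF variants the paper used are assumptions omitted here), and the classic nucleotide chaos game representation (CGR), which maps a sequence to points in the unit square. The abstract does not say which CGR variant the classifier used or whether it was applied to DNA or protein sequence; the corner layout below is the common nucleotide convention, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # term frequency in this document x log inverse document frequency
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Classic nucleotide CGR corners: A bottom-left, C top-left, G top-right, T bottom-right
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(seq):
    """Chaos game representation: each base moves the point halfway to its corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # this sketch simply skips N and other ambiguity codes
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


# Hypothetical toy corpus of tokenized abstracts
docs = [["xoo", "resistance", "rice"], ["rice", "yield"], ["rice", "xoo", "blight"]]
weights = tf_idf(docs)
print(weights[0]["rice"])         # 0.0: "rice" occurs in every document
print(chaos_game_points("ANC"))   # [(0.25, 0.25), (0.125, 0.625)]
```

Terms present in every document get zero weight, so the highest-weighted terms are those that distinguish a subset of abstracts. A classifier would typically consume CGR output as a grid of point counts (k-mer frequencies) rather than raw coordinates.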

  14. Identification and characterization of plant-specific NAC gene family in canola (Brassica napus L.) reveal novel members involved in cell death.

    PubMed

    Wang, Boya; Guo, Xiaohua; Wang, Chen; Ma, Jieyu; Niu, Fangfang; Zhang, Hanfeng; Yang, Bo; Liang, Wanwan; Han, Feng; Jiang, Yuan-Qing

    2015-03-01

    NAC transcription factors are plant-specific and play important roles in plant developmental processes, responses to biotic and abiotic cues, and hormone signaling. However, to date, little is known about the NAC genes in canola (or oilseed rape, Brassica napus L.). In this study, a total of 60 NAC genes were identified from canola through systematic analysis and mining of expressed sequence tags. Among these, the cDNA sequences of 41 NAC genes were successfully cloned. Phylogenetic analysis of the translated canola NAC protein sequences together with NAC proteins from representative species clustered them into three major groups and multiple subgroups. The transcriptional activities of these BnaNAC proteins were assayed in yeast. In addition, quantitative real-time RT-PCR showed that some of these BnaNACs were regulated by different hormone stimuli or abiotic stresses. Interestingly, we identified two novel BnaNACs, BnaNAC19 and BnaNAC82, that could elicit hypersensitive response-like cell death when expressed in Nicotiana benthamiana leaves, a response mediated by the accumulation of reactive oxygen species. Overall, our work lays a solid foundation for further characterization of this important NAC gene family in canola.

  15. Using the Saccharomyces Genome Database (SGD) for analysis of genomic information

    PubMed Central

    Skrzypek, Marek S.; Hirschman, Jodi

    2011-01-01

    Analysis of genomic data requires access to software tools that place the sequence-derived information in the context of biology. The Saccharomyces Genome Database (SGD) integrates functional information about budding yeast genes and their products with a set of analysis tools that facilitate exploring their biological details. This unit describes how the various types of functional data available at SGD can be searched, retrieved, and analyzed. Starting with a guided tour of the SGD Home page and Locus Summary page, this unit highlights how to retrieve data using YeastMine, how to visualize genomic information with GBrowse, how to explore gene expression patterns with SPELL, and how to use Gene Ontology tools to characterize large-scale datasets. PMID:21901739

  16. ENVIRONMENTAL EFFECTS ON SUPEROXIDE DISMUTASE AND CATALASE ACTIVITY AND EXPRESSION IN HONEY BEE.

    PubMed

    Nikolić, Tatjana V; Purać, Jelena; Orčić, Snežana; Kojić, Danijela; Vujanović, Dragana; Stanimirović, Zoran; Gržetić, Ivan; Ilijević, Konstantin; Šikoparija, Branko; Blagojević, Duško P

    2015-12-01

    Understanding the cellular stress response in honey bees will significantly contribute to their conservation. The aim of this study was to analyze the response of the antioxidative enzymes superoxide dismutase and catalase in honey bees in relation to the presence of toxic metals in different habitats. Three locations were selected: (i) Tunovo on the mountain Golija, as a control area without industry or large human impact, (ii) Belgrade, as an urban area, and (iii) Zajača, as a mining and industrial zone. Our results showed that the concentration of lead (Pb) in the whole body of bees varied according to habitat, with a highly significant increase of Pb in bees from the industrial area. Bees from the urban and industrial areas had increased expression of both Sod1 and Cat genes, suggesting adaptation to increased oxidative stress. However, despite increased gene expression, catalase enzyme activity was lower in bees from the industrial area, suggesting an inhibitory effect of Pb on catalase. © 2015 Wiley Periodicals, Inc.

  17. Physiological impacts of acute Cu exposure on deep-sea vent mussel Bathymodiolus azoricus under a deep-sea mining activity scenario.

    PubMed

    Martins, Inês; Goulart, Joana; Martins, Eva; Morales-Román, Rosa; Marín, Sergio; Riou, Virginie; Colaço, Ana; Bettencourt, Raul

    2017-12-01

    Over the past years, several studies have been dedicated to understanding the physiological ability of the vent mussel Bathymodiolus azoricus to overcome the high metal concentrations present in its surrounding hydrothermal environment. Potential deep-sea mining activities at Azores Triple Junction hydrothermal vent deposits would inevitably lead to the emergence of new fluid sources close to mussel beds, with consequent emission of high metal concentrations and potential resolubilization of Cu from minerals formed during the active phase of the vent field. Copper is an essential metal playing a key role in the activation of metalloenzymes and metalloproteins responsible for important cellular metabolic processes and tissue homeostasis. However, excessive intracellular amounts of reactive Cu ions may cause irreversible damage, possibly triggering cell apoptosis. In the present study, B. azoricus was exposed to increasing concentrations of Cu for 96 h under temperature and hydrostatic pressure conditions similar to those experienced at the Lucky Strike hydrothermal vent field. Specimens were kept in 1 L flasks, exposed to four Cu concentrations: 0 μg/L (control), 300, 800 and 1600 μg/L, and pressurized to 1750 bar. We addressed the question of how increased Cu concentration would affect the function of antioxidant defense proteins and the expression of antioxidant and immune-related genes in B. azoricus. Both antioxidant enzymatic activities and gene expression were examined in gill, mantle and digestive gland tissues of exposed vent mussels. Our study reveals that stressful short-term Cu exposure has a strong effect on the molecular metabolism of the hydrothermal vent mussel, especially in gill tissue. Initially, stress caused either by unpressurization or by Cu exposure was associated with high antioxidant enzyme activities and tissue-specific transcriptional up-regulation. 
However, mussels exposed to increased Cu concentrations showed suppression of both antioxidant and immune-related genes. Under a mining activity scenario, the release of excess dissolved Cu into the vent environment may cause serious changes in the cellular defense mechanisms of B. azoricus. This outcome, while adding to our knowledge of Cu toxicity, highlights the potentially deleterious impacts of mining activities on the physiology of deep-sea organisms. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Genomic Organization, Phylogenetic Comparison and Differential Expression of the SBP-Box Family Genes in Grape

    PubMed Central

    Hou, Hongmin; Li, Jun; Gao, Min; Singer, Stacy D.; Wang, Hao; Mao, Linyong; Fei, Zhangjun; Wang, Xiping

    2013-01-01

    Background. The SBP-box gene family is specific to plants and encodes a class of zinc finger-containing transcription factors with a broad range of functions. Although SBP-box genes have been identified in numerous plants including green algae, moss, silver birch, snapdragon, Arabidopsis, rice and maize, little is known about the function of SBP-box genes, or of the corresponding miR156/157, in grapevine. Methodology/Principal Findings. Eighteen SBP-box gene family members were identified in Vitis vinifera, twelve of which bore sequences complementary to miR156/157. Phylogenetic reconstruction demonstrated that plant SBP-domain proteins could be classified into seven subgroups, with the V. vinifera SBP-domain proteins being more closely related to SBP-domain proteins from dicotyledonous angiosperms than to those from monocotyledonous angiosperms. In addition, synteny analysis between grape and Arabidopsis demonstrated that homologs of several grape SBP genes were found in corresponding syntenic blocks of Arabidopsis. Expression analysis of the grape SBP-box genes in various organs and at different stages of fruit development in V. quinquangularis ‘Shang-24’ revealed distinct spatiotemporal patterns. While the majority of the grape SBP-box genes lacking a miR156/157 target site were expressed ubiquitously and constitutively, most genes bearing a miR156/157 target site exhibited distinct expression patterns, possibly due to the inhibitory role of the microRNA. Furthermore, microarray data mining and quantitative real-time RT-PCR analysis identified several grape SBP-box genes that are potentially involved in the defense against biotic and abiotic stresses. Conclusion. The results presented here provide further understanding of SBP-box gene function in plants and yield additional insights into the mechanism of stress management in grape, which may have important implications for the future success of this crop. PMID:23527172
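Finding SBP-box genes whose sequences are complementary to miR156/157 amounts to scanning transcripts for a near-complementary target site. A minimal sketch, assuming plain mismatch counting against the reverse complement of the miRNA; G:U wobble pairs count as mismatches here, whereas dedicated plant target-prediction tools use position-aware scoring. The function names are hypothetical and the sequences are toy examples, not real miR156 or grape transcripts.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq):
    """Reverse complement as RNA (DNA input is converted T -> U first)."""
    return seq.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def find_target_sites(mirna, transcript, max_mismatches=1):
    """Return (start, mismatches) for every transcript window complementary to mirna."""
    site = reverse_complement_rna(mirna)  # what a perfectly paired target would carry
    target = transcript.upper().replace("T", "U")
    k = len(site)
    hits = []
    for start in range(len(target) - k + 1):
        window = target[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


mirna = "UGACAGAAGA"               # hypothetical 10-nt toy miRNA
transcript = "AAATCTTCTGTCAGGG"    # hypothetical toy transcript (DNA letters accepted)
print(find_target_sites(mirna, transcript))  # [(3, 0)]
```

Plant miRNAs usually pair with their targets at near-perfect complementarity, which is why a small mismatch budget is a reasonable first filter before expression-based follow-up.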

  19. Genome-wide analysis reveals TNFAIP8L2 as an immune checkpoint regulator of inflammation and metabolism.

    PubMed

    Li, Ting; Wang, Wei; Gong, Shunyou; Sun, Honghong; Zhang, Huqin; Yang, An-Gang; Chen, Youhai H; Li, Xinyuan

    2018-05-19

    The interplay between inflammation and metabolism is widely recognized, yet the underlying molecular mechanisms remain poorly characterized. Using experimental database mining and genome-wide gene expression profiling methods, we found that, in contrast to other TNFAIP8 family members, TNFAIP8L2 (TIPE2) was preferentially expressed in human myeloid cell types. In addition, Tnfaip8l2 expression decreased drastically in lipopolysaccharide (LPS)-stimulated macrophages. Moreover, Tnfaip8l2 deficiency led to heightened expression of genes enriched for leukocyte activation and lipid biosynthesis pathways. Furthermore, the mitochondrial respiration rate was increased in Tnfaip8l2-deficient macrophages, as measured with a Seahorse metabolic analyzer. Taken together, these results indicate that Tnfaip8l2 serves as a "brake" for immunometabolism, which needs to be released for optimized metabolic reprogramming as well as for mounting effective inflammatory responses. The unique anti-inflammatory and metabolic-modulatory function of TNFAIP8L2 makes it a novel therapeutic target for cardiovascular diseases and cancer. Copyright © 2018 Elsevier Ltd. All rights reserved.

  20. A-WINGS: an integrated genome database for Pleurocybella porrigens (Angel's wing oyster mushroom, Sugihiratake).

    PubMed

    Yamamoto, Naoki; Suzuki, Tomohiro; Kobayashi, Masaaki; Dohra, Hideo; Sasaki, Yohei; Hirai, Hirofumi; Yokoyama, Koji; Kawagishi, Hirokazu; Yano, Kentaro

    2014-12-03

    The angel's wing oyster mushroom (Pleurocybella porrigens, Sugihiratake) is a well-known delicacy. However, its potential risk of causing acute encephalopathy was recently revealed by a food poisoning incident. To identify the genes underlying the incident and provide mechanistic insight, we seek to develop an information infrastructure containing omics data. In our previous work, we sequenced the genome and transcriptome using next-generation sequencing techniques. The next step toward our goal is to develop a web database that facilitates efficient mining of large-scale omics data and identification of genes specifically expressed in the mushroom. This paper introduces a web database, A-WINGS (http://bioinf.mind.meiji.ac.jp/a-wings/), that provides integrated genomic and transcriptomic information for the angel's wing oyster mushroom. The database contains structural and functional annotations of transcripts, as well as gene expression data. Functional annotations include information on homologous sequences from NCBI nr and UniProt, Gene Ontology, and KEGG Orthology. Digital gene expression profiles were derived from RNA sequencing (RNA-seq) analysis of the fruiting bodies and mycelia. The omics information stored in the database is freely accessible through interactive and graphical interfaces with search functions that include 'GO TREE VIEW' browsing, keyword searches, and BLAST searches. The A-WINGS database will accelerate omics studies on specific aspects of the angel's wing oyster mushroom and the family Tricholomataceae.

  1. Nuclear Receptor SHP Activates miR-206 Expression via a Cascade Dual Inhibitory Mechanism

    PubMed Central

    Song, Guisheng; Wang, Li

    2009-01-01

    MicroRNAs play a critical role in many essential cellular functions in mammalian species. However, limited information is available regarding the regulation of miRNA gene transcription. Microarray profiling and real-time PCR analysis revealed a marked down-regulation of miR-206 in nuclear receptor SHP−/− mice. To understand the regulatory function of SHP with regard to miR-206 gene expression, we determined the putative transcriptional initiation site of miR-206 and its full-length primary transcript using a database mining approach and RACE. We identified transcription factor AP1 binding sites on the miR-206 promoter and further showed that AP1 (c-Jun and c-Fos) induced miR-206 promoter transactivation and expression, which was repressed by YY1. ChIP analysis confirmed the physical association of AP1 (c-Jun) and YY1 with the endogenous miR-206 promoter. In addition, we identified a nuclear receptor ERRγ (NR3B3) binding site on the YY1 promoter and showed that the YY1 promoter was transactivated by ERRγ, which was inhibited by SHP (NR0B2). ChIP analysis confirmed ERRγ binding to the YY1 promoter. Forced expression of SHP and AP1 induced miR-206 expression, while overexpression of ERRγ and YY1 reduced it. The effects of AP1, ERRγ, and YY1 on miR-206 expression were reversed by siRNA knockdown of each respective gene. Thus, we propose a novel cascade “dual inhibitory” mechanism governing miR-206 gene transcription by SHP: SHP inhibition of ERRγ led to decreased YY1 expression and relief of YY1 repression of AP1 activity, ultimately leading to the activation of miR-206. This is the first report to elucidate a cascade regulatory mechanism governing miRNA gene transcription. PMID:19721712

  2. Mercury Contamination and Biogeochemical Cycling Associated with the Historic Idrija Mining Area of Slovenia

    NASA Astrophysics Data System (ADS)

    Hines, M. E.; Bonzongo, J. J.; Barkay, T.; Horvat, M.; Faganeli, J.

    2001-12-01

    The Idrija Mine is the second-largest Hg mine in the world; it operated for 500 years before recently closing. More than five million tons of ore were mined, with only 73% recovered. Hg-laden tailings still line the banks. Exhausts from stacks and mineshafts caused elevated levels of airborne Hg, most of which was deposited in the Idrija basin, leading to elevated Hg levels in surficial soils. Hg is continually transported downstream, with approximately 1,500 kg per year entering the northern Adriatic Sea 100 km away. Multidisciplinary studies were conducted on samples collected throughout the Idrija and Soca River systems and on waters and sediments in the Gulf of Trieste, including Hg speciation, Hg transformation activities in sediments and soils, and the presence and expression of bacterial Hg resistance (mer) genes. Total Hg in the Idrija River increased from <3 to >300 ng/L, with MeHg accounting for about 0.5%. Concentrations decreased downstream but increased again in the Soca River and in the estuary, with MeHg accounting for nearly 1.5% of the total. While bacteria upstream of the mine did not contain mer genes, such genes were detected in bacteria collected downstream for nearly 40 km, and these genes were transcribed. Total Hg levels decreased offshore, but values over 30 ng/L were noted in bottom waters. MeHg concentrations in the Gulf were highest in bottom waters. Sediments near the river mouth contained 40 μg/g total Hg with MeHg concentrations of about 3 ng/g. Sediments several km into the Gulf contained 50-fold less total Hg but only 10-fold less MeHg, which decreased with depth in the sediment. Hg in sediment pore waters varied between 1 and 8 ng/L, with MeHg accounting for about 30%. Hg methylation and MeHg demethylation were active in Gulf sediments, with the highest activities near the surface. MeHg was degraded by an oxidative pathway, with >97% of the C released from MeHg as carbon dioxide. 
Hg methylation depth profiles resembled profiles of dissolved MeHg. Despite the closure of the Idrija Mine, Hg-laden waters still strongly impact the riverine, estuarine, and marine systems. Organisms in the Idrija River responded to Hg stress, and high Hg levels persist into the Gulf. Increases in total Hg and MeHg in the estuary demonstrate the remobilization of Hg, presumably through HgS dissolution and recycling. Gulf sediments actively produce MeHg, which enters bottom waters and the marine food chain.

  3. Mining of the Uncharacterized Cytochrome P450 Genes Involved in Alkaloid Biosynthesis in California Poppy Using a Draft Genome Sequence

    PubMed Central

    Hori, Kentaro; Yamada, Yasuyuki; Purwanto, Ratmoyo; Minakuchi, Yohei; Toyoda, Atsushi; Hirakawa, Hideki

    2018-01-01

    Land plants produce specialized low molecular weight metabolites to adapt to various environmental stressors, such as UV radiation, pathogen infection, wounding and animal feeding damage. Due to the large variety of stresses, plants produce various chemicals, particularly plant species-specific alkaloids, through specialized biosynthetic pathways. In this study, using a draft genome sequence and querying known biosynthetic cytochrome P450 (P450) enzyme-encoding genes, we characterized the P450 genes involved in benzylisoquinoline alkaloid (BIA) biosynthesis in California poppy (Eschscholzia californica), as P450s are key enzymes involved in the diversification of specialized metabolism. Our in silico studies showed that all identified enzyme-encoding genes involved in BIA biosynthesis were found in the draft genome sequence of approximately 489 Mb, which covered approximately 97% of the whole genome (502 Mb). Further analyses showed that some P450 families involved in BIA biosynthesis, i.e. the CYP80, CYP82 and CYP719 families, were more enriched in the genome of E. californica than in the genome of Arabidopsis thaliana, a plant that does not produce BIAs. CYP82 family genes were highly abundant, so we measured the expression of CYP82 genes with respect to alkaloid accumulation in different plant tissues and two cell lines whose BIA production differs to estimate the functions of the genes. Further characterization revealed two highly homologous P450s (CYP82P2 and CYP82P3) that exhibited 10-hydroxylase activities with different substrate specificities. Here, we discuss the evolution of the P450 genes and the potential for further genome mining of the genes encoding the enzymes involved in BIA biosynthesis. PMID:29301019

  4. Discovery and explanation of drug-drug interactions via text mining.

    PubMed

    Percha, Bethany; Garten, Yael; Altman, Russ B

    2012-01-01

    Drug-drug interactions (DDIs) can occur when two drugs interact with the same gene product. Most available information about gene-drug relationships is contained within the scientific literature but is dispersed over a large number of publications, with thousands of new publications added each month. In this setting, automated text mining is an attractive solution for identifying gene-drug relationships and aggregating them to predict novel DDIs. In previous work, we showed that gene-drug interactions can be extracted from Medline abstracts with high fidelity; we extract not only the genes and drugs, but also the type of relationship expressed in individual sentences (e.g. metabolize, inhibit, activate and many others). We normalize these relationships and map them to a standardized ontology. In this work, we hypothesize that we can combine these normalized gene-drug relationships, drawn from a very broad and diverse literature, to infer DDIs. Using a training set of established DDIs, we trained a random forest classifier to score potential DDIs based on the features of the normalized assertions extracted from the literature that relate two drugs to a gene product. The classifier recognizes the combinations of relationships, drugs and genes that are most associated with the gold standard DDIs, correctly identifying 79.8% of assertions relating interacting drug pairs and 78.9% of assertions relating noninteracting drug pairs. Most significantly, because our text processing method captures the semantics of individual gene-drug relationships, we can construct mechanistic pharmacological explanations for the newly proposed DDIs. We show how our classifier can be used to explain known DDIs and to uncover new DDIs that have not yet been reported.
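The abstract describes scoring drug pairs from normalized gene-drug assertions that share a gene product. A minimal sketch of that aggregation step, assuming assertions are already extracted as (drug, relation, gene) triples; the function name, the decision to count unordered relation-type combinations as the only features, and all drug and gene names are hypothetical simplifications. In the study these features fed a random forest classifier (scikit-learn's `RandomForestClassifier` would be one way to reproduce that step); the classifier is omitted here.

```python
from collections import Counter
from itertools import combinations


def pair_features(assertions):
    """Aggregate (drug, relation, gene) assertions into per-drug-pair feature counts.

    For every two drugs linked to the same gene, count the unordered combination
    of their relationship types, e.g. ('inhibit', 'metabolize'), over all shared genes.
    """
    by_gene = {}
    for drug, relation, gene in assertions:
        by_gene.setdefault(gene, set()).add((drug, relation))
    features = {}
    for links in by_gene.values():
        # sorted() makes the pair order deterministic: d1 <= d2
        for (d1, r1), (d2, r2) in combinations(sorted(links), 2):
            if d1 == d2:
                continue  # two relations of the same drug do not form a drug pair
            key = tuple(sorted((r1, r2)))
            features.setdefault((d1, d2), Counter())[key] += 1
    return features


# Hypothetical assertions: drugA and drugB both relate to GENE_X
assertions = [
    ("drugA", "inhibit", "GENE_X"),
    ("drugB", "metabolize", "GENE_X"),
    ("drugC", "activate", "GENE_Y"),
]
print(pair_features(assertions))
# {('drugA', 'drugB'): Counter({('inhibit', 'metabolize'): 1})}
```

Keeping the relation types in the features, rather than only a shared-gene flag, is what lets a prediction be read back as a mechanism (for example, one drug inhibiting an enzyme that metabolizes the other).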

  5. Genome resequencing and transcriptome profiling reveal structural diversity and expression patterns of constitutive disease resistance genes in Huanglongbing-tolerant Poncirus trifoliata and its hybrids

    PubMed Central

    Rawat, Nidhi; Kumar, Brajendra; Albrecht, Ute; Du, Dongliang; Huang, Ming; Yu, Qibin; Zhang, Yi; Duan, Yong-Ping; Bowman, Kim D; Gmitter, Fred G; Deng, Zhanao

    2017-01-01

    Huanglongbing (HLB) is the most destructive bacterial disease of citrus worldwide. While most citrus varieties are susceptible to HLB, Poncirus trifoliata, a close relative of Citrus, and some of its hybrids with Citrus are tolerant to HLB. No specific HLB tolerance genes have been identified in P. trifoliata, but recent studies have shown that constitutive disease resistance (CDR) genes were expressed at much higher levels in HLB-tolerant Poncirus hybrids and that the expression of CDR genes was modulated by Candidatus Liberibacter asiaticus (CLas), the pathogen of HLB. The current study was undertaken to mine and characterize the CDR gene family in Citrus and Poncirus and to understand its association with HLB tolerance in Poncirus. We identified 17 CDR genes in two citrus genomes, deduced their structures, and investigated their phylogenetic relationships. We found that the expansion of the CDR family in Citrus appears to be due to segmental and tandem duplication events. Through genome resequencing and transcriptome sequencing, we identified eight CDR genes in the Poncirus genome (PtCDR1-PtCDR8). The number of SNPs was highest in PtCDR2 and lowest in PtCDR7. Most of the deletion and insertion events were observed in the UTR regions of Citrus and Poncirus CDR genes. PtCDR2 and PtCDR8 were abundant in the leaf transcriptomes of two HLB-tolerant Poncirus genotypes and were also upregulated in HLB-tolerant Poncirus hybrids, as revealed by real-time PCR analysis. These two CDR genes appear to be good candidates for future studies of their role in citrus-CLas interactions. PMID:29152310

  6. dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts

    PubMed Central

    Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

    2013-01-01

    The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because several unrelated databases must be mined. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for the model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, the accuracy of the annotation provided by dbWFA was assessed by comparing the annotation of the whole T. aestivum UniGene set with curated annotations of the two model species. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
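dbWFA links wheat sequences to model-species annotations through BLAST-based homology searches. One common way to do such a transfer, sketched here under stated assumptions rather than as dbWFA's actual procedure, is to keep each query's highest-bitscore hit from BLAST tabular output (`-outfmt 6`: 12 tab-separated columns, with the e-value in column 11 and the bitscore in column 12) and copy that hit's annotation terms. The e-value cutoff, the function names, the sequence identifiers and the annotation labels are all hypothetical.

```python
def best_hits(blast_tab_lines, max_evalue=1e-10):
    """Keep the highest-bitscore subject per query from BLAST tabular (-outfmt 6) lines."""
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # hit too weak to transfer an annotation from
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def transfer_annotations(blast_tab_lines, subject_annotations, max_evalue=1e-10):
    """Map each query to the annotation terms of its best hit (hypothetical helper)."""
    return {
        q: subject_annotations.get(s, [])
        for q, s in best_hits(blast_tab_lines, max_evalue).items()
    }


# Hypothetical hits: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
hits = [
    "Ta_unigene1\tAt_gene1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
    "Ta_unigene1\tOs_gene1\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-40\t250",
    "Ta_unigene2\tAt_gene2\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t35",
]
annotations = {"At_gene1": ["annotation_A"], "At_gene2": ["annotation_B"]}
print(transfer_annotations(hits, annotations))  # {'Ta_unigene1': ['annotation_A']}
```

Queries whose only hits fail the cutoff (here `Ta_unigene2`) receive no putative function, which is how a coverage figure such as "45% of the UniGenes" arises in this kind of pipeline.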

  7. Comparative and Evolutionary Analysis of Grass Pollen Allergens Using Brachypodium distachyon as a Model System.

    PubMed

    Sharma, Akanksha; Sharma, Niharika; Bhalla, Prem; Singh, Mohan

    2017-01-01

    Comparative genomics has facilitated the mining of biological information from genome sequences through the detection of similarities and differences with the genomes of closely or more distantly related species. Using such comparative approaches, knowledge can be transferred from model to non-model organisms, and insights can be gained into the structural and evolutionary patterns of specific genes. In the absence of sequenced genomes for allergenic grasses, this study aimed to understand the structure, organisation and expression profiles of grass pollen allergens using genomic data from Brachypodium distachyon, which is phylogenetically related to the allergenic grasses. Combining genomic data with an anther RNA-Seq dataset revealed 24 pollen allergen genes belonging to eight allergen groups, mapping to the five chromosomes of B. distachyon. High levels of anther-specific expression were observed for the 24 identified putative allergen-encoding genes in Brachypodium. The genomic evidence suggests that the gene encoding the group 5 allergen, the most potent trigger of hay fever and allergic asthma, originated as a pollen-specific orphan gene in a common grass ancestor of the Brachypodium and Triticeae clades. Gene structure analysis showed that the putative allergen-encoding genes in Brachypodium either lack introns or contain a reduced number of them. Promoter analysis of the identified Brachypodium genes revealed specific cis-regulatory sequences likely responsible for high anther/pollen-specific expression. Along with identifying putative allergen-encoding genes in Brachypodium, this study also describes some important plant gene families (e.g. the expansin superfamily, the EF-hand family and profilins) for the first time in the model plant Brachypodium. 
Altogether, the present study provides new insights into the structural characterization and evolution of pollen allergens and will serve as a basis for their functional characterization in related grass species.

  8. Gene expression profiling in equine polysaccharide storage myopathy revealed inflammation, glycogenesis inhibition, hypoxia and mitochondrial dysfunctions.

    PubMed

    Barrey, Eric; Mucher, Elodie; Jeansoule, Nicolas; Larcher, Thibaut; Guigand, Lydie; Herszberg, Bérénice; Chaffaux, Stéphane; Guérin, Gérard; Mata, Xavier; Benech, Philippe; Canale, Marielle; Alibert, Olivier; Maltere, Péguy; Gidrol, Xavier

    2009-08-07

    Several cases of myopathy have been observed in the Norman Cob horse breed. Muscle histology examinations revealed that some families suffer from polysaccharide storage myopathy (PSSM). We assumed that a PSSM-related gene expression signature should be observable at the transcriptional level, because the glycogen storage disease could also be linked to other dysfunctions in gene regulation. A functional genomic approach could therefore provide new knowledge about the metabolic disorders related to PSSM. We propose exploring the PSSM muscle fiber metabolic disorders by measuring gene expression in relation to the histological phenotype. Genotyping analysis of the GYS1 mutation revealed 2 homozygous (AA) and 5 heterozygous (GA) PSSM horses. In the PSSM muscles, histological data revealed PAS-positive, amylase-resistant abnormal polysaccharides, inflammation, necrosis, lipomatosis and active regeneration of fibers. Ultrastructural evaluation revealed a decrease in mitochondrial number and structural disorders. Extensive accumulation of an abnormal polysaccharide displaced and partially replaced mitochondria and myofibrils. The severity of the disease was higher in the two homozygous PSSM horses. Gene expression analysis revealed 129 genes significantly modulated (p < 0.05). The following genes were up-regulated over 2-fold: IL18, CTSS, LUM, CD44, FN1, GSTO1. The most down-regulated genes were the following: mitochondrial tRNA, SLC2A2, PRKCalpha, VEGFalpha. Data mining analysis showed that protein synthesis, apoptosis, cellular movement, growth and proliferation were the main cellular functions significantly associated with the modulated genes (p < 0.05). Several up-regulated genes, especially IL18, revealed severe muscular inflammation in PSSM muscles. 
The up-regulation of glycogen synthase kinase-3 (GSK3beta) in its active form could be responsible for glycogen synthase (GYS1) inhibition and hypoxia-inducible factor (HIF1alpha) destabilization. The main disorders observed in PSSM muscles could be related to mitochondrial dysfunction, glycogenesis inhibition and chronic hypoxia of the PSSM muscles.
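The expression results combine a significance cutoff (p < 0.05) with a fold-change cutoff (over 2-fold) and a ranking of the most down-regulated genes. A minimal sketch of that kind of filtering, assuming per-gene fold changes on a linear scale (PSSM / control) and raw p-values are already available; the function name and all values are hypothetical, and no multiple-testing correction is applied in this sketch.

```python
def summarize_modulation(stats, alpha=0.05, min_fold=2.0, n_down=3):
    """stats: {gene: (fold_change, p_value)}, fold_change on a linear scale.

    Returns (significant genes, significant genes up-regulated more than
    min_fold, the n_down significant genes with the lowest fold change).
    """
    significant = sorted(g for g, (_, p) in stats.items() if p < alpha)
    up_over_fold = [g for g in significant if stats[g][0] > min_fold]
    most_down = sorted(significant, key=lambda g: stats[g][0])[:n_down]
    return significant, up_over_fold, most_down


# Hypothetical per-gene statistics: (fold change, p-value)
stats = {
    "geneA": (3.1, 0.01),
    "geneB": (0.4, 0.02),
    "geneC": (1.5, 0.03),
    "geneD": (5.0, 0.20),  # large change but not significant, so excluded
}
print(summarize_modulation(stats, n_down=1))
# (['geneA', 'geneB', 'geneC'], ['geneA'], ['geneB'])
```

Applying the p-value filter first means a large but non-significant change (geneD) is never reported as up-regulated.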

  9. Characterisation of secretory calcium-binding phosphoprotein-proline-glutamine-rich 1: a novel basal lamina component expressed at cell-tooth interfaces.

    PubMed

    Moffatt, Pierre; Wazen, Rima M; Dos Santos Neves, Juliana; Nanci, Antonio

    2014-12-01

    Functional genomic screening of the rat enamel organ (EO) has led to the identification of a number of secreted proteins expressed during the maturation stage of amelogenesis, including amelotin (AMTN) and odontogenic ameloblast-associated (ODAM). In this study, we characterise the gene, protein and pattern of expression of a related protein called secretory calcium-binding phosphoprotein-proline-glutamine-rich 1 (SCPPPQ1). The Scpppq1 gene resides within the secretory calcium-binding phosphoprotein (Scpp) cluster. SCPPPQ1 is a highly conserved, 75-residue, secreted protein rich in proline, leucine, glutamine and phenylalanine. In silico data mining has revealed no similarity to any known sequence. Northern blotting of various rat tissues suggests that the expression of Scpppq1 is restricted to tooth and associated tissues. Immunohistochemical analyses show that the protein is expressed during the late maturation stage of amelogenesis and in the junctional epithelium, where it localises to an atypical basal lamina at the cell-tooth interface. This discrete localisation suggests that SCPPPQ1, together with AMTN and ODAM, participates in structuring the basal lamina and in mediating attachment of epithelial cells to mineralised tooth surfaces.

  10. Mining secreted proteins that function in pepper fruit development and ripening using a yeast secretion trap (YST)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Je Min (Department of Horticultural Science, Kyungpook National University, Daegu; e-mail: jemin@knu.ac.kr); Lee, Sang-Jik

    Highlights: • Yeast secretion trap (YST) is a valuable tool for mining the secretome. • A total of 80 secreted proteins were newly identified via YST in pepper fruits. • The secreted proteins are differentially regulated during pepper development and ripening. • A transient GFP-fusion assay and an in planta secretion trap can effectively validate the secretion of proteins. - Abstract: Plant cells secrete diverse sets of constitutively and conditionally expressed proteins under various environmental and developmental states. Secreted protein populations, or secretomes, have multiple functions, including defense responses, signaling, metabolic processes, and developmental regulation. To identify genes encoding secreted proteins that function in fruit development and ripening, a yeast secretion trap (YST) screen was employed using pepper (Capsicum annuum) fruit cDNAs. The YST screen revealed 80 pepper fruit-related genes (CaPFRs) encoding secreted proteins, including cell wall proteins, several of which have not been previously described. A transient GFP-fusion assay and an in planta secretion trap were used to validate the secretion of proteins encoded by selected YST clones. In addition, RNA gel blot analyses provided further insight into their expression and regulation during fruit development and ripening. Integrating our data, we conclude that YST provides a valuable functional genomics tool for identifying substantial numbers of novel secreted plant proteins associated with biological processes, including fruit development and ripening.

  11. Application of dynamic topic models to toxicogenomics data.

    PubMed

    Lee, Mikyung; Liu, Zhichao; Huang, Ruili; Tong, Weida

    2016-10-06

    All biological processes are inherently dynamic. Biological systems evolve transiently or sustainably over sequential time points after perturbation by environmental insults, drugs, and chemicals. Investigating the temporal behavior of molecular events has been an important means of understanding the mechanisms that govern a biological system's response to perturbations such as drug treatment. The intrinsic complexity of time-series data requires appropriate computational algorithms for interpretation. In this study, we propose, for the first time, the application of dynamic topic models (DTM) to the analysis of time-series gene expression data. We studied a large time-series toxicogenomics dataset containing more than 3,144 microarrays of gene expression data from rat livers treated with 131 compounds (mostly drugs) at two doses (control and high dose) on a repeated schedule with four time points (4, 8, 15, and 29 days). Using DTM, we analyzed the topics (each consisting of a set of genes) and their biological interpretation across these four time points, and identified hidden patterns embedded in the time-series gene expression profiles. Based on the topic distribution of each compound-time condition, a number of drugs were successfully clustered by their shared mode of action, such as PPARα agonists and COX inhibitors. The biological meaning of each topic was interpreted using diverse sources of information, such as functional pathway analysis and the therapeutic uses of the drugs. Additionally, we found that the sample clusters produced by DTM are much more coherent in terms of functional categories than those produced by traditional clustering algorithms. We demonstrated that DTM, a text mining technique, can be a powerful computational approach for clustering time-series gene expression profiles, with a probabilistic representation of their dynamic features along sequential time frames.
The method offers an alternative way to uncover hidden patterns embedded in time-series gene expression profiles and to gain a better understanding of the dynamic behavior of gene regulation in biological systems.
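    Topic models operate on documents made of words, so applying DTM to expression data first requires encoding each compound-time condition as a document and grouping the documents into ordered time slices. The abstract does not say how the authors did this; the sketch below assumes, hypothetically, that a gene becomes a word when its absolute log2 ratio versus control reaches a threshold, tagged with its direction. The helper names, threshold, and gene values are illustrative, and fitting the DTM itself is not shown.

```python
from collections import defaultdict

TIME_POINTS = (4, 8, 15, 29)  # days, as in the dataset described above


def profile_to_document(log_ratios, threshold=1.0):
    """Turn one profile (gene -> log2 ratio vs. control) into a bag of words.

    Assumption: a gene becomes a token only if |log2 ratio| >= threshold,
    tagged with its direction so up- and down-regulation are distinct words.
    """
    words = []
    for gene, value in sorted(log_ratios.items()):
        if value >= threshold:
            words.append(f"{gene}_up")
        elif value <= -threshold:
            words.append(f"{gene}_down")
    return words


def build_time_slices(samples, threshold=1.0):
    """Group documents by time point, the ordered slices a DTM is fitted over.

    samples: iterable of (compound, day, log_ratios) tuples.
    Returns {day: [(compound, words), ...]} in TIME_POINTS order.
    """
    slices = defaultdict(list)
    for compound, day, log_ratios in samples:
        slices[day].append((compound, profile_to_document(log_ratios, threshold)))
    return {day: slices[day] for day in TIME_POINTS if day in slices}


# Hypothetical profiles for one compound at two time points.
samples = [
    ("drugA", 8, {"Cyp4a1": 2.3, "Acox1": 1.5, "Alb": -0.2}),
    ("drugA", 4, {"Cyp4a1": 1.1, "Alb": -1.4}),
]
print(build_time_slices(samples))
```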

  12. Mega-analysis of Odds Ratio: A Convergent Method for a Deep Understanding of the Genetic Evidence in Schizophrenia.

    PubMed

    Jia, Peilin; Chen, Xiangning; Xie, Wei; Kendler, Kenneth S; Zhao, Zhongming

    2018-06-20

    Numerous high-throughput omics studies have been conducted in schizophrenia, providing an accumulated catalog of susceptibility variants and genes. The results from these studies, however, are highly heterogeneous. The variants and genes nominated by different omics studies often have limited overlap with each other. There is thus a pressing need for integrative analysis to unify the different types of data and provide a convergent view of schizophrenia candidate genes (SZgenes). In this study, we collected a comprehensive, multidimensional dataset covering 7819 brain-expressed genes. The data comprised genome-wide association evidence from genetics (e.g., genotyping data, copy number variations, de novo mutations), epigenetics, transcriptomics, and literature mining. We developed a method named mega-analysis of odds ratio (MegaOR) to prioritize SZgenes. Applying MegaOR to the multidimensional data produced consensus sets of SZgenes (up to 530), each enriched with dense, multidimensional evidence. We showed that these SZgenes had highly tissue-specific expression in brain and nerve, and that their interactions were significantly stronger than expected by chance. Furthermore, we found that these SZgenes are involved in human brain development, showing strong spatiotemporal expression patterns; these characteristics were replicated in independent brain expression datasets. Finally, the SZgenes were enriched in critical functional gene sets involved in neuronal activity, ligand-gated ion signaling, and fragile X mental retardation protein targets. In summary, MegaOR analysis identified consensus sets of SZgenes with enriched association evidence for schizophrenia, providing insights into the pathophysiology of the disorder.
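    The abstract does not give the MegaOR formula. As background only, the sketch below computes the quantity the method is named after: an odds ratio with a Wald 95% confidence interval from a 2x2 table, for example candidate genes versus other genes crossed with the presence or absence of one type of evidence. How MegaOR combines odds ratios across evidence types is not reproduced here; the table layout, the `odds_ratio` helper, and the counts are hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96, correction=0.0):
    """Odds ratio and Wald confidence interval for a 2x2 table.

                        evidence   no evidence
        candidate          a            b
        non-candidate      c            d

    `correction` (e.g. 0.5) can be added to every cell when a count is zero.
    """
    a, b, c, d = (x + correction for x in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Hypothetical counts, not taken from the study.
print(odds_ratio(30, 70, 200, 1800))
```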

  13. MIDAS: Mining differentially activated subpaths of KEGG pathways from multi-class RNA-seq data.

    PubMed

    Lee, Sangseon; Park, Youngjune; Kim, Sun

    2017-07-15

    Pathway-based analysis of high-throughput transcriptome data is a widely used approach to investigate biological mechanisms. Since a pathway comprises multiple functions, a recent approach is to determine condition-specific sub-pathways, or subpaths. However, several challenges remain. First, few existing methods utilize explicit gene expression information from RNA-seq. More importantly, subpath activity is usually computed as an average of statistical scores (e.g., correlations) over the edges of a candidate subpath, which fails to reflect gene expression quantity. In addition, none of the existing methods can handle multiple phenotypes. To address these problems, we designed and implemented an algorithm, MIDAS, that determines condition-specific subpaths, each of which has different activities across multiple phenotypes. MIDAS fully utilizes gene expression quantity information and network centrality information to determine condition-specific subpaths. To test its performance, we used TCGA breast cancer RNA-seq gene expression profiles with five molecular subtypes and identified 36 differentially activated subpaths. The utility of MIDAS was demonstrated in four ways. All 36 subpaths are well supported by the literature. We also showed that these subpaths had good discriminant power for classifying the five cancer subtypes and prognostic power in survival analysis. Finally, in a performance comparison with a recent subpath prediction method, PATHOME, MIDAS identified more subpaths and many more genes that are well supported by the literature. http://biohealth.snu.ac.kr/software/MIDAS/. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
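    The limitation MIDAS addresses can be made concrete. Below is a sketch of the conventional score the abstract criticizes, the mean Pearson correlation over the edges of a candidate subpath; it is not MIDAS itself, and the gene names and values are hypothetical. Because correlation is scale-invariant, multiplying one gene's expression tenfold leaves the score unchanged, which is the sense in which such a score fails to reflect gene expression quantity.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_edge_correlation(subpath, expression):
    """Average correlation over consecutive edges (u, v) of a subpath.

    subpath: ordered list of gene names, e.g. ["A", "B", "C"]
    expression: gene name -> expression values across samples
    """
    edges = list(zip(subpath, subpath[1:]))
    return sum(pearson(expression[u], expression[v]) for u, v in edges) / len(edges)


expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 5]}
scaled = dict(expr, B=[10 * v for v in expr["B"]])  # B expressed 10x higher
s1 = mean_edge_correlation(["A", "B", "C"], expr)
s2 = mean_edge_correlation(["A", "B", "C"], scaled)
print(s1, s2)  # identical up to rounding despite the 10x change in B
```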

  14. A novel feature extraction approach for microarray data based on multi-algorithm fusion

    PubMed Central

    Jiang, Zhu; Xu, Rong

    2015-01-01

    Feature extraction is one of the most important and effective methods for reducing dimensionality in data mining, particularly with the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes: one is to identify disease-related genes; the other is to find a compact set of discriminative genes for building a pattern classifier with reduced complexity and improved generalization. Depending on the purpose of gene selection, two types of feature extraction algorithms, ranking-based and set-based, are employed in microarray gene expression data analysis. Ranking-based feature extraction evaluates features individually, generally without considering the relationships between them, whereas set-based feature extraction evaluates features by their role in a feature set, taking the dependencies between features into account. As with learning methods, feature extraction faces a generalization problem, namely robustness, which is often overlooked. To improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach improves feature extraction performance. The new approach was tested on gene expression datasets including colon cancer, CNS, DLBCL, and leukemia data, where it performed better than existing solutions. PMID:25780277
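    The fusion idea can be illustrated with one simple and widely used scheme, average-rank (Borda-style) aggregation over several ranking-based scorers. This is an assumption for illustration: the abstract does not specify the fusion rule, and the authors also fuse set-based methods, which this sketch omits. The helper names, gene names, and scores are hypothetical, and every scorer must score the same genes.

```python
def to_ranks(scores):
    """Map gene -> rank (1 = best) from gene -> score (higher = more discriminative)."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))  # ties broken by name
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def fuse_rankings(score_tables, top_k):
    """Average-rank fusion of several ranking-based feature scorers."""
    rank_tables = [to_ranks(t) for t in score_tables]
    genes = set().union(*rank_tables)
    mean_rank = {g: sum(r[g] for r in rank_tables) / len(rank_tables) for g in genes}
    return sorted(genes, key=lambda g: (mean_rank[g], g))[:top_k]


# Hypothetical scores from two different ranking criteria.
t_scores = {"g1": 5.0, "g2": 3.0, "g3": 1.0, "g4": 4.0}
other_scores = {"g1": 0.5, "g2": 0.8, "g3": 0.1, "g4": 0.9}
print(fuse_rankings([t_scores, other_scores], top_k=2))
```

    Here g4 is never first under either criterion alone, yet it ranks first after fusion because it is consistently near the top; favouring such stable features is one motivation for combining scorers.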

  16. Selective inhibition of yeast regulons by daunorubicin: A transcriptome-wide analysis

    PubMed Central

    Rojas, Marta; Casado, Marta; Portugal, José; Piña, Benjamin

    2008-01-01

    Background The antitumor drug daunorubicin exerts some of its cytotoxic effects by binding to DNA and inhibiting the transcription of different genes. We analysed this effect in vivo at the transcriptome level using the budding yeast Saccharomyces cerevisiae as a model and sublethal (IC40) concentrations of the drug to minimise general toxic effects. Results Daunorubicin affected a minor proportion (14%) of the yeast transcriptome, increasing the expression of 195 genes and reducing the expression of 280 genes. Genes down-regulated by daunorubicin included essentially all genes involved in the glycolytic pathway, the tricarboxylic acid cycle, and alcohol metabolism, whereas transcription of ribosomal protein genes was unaffected or even slightly increased. This pattern is consistent with a specific inhibition of glucose usage in treated cells, with only minor effects on proliferation or other basic cell functions. Promoter analysis showed that the down-regulated genes belong to a limited number of transcriptional regulatory units (regulons). Consistently, data mining showed that daunorubicin-induced changes in expression patterns were similar to those observed in yeast strains deleted for some transcription factors functionally related to glycolysis and/or the cAMP regulatory pathway, which appeared to be particularly sensitive to daunorubicin. Conclusion The effects of daunorubicin treatment on the yeast transcriptome are consistent with a model in which the drug impairs binding of different transcription factors by competing for their DNA binding sequences, thereby limiting their effectiveness and affecting the corresponding regulatory networks. This proposed mechanism might have broad therapeutic implications against cancer cells growing under hypoxic conditions. PMID:18667070
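    Whether down-regulated genes are concentrated in a particular regulon is commonly assessed with a hypergeometric (one-sided Fisher) test. The abstract does not state which test the authors used, so this is an illustrative sketch: `N` is the number of genes assayed, `K` the regulon size, `n` the number of down-regulated genes, and `k` their overlap. The helper name and the usage values are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k): chance that a random n-gene set drawn from N genes
    contains at least k of the K genes belonging to a regulon."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division


# Hypothetical example: 6,000 genes assayed, a 40-gene regulon,
# 280 down-regulated genes, 15 of which fall in the regulon.
print(hypergeom_upper_tail(15, 6000, 40, 280))
```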

  17. Genetically Determined Susceptibility to Tuberculosis in Mice Causally Involves Accelerated and Enhanced Recruitment of Granulocytes

    PubMed Central

    Keller, Christine; Hoffmann, Reinhard; Lang, Roland; Brandau, Sven; Hermann, Corinna; Ehlers, Stefan

    2006-01-01

    Classical twin studies and recent linkage analyses of African populations have suggested that host genetic factors may be involved in susceptibility or resistance to Mycobacterium tuberculosis infection. To identify the candidate genes involved and test their causal role, we capitalized on the mouse model of tuberculosis, since inbred mouse strains also differ substantially in their susceptibility to infection. Two susceptible and two resistant mouse strains were aerogenically infected with 1,000 CFU of M. tuberculosis, and gene expression was examined with the Affymetrix GeneChip U74A array using total lung RNA collected 2 and 4 weeks postinfection. Four weeks after infection, 96 genes, many of which are involved in inflammatory cell recruitment and activation, were regulated in common. One hundred seven genes were differentially regulated in susceptible mouse strains, whereas 43 genes were differentially expressed only in resistant mice. Data mining revealed a bias towards the expression of genes involved in granulocyte pathophysiology in susceptible mice, such as up-regulation of the genes for the neutrophil chemoattractant LIX (CXCL5), interleukin 17 receptor, phosphoinositide kinase 3 delta, and gamma interferon-inducible protein 10. Following M. tuberculosis challenge in either the airways or the peritoneum, granulocytes were recruited significantly faster and in higher numbers in susceptible than in resistant mice. When granulocytes were efficiently depleted by either of two regimens at the onset of infection, only susceptible mice survived aerosol challenge with M. tuberculosis significantly longer than control mice. We conclude that initially enhanced recruitment of granulocytes contributes to susceptibility to tuberculosis. PMID:16790804

  18. Expression and characterization of thermostable glycogen branching enzyme from Geobacillus mahadia Geo-05.

    PubMed

    Mohtar, Nur Syazwani; Abdul Rahman, Mohd Basyaruddin; Raja Abd Rahman, Raja Noor Zaliha; Leow, Thean Chor; Salleh, Abu Bakar; Mat Isa, Mohd Noor

    2016-01-01

    The glycogen branching enzyme (EC 2.4.1.18), which catalyses the formation of α-1,6-glycosidic branch points in glycogen structure, is often used to enhance the nutritional value and quality of food and beverages. For industrial applications, enzymes that are stable and active at high temperatures are highly desirable. Using genome mining, the nucleotide sequence of the branching enzyme gene (glgB) was extracted from the Geobacillus mahadia Geo-05 genome sequence provided by the Malaysia Genome Institute. The gene is 2013 bp long, and the theoretical molecular weight of the protein is 78.43 kDa. The gene sequence was then used to predict the thermostability, function, and three-dimensional structure of the enzyme. The gene was cloned and overexpressed in E. coli to verify the predictions experimentally. The purified enzyme was used to study the effects of temperature and pH on enzyme activity and stability, and the inhibitory effects of metal ions on enzyme activity. This thermostable glycogen branching enzyme was most active at 55 °C, and its half-lives at 60 °C and 70 °C were 24 h and 5 h, respectively. In this work, a thermostable glycogen branching enzyme was successfully isolated from Geobacillus mahadia Geo-05 by genome mining combined with molecular biology techniques.
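    If thermal inactivation is assumed to follow first-order kinetics (an assumption; the abstract reports only half-lives), the reported half-lives translate into rate constants via k = ln 2 / t½ and into a residual-activity fraction exp(-k·t). The helper names below are hypothetical and only illustrate that arithmetic.

```python
import math


def inactivation_constant(half_life_h):
    """First-order inactivation rate constant (per hour) from a half-life in hours."""
    return math.log(2) / half_life_h


def residual_fraction(t_h, half_life_h):
    """Fraction of initial activity left after t_h hours, assuming first-order decay."""
    return math.exp(-inactivation_constant(half_life_h) * t_h)


# Half-lives reported in the abstract: 24 h at 60 °C and 5 h at 70 °C.
for temp_c, t_half in ((60, 24.0), (70, 5.0)):
    k = inactivation_constant(t_half)
    left = residual_fraction(12, t_half)
    print(f"{temp_c} °C: k = {k:.4f} per h, activity left after 12 h = {left:.2f}")
```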

  19. Phylogenetics of Lophotrochozoan bHLH Genes and the Evolution of Lineage-Specific Gene Duplicates.

    PubMed

    Bao, Yongbo; Xu, Fei; Shimeld, Sebastian M

    2017-04-01

    The gain and loss of genes encoding transcription factors is important for understanding the evolution of gene regulatory complexity. The basic helix-loop-helix (bHLH) genes encode a large superfamily of transcription factors. We systematically classify the bHLH genes from five mollusc, two annelid, and one brachiopod genomes, tracing the pattern of bHLH gene evolution across these poorly studied phyla. In total, 56-88 bHLH genes were identified in each genome, most of which could be assigned to previously described bilaterian families or to new families we define. Of these families, only one, Mesp, appears to have been lost in all these species. Additional duplications have also played a role in the evolution of the bHLH gene repertoire, with many new lophotrochozoan-, mollusc-, bivalve-, or gastropod-specific genes defined. Using a combination of transcriptome mining, RT-PCR, and in situ hybridization, we compared the expression of several of these novel genes in tissues and embryos of the molluscs Crassostrea gigas and Patella vulgata, finding both conserved expression and evidence of neofunctionalization. We also map the positions of the genes across these genomes, identifying numerous gene linkages. Some reflect recent paralog divergence by tandem duplication; others are remnants of ancient tandem duplications dating to the lophotrochozoan or bilaterian common ancestors. These data are built into a model of bHLH gene evolution in molluscs, showing formidable evolutionary stasis at the family level but considerable within-family diversification by tandem gene duplication. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  20. Biomedical hypothesis generation by text mining and gene prioritization.

    PubMed

    Petric, Ingrid; Ligeti, Balazs; Gyorffy, Balazs; Pongor, Sandor

    2014-01-01

    Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petric et al, J. Biomed. Inform. 42(2): 219-227, 2009) in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated by finding known as well as potential gene-disease associations in ovarian cancer using MEDLINE abstracts and the STRING database.
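    The rare-term step can be sketched as counting how many target-domain abstracts mention each term and keeping the terms that occur in only a few of them. This is a simplified, hypothetical illustration: the term sets, the `max_df` cut-off, and the `rare_terms` helper are assumptions, and the later steps of the enhanced model (such as gene prioritization with the STRING database) are not shown.

```python
from collections import Counter


def rare_terms(documents, target_term, max_df=1):
    """Return terms co-occurring with target_term in at most max_df documents.

    documents: iterable of sets of normalized terms (e.g. gene/protein names) per abstract.
    Only documents that mention target_term are counted.
    """
    df = Counter()
    for terms in documents:
        if target_term in terms:
            df.update(t for t in terms if t != target_term)
    return sorted(t for t, count in df.items() if count <= max_df)


# Hypothetical, hand-made term sets standing in for annotated abstracts.
abstracts = [
    {"ovarian cancer", "BRCA1", "TP53"},
    {"ovarian cancer", "BRCA1", "CA125"},
    {"ovarian cancer", "BRCA1", "TP53", "NOTCH3"},
    {"breast cancer", "BRCA1", "ESR1"},
]
print(rare_terms(abstracts, "ovarian cancer"))
```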

  1. Biological classification with RNA-Seq data: Can alternatively spliced transcript expression enhance machine learning classifier?

    PubMed

    Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry

    2018-06-25

    The extent to which genes are expressed in a cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach for quantifying gene expression and is expected to provide better insight into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at both the gene and the alternative splicing isoform levels. However, leveraging RNA-Seq data requires the development of new data mining and analytics methods. Supervised machine learning methods are commonly used for biological data analysis and have recently gained attention for their applications to RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that isoform-level expression data are more informative for biological classification tasks than gene-level expression data. Our large-scale assessment uses multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predicting the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage of samples from cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every classification problem, the isoform-based classifiers outperform or are comparable with gene expression-based methods.
The top-performing supervised learning techniques reached near-perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq-based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
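    The two feature representations compared in this study can be illustrated with a small sketch: isoform-level features keep each transcript separate, whereas gene-level features sum the isoforms of each gene. The isoform-to-gene mapping, the values, the helper names, and the log2(x + 1) transform are hypothetical choices for illustration; the abstract does not name the three normalization techniques used.

```python
import math
from collections import defaultdict


def gene_level(isoform_expr, isoform_to_gene):
    """Sum isoform-level abundances (e.g. TPM) into gene-level abundances."""
    genes = defaultdict(float)
    for isoform, value in isoform_expr.items():
        genes[isoform_to_gene[isoform]] += value
    return dict(genes)


def log_features(expr, keys):
    """Fixed-order feature vector with a log2(x + 1) transform."""
    return [math.log2(expr.get(k, 0.0) + 1.0) for k in keys]


iso = {"G1-201": 10.0, "G1-202": 30.0, "G2-201": 5.0}
mapping = {"G1-201": "G1", "G1-202": "G1", "G2-201": "G2"}

genes = gene_level(iso, mapping)
isoform_features = log_features(iso, sorted(iso))   # 3 features
gene_features = log_features(genes, sorted(genes))  # 2 features
print(genes, isoform_features, gene_features)
```

    A sample with the isoform proportions of G1 swapped (30 and 10 instead of 10 and 30) yields the same gene-level vector but a different isoform-level vector, which illustrates how isoform features can carry information that gene totals hide.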

  2. Discovery a novel organic solvent tolerant esterase from Salinispora arenicola CNP193 through genome mining.

    PubMed

    Fang, Yaowei; Wang, Shujun; Liu, Shu; Jiao, Yuliang

    2015-09-01

    An esterase gene encoding a 325-amino-acid protein (SAestA) was mined from the genome sequence of the obligate marine actinomycete strain Salinispora arenicola CNP193. Phylogenetic analysis of the deduced amino acid sequence showed that the enzyme belongs to family IV of lipolytic enzymes. The gene was cloned and expressed in Escherichia coli as a His-tagged protein, which was purified and characterized. The molecular weight of His-tagged SAestA is ∼38 kDa. SAestA-His6 was active over a temperature range of 5-40 °C and a pH range of 7.0-11.0, with maximal activity at pH 9.0 and 30 °C. Activity was severely inhibited by Hg(2+), Cu(2+), and Zn(2+). In particular, the enzyme showed remarkable stability in the presence of organic solvents (25%, v/v) with log P > 2.0, even after incubation for 7 days. These characteristics suggest that SAestA may be a potential candidate for industrial processes in aqueous/organic media. Copyright © 2015. Published by Elsevier B.V.

  3. From 20th century metabolic wall charts to 21st century systems biology: database of mammalian metabolic enzymes

    PubMed Central

    Corcoran, Callan C.; Grady, Cameron R.; Pisitkun, Trairak; Parulekar, Jaya

    2017-01-01

    The organization of the mammalian genome into gene subsets corresponding to specific functional classes has provided key tools for systems biology research. Here, we have created a web-accessible resource called the Mammalian Metabolic Enzyme Database (https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/MetabolicEnzymeDatabase.html) keyed to the biochemical reactions represented on iconic metabolic pathway wall charts created in the previous century. Overall, we have mapped 1,647 genes to these pathways, representing ~7 percent of the protein-coding genome. To illustrate the use of the database, we apply it to the area of kidney physiology. In so doing, we have created an additional database (Database of Metabolic Enzymes in Kidney Tubule Segments: https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/), mapping mRNA abundance measurements (mined from RNA-Seq studies) for all metabolic enzymes to each of 14 renal tubule segments. We carry out bioinformatics analysis of the enzyme expression pattern among renal tubule segments and mine various data sources to identify vasopressin-regulated metabolic enzymes in the renal collecting duct. PMID:27974320

  4. Genomics Portals: integrative web-platform for mining genomics data.

    PubMed

    Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario

    2010-01-13

    A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold tremendous potential for gaining new insights into the functioning of living systems. The Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and a tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc.), and in its integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes the Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals back-end databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  5. Genomics Portals: integrative web-platform for mining genomics data

    PubMed Central

    2010-01-01

    Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold tremendous potential for gaining new insights into the functioning of living systems. Results The Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and a tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc.), and in its integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes the Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals back-end databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909

  6. Characterizing the Grape Transcriptome. Analysis of Expressed Sequence Tags from Multiple Vitis Species and Development of a Compendium of Gene Expression during Berry Development

    PubMed Central

    Silva, Francisco Goes da; Iandolino, Alberto; Al-Kayal, Fadi; Bohlmann, Marlene C.; Cushman, Mary Ann; Lim, Hyunju; Ergul, Ali; Figueroa, Rubi; Kabuloglu, Elif K.; Osborne, Craig; Rowe, Joan; Tattersall, Elizabeth; Leslie, Anna; Xu, Jane; Baek, JongMin; Cramer, Grant R.; Cushman, John C.; Cook, Douglas R.

    2005-01-01

    We report the analysis and annotation of 146,075 expressed sequence tags from Vitis species. The majority of these sequences were derived from different cultivars of Vitis vinifera, comprising an estimated 25,746 unique contig and singleton sequences that survey transcription in various tissues and developmental stages and during biotic and abiotic stress. Putatively homologous proteins were identified for over 17,752 of the transcripts, with 1,962 transcripts further subdivided into one or more Gene Ontology categories. A simple structured vocabulary, with modules for plant genotype, plant development, and stress, was developed to describe the relationship between individual expressed sequence tags and cDNA libraries; the resulting vocabulary provides query terms to facilitate data mining within the context of a relational database. As a measure of the extent to which characterized metabolic pathways were encompassed by the data set, we searched for homologs of the enzymes leading from glycolysis, through the oxidative/nonoxidative pentose phosphate pathway, and into the general phenylpropanoid pathway. Homologs were identified for 65 of these 77 enzymes, with 86% of enzymatic steps represented by paralogous genes. Differentially expressed transcripts were identified by means of a stringent believability index cutoff of ≥98.4%. Correlation analysis and two-dimensional hierarchical clustering grouped these transcripts according to similarity of expression. In the broadest analysis, 665 differentially expressed transcripts were identified across 29 cDNA libraries, representing a range of developmental and stress conditions. The groupings revealed expected associations between plant developmental stages and tissue types, with the notable exception of abiotic stress treatments. 
A more focused analysis of flower and berry development identified 87 differentially expressed transcripts and provides the basis for a compendium that relates gene expression and annotation to previously characterized aspects of berry development and physiology. Comparison with published results for select genes, as well as correlation analysis between independent data sets, suggests that the inferred in silico patterns of expression are likely to be an accurate representation of transcript abundance for the conditions surveyed. Thus, the combined data set reveals the in silico expression patterns for hundreds of genes in V. vinifera, the majority of which have not been previously studied within this species. PMID:16219919

  7. Genome-Wide Distribution, Organisation and Functional Characterization of Disease Resistance and Defence Response Genes across Rice Species

    PubMed Central

    Singh, Sangeeta; Chand, Suresh; Singh, N. K.; Sharma, Tilak Raj

    2015-01-01

    The resistance (R) genes and defense response (DR) genes have become very important resources for the development of disease-resistant cultivars. In the present investigation, genome-wide identification, expression, phylogenetic and synteny analyses were performed for R- and DR-genes across three rice species, viz. Oryza sativa ssp. indica cv. 93-11, Oryza sativa ssp. japonica, and the wild species Oryza brachyantha. Using an in silico approach, we identified and mapped 786 R-genes and 167 DR-genes, 672 R-genes and 142 DR-genes, and 251 R-genes and 86 DR-genes in the japonica, indica and O. brachyantha genomes, respectively. Our analysis showed that 60.5% and 55.6% of the R-genes are tandemly repeated within clusters and distributed over all the rice chromosomes in the indica and japonica genomes, respectively. Phylogenetic analysis together with motif distribution shows a high degree of conservation of R- and DR-genes within clusters. In silico expression analysis showed that more than 85% of the R- and DR-genes are expressed, with corresponding EST matches in the databases. This study gave special emphasis to the mechanisms of gene evolution and duplication of R- and DR-genes across species. Analysis of paralogs across rice species indicated duplication of 17% and 4.38% of R-genes, and of 29% and 11.63% of DR-genes, in indica and O. brachyantha, respectively, compared with 20% and 26% duplication of R-genes and DR-genes in japonica. We found that during duplication only 9.5% of R- and DR-genes changed their function, and the rest of the genes maintained their identity. Syntenic relationships across the three genomes indicated that more orthology is shared between the indica and japonica genomes than with the brachyantha genome. Genome-wide identification of R-genes and DR-genes in the rice genome will help in allele mining and functional validation of these genes, and in understanding the molecular mechanisms of disease resistance and their evolution in rice and related species. PMID:25902056

  8. Saltatory Evolution of the Ectodermal Neural Cortex Gene Family at the Vertebrate Origin

    PubMed Central

    Feiner, Nathalie; Murakami, Yasunori; Breithut, Lisa; Mazan, Sylvie; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    The ectodermal neural cortex (ENC) gene family, whose members are implicated in neurogenesis, is part of the kelch repeat superfamily. To date, ENC genes have been identified only in osteichthyans, although other kelch repeat-containing genes are prevalent throughout bilaterians. The lack of elaborate molecular phylogenetic analysis with exhaustive taxon sampling has obscured the possible link of the establishment of this gene family with vertebrate novelties. In this study, we identified ENC homologs in diverse vertebrates by means of database mining and polymerase chain reaction screens. Our analysis revealed that the ENC3 ortholog was lost in the basal eutherian lineage through single-gene deletion and that the triplication between ENC1, -2, and -3 occurred early in vertebrate evolution. Including our original data on the catshark and the zebrafish, our comparison revealed high conservation of the pleiotropic expression pattern of ENC1 and shuffling of expression domains between ENC1, -2, and -3. Compared with many other gene families including developmental key regulators, the ENC gene family is unique in that conventional molecular phylogenetic inference could identify no obvious invertebrate ortholog. This suggests a composite nature of the vertebrate-specific gene repertoire, consisting not only of de novo genes introduced at the vertebrate origin but also of long-standing genes with no apparent invertebrate orthologs. Some of the latter, including the ENC gene family, may be too rapidly evolving to provide sufficient phylogenetic signals marking orthology to their invertebrate counterparts. Such gene families that experienced saltatory evolution likely remain to be explored and might also have contributed to phenotypic evolution of vertebrates. PMID:23843192

  9. The PhytoClust tool for metabolic gene clusters discovery in plant genomes

    PubMed Central

    Fuchs, Lisa-Maria

    2017-01-01

    The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently attracted increasing interest. Thus far, MGCs have commonly been identified for pathways of specialized metabolism, mostly those associated with terpene-type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust, a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 putative gene cluster types and 5531 putative gene cluster candidates. Clustering analysis of putative MGC types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage that PhytoClust will enhance novel MGC discovery, which will in turn impact the exploration of plant metabolism. PMID:28486689
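
The core idea of scanning an ordered gene list for physically co-localized enzyme genes can be sketched as below. This is a simplified illustration, not PhytoClust's algorithm: it assumes enzyme-family labels have already been assigned (PhytoClust does this with hidden Markov models), and the window size, the minimum number of distinct families, the gene names and the family labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    name: str
    start: int             # chromosomal start coordinate (bp)
    family: Optional[str]  # enzyme family assigned by an HMM search, or None


def candidate_clusters(genes: List[Gene], window: int = 5,
                       min_families: int = 3) -> List[List[str]]:
    """Report runs of neighbouring genes rich in distinct enzyme families.

    Hypothetical rule: any `window` consecutive genes carrying at least
    `min_families` distinct enzyme families qualify; overlapping windows
    are merged and trimmed to their first and last enzyme gene.
    """
    genes = sorted(genes, key=lambda g: g.start)
    windows = []
    for i in range(len(genes) - window + 1):
        block = genes[i:i + window]
        families = {g.family for g in block if g.family is not None}
        if len(families) >= min_families:
            windows.append([i, i + window - 1])

    merged: List[List[int]] = []
    for lo, hi in windows:
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)  # extend the open cluster
        else:
            merged.append([lo, hi])

    clusters = []
    for lo, hi in merged:
        # Trim flanking genes that carry no enzyme annotation.
        enzyme_idx = [k for k in range(lo, hi + 1) if genes[k].family is not None]
        first, last = enzyme_idx[0], enzyme_idx[-1]
        clusters.append([g.name for g in genes[first:last + 1]])
    return clusters


# Hypothetical chromosome segment: gene names and family labels are made up.
labels = [None, None, "TPS", "CYP450", None, "ADH", None, None, None, "CYP450"]
chrom = [Gene(f"g{i}", i * 1000, fam) for i, fam in enumerate(labels)]
print(candidate_clusters(chrom))  # [['g2', 'g3', 'g4', 'g5']]
```

A fixed gene-count window keeps the sketch short; a real tool may instead use base-pair distances or per-cluster-type rules.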

  10. The PhytoClust tool for metabolic gene clusters discovery in plant genomes.

    PubMed

    Töpfer, Nadine; Fuchs, Lisa-Maria; Aharoni, Asaph

    2017-07-07

    The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently attracted increasing interest. Thus far, MGCs have commonly been identified for pathways of specialized metabolism, mostly those associated with terpene-type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust, a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 putative gene cluster types and 5531 putative gene cluster candidates. Clustering analysis of putative MGC types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage that PhytoClust will enhance novel MGC discovery, which will in turn impact the exploration of plant metabolism. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Apple EIN3 BINDING F-box 1 inhibits the activity of three apple EIN3-like transcription factors

    PubMed Central

    Tacken, Emma J.; Ireland, Hilary S.; Wang, Yen-Yi; Putterill, Jo; Schaffer, Robert J.

    2012-01-01

    Background and aims Fruit ripening in Malus × domestica (apple) is controlled by ethylene. Work in model species has shown that following the detection of ethylene, the ETHYLENE INSENSITIVE 3 (EIN3) transcription factor is stabilized, leading to an increase in transcript accumulation of ethylene-responsive genes, such as POLYGALACTURONASE1 (PG1). In the absence of ethylene, the EIN3 BINDING F-box (EBF) proteins rapidly degrade EIN3 via the ubiquitination/SCF (Skp, Cullin, F-Box) proteasome pathway. In this study, we aim to identify and characterize the apple EBF genes, and test their activity against apple EIN3-like proteins (EILs). Methodology The apple genome sequence was mined for EBF-like genes. The expression of EBF-like genes was measured during fruit development. Using a transient assay in Nicotiana benthamiana leaves, the activity of three apple EILs was tested against the PG1 promoter, with and without ethylene and EBF1. Principal results Four EBF-like genes in apple were identified and grouped into two sub-clades. Sub-clade I genes had constant expression over fruit development while sub-clade II genes increased in expression at ripening. EBF1 was shown to reduce the transactivation of the apple PG1 promoter by the EIL1, EIL2 and EIL3 transcription factors in the presence of ethylene. Conclusions The apple EBF1 gene identified here is likely to be a functionally conserved EBF orthologue, modulating EIL activity in apples. The activity of EBF1 suggests that it is not specific to a single EIL, instead acting as a global regulator of apple EIL transcription factors. PMID:23585922

  12. DISEASES: text mining and data integration of disease-gene associations.

    PubMed

    Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl

    2015-03-01

    Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
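
A toy version of dictionary-based tagging combined with co-occurrence scoring is sketched below. It is not the DISEASES scoring scheme: the tagger is an exact token lookup, and the two weights (same sentence versus same abstract) are hypothetical values chosen only to show that co-mentions within a sentence are rewarded more than co-mentions across sentences. The example sentences are made up.

```python
from collections import defaultdict
from itertools import product

SENTENCE_WEIGHT = 1.0  # hypothetical bonus: gene and disease in the same sentence
DOCUMENT_WEIGHT = 0.5  # hypothetical credit: gene and disease in the same abstract


def tag(tokens, dictionary):
    """Toy dictionary-based tagger: exact, case-insensitive token lookup."""
    return {t.lower() for t in tokens if t.lower() in dictionary}


def cooccurrence_scores(abstracts, genes, diseases):
    """Score gene-disease pairs; each abstract is a list of sentence strings."""
    scores = defaultdict(float)
    for sentences in abstracts:
        sentence_pairs = set()
        doc_genes, doc_diseases = set(), set()
        for sentence in sentences:
            tokens = sentence.split()
            g, d = tag(tokens, genes), tag(tokens, diseases)
            doc_genes |= g
            doc_diseases |= d
            sentence_pairs |= set(product(g, d))
        # Every pair co-mentioned in the abstract earns document-level credit;
        # pairs that also share a sentence earn the extra sentence-level weight.
        for pair in product(doc_genes, doc_diseases):
            scores[pair] += DOCUMENT_WEIGHT
            if pair in sentence_pairs:
                scores[pair] += SENTENCE_WEIGHT
    return dict(scores)


abstracts = [
    ["BRCA1 variants were found in melanoma .", "TP53 was also sequenced ."],
    ["TP53 loss is frequent in melanoma ."],
]
scores = cooccurrence_scores(abstracts, {"brca1", "tp53"}, {"melanoma"})
print(scores[("brca1", "melanoma")], scores[("tp53", "melanoma")])  # 1.5 2.0
```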

  13. A computational method for drug repositioning using publicly available gene expression data.

    PubMed

    Shabana, K M; Abdul Nazeer, K A; Pradhan, Meeta; Palakal, Mathew

    2015-01-01

    The identification of new therapeutic uses of existing drugs, or drug repositioning, offers the possibility of faster drug development, reduced risk, lower cost and shorter paths to approval. The advent of high-throughput microarray technology has enabled comprehensive monitoring of the transcriptional responses associated with various disease states and drug treatments. These data can be used to characterize disease and drug effects and thereby give a measure of the association between a given drug and a disease. Several computational methods have been proposed in the literature that use publicly available transcriptional data to reposition drugs against diseases. In this work, we carry out a data mining process using publicly available gene expression data sets associated with a few diseases and drugs, to identify existing drugs that could target genes driving lung cancer and breast cancer. Three strong candidates for repurposing have been identified: Letrozole and GDC-0941 against lung cancer, and Ribavirin against breast cancer. Letrozole and GDC-0941 are drugs currently used in breast cancer treatment, and Ribavirin is used in the treatment of Hepatitis C.
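
One common way to relate drug and disease expression profiles is to favour drugs whose signature opposes the disease signature. The sketch below scores that reversal as the negative Pearson correlation of log2 fold changes over shared genes. It is a generic illustration, not the specific procedure used by Shabana et al.; the gene names, drug names and fold changes are hypothetical, and constant signatures (zero variance) are not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def reversal_score(disease, drug):
    """Positive when the drug pushes shared genes opposite to the disease.

    disease, drug: dicts mapping gene -> log2 fold change.
    """
    shared = sorted(set(disease) & set(drug))
    return -pearson([disease[g] for g in shared], [drug[g] for g in shared])


# Hypothetical signatures (made-up genes and fold changes).
disease = {"geneA": 2.0, "geneB": -1.0, "geneC": 1.5, "geneD": -2.0}
drugs = {
    "drug_X": {"geneA": -1.8, "geneB": 0.9, "geneC": -1.2, "geneD": 2.1},
    "drug_Y": {"geneA": 1.7, "geneB": -0.8, "geneC": 1.1, "geneD": -1.5},
}
ranked = sorted(drugs, key=lambda d: reversal_score(disease, drugs[d]), reverse=True)
print(ranked)  # ['drug_X', 'drug_Y']
```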

  14. [Strategies of elucidation of biosynthetic pathways of natural products].

    PubMed

    Zou, Li-Qiu; Kuang, Xue-Jun; Sun, Chao; Chen, Shi-Lin

    2016-11-01

    Elucidation of the biosynthetic pathways of natural products is not only the major goal of herb genomics but also the solid foundation of the synthetic biology of natural products. This paper reviews recent advances in this field and puts forward strategies to elucidate the biosynthetic pathways of natural products. First, a proposed biosynthetic pathway should be set up based on well-known knowledge of chemical reactions and information on the identified compounds, as well as isotope-tracer studies. Second, candidate genes possibly involved in the biosynthetic pathway are screened by co-expression analysis and/or gene cluster mining. Last, the candidate genes are heterologously expressed in a host, and the enzymes involved in the biosynthetic pathway are characterized by activity assays. In some cases, the function of an enzyme in the original plant can be further studied by RNAi or VIGS. Understanding the biosynthetic pathways of natural products will contribute to the supply of new lead compounds through synthetic biology and provide "functional markers" for herbal molecular breeding, thus boosting the development of traditional Chinese medicine agriculture. Copyright© by the Chinese Pharmaceutical Association.

  15. Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system

    PubMed Central

    Sunkin, Susan M.; Ng, Lydia; Lau, Chris; Dolbeare, Tim; Gilbert, Terri L.; Thompson, Carol L.; Hawrylycz, Michael; Dang, Chinh

    2013-01-01

    The Allen Brain Atlas (http://www.brain-map.org) provides a unique online public resource integrating extensive gene expression data, connectivity data and neuroanatomical information with powerful search and viewing tools for the adult and developing brain in mouse, human and non-human primate. Here, we review the resources available at the Allen Brain Atlas, describing each product and data type [such as in situ hybridization (ISH) and supporting histology, microarray, RNA sequencing, reference atlases, projection mapping and magnetic resonance imaging]. In addition, standardized and unique features in the web applications are described that enable users to search and mine the various data sets. Features include both simple and sophisticated methods for gene searches, colorimetric and fluorescent ISH image viewers, graphical displays of ISH, microarray and RNA sequencing data, Brain Explorer software for 3D navigation of anatomy and gene expression, and an interactive reference atlas viewer. In addition, cross data set searches enable users to query multiple Allen Brain Atlas data sets simultaneously. All of the Allen Brain Atlas resources can be accessed through the Allen Brain Atlas data portal. PMID:23193282

  16. WGSSAT: A High-Throughput Computational Pipeline for Mining and Annotation of SSR Markers From Whole Genomes.

    PubMed

    Pandey, Manmohan; Kumar, Ravindra; Srivastava, Prachi; Agarwal, Suyash; Srivastava, Shreya; Nagpure, Naresh S; Jena, Joy K; Kushwaha, Basdeo

    2018-03-16

    Mining and characterization of Simple Sequence Repeat (SSR) markers from whole genomes provide valuable information about the biological significance of SSR distribution and also facilitate the development of markers for genetic analysis. Whole genome sequencing (WGS)-SSR Annotation Tool (WGSSAT) is a graphical user interface pipeline developed using Java (NetBeans) and Perl scripts that simplifies the process of SSR mining and characterization. WGSSAT takes input in FASTA format and automates the prediction of genes, noncoding RNA (ncRNA), core genes, repeats and SSRs from whole genomes, followed by mapping of the predicted SSRs onto a genome (classified according to genes, ncRNA, repeats, exonic, intronic, and core gene regions) along with primer identification and mining of cross-species markers. The program also generates a detailed statistical report along with visualization of mapped SSRs, genes, core genes, and RNAs. The features of WGSSAT were demonstrated using Takifugu rubripes data. This yielded a total of 139 057 SSRs, of which 113 703 SSR primer pairs were uniquely amplified in silico onto the T. rubripes (fugu) genome. Of the 113 703 mined SSRs, 81 463 were from the coding region (including 4286 exonic and 77 177 intronic), 7 from RNA, and 267 from core genes of fugu, whereas 105 641 SSRs and 601 SSR primer pairs were uniquely mapped onto the medaka genome. WGSSAT has been tested under Ubuntu Linux. The source code, documentation, user manual, example dataset and scripts are available online at https://sourceforge.net/projects/wgssat-nbfgr.
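
Perfect SSRs can be located with a back-referencing regular expression, as sketched below. This is an illustrative stand-in, not WGSSAT's own detection step (the abstract does not say which SSR finder or thresholds the pipeline uses); the minimum copy numbers per motif length and the test sequence are hypothetical.

```python
import re

# Hypothetical thresholds: minimum copy number per motif length (1-6 nt).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ACAC')."""
    return re.fullmatch(r"(.+?)\1+", motif) is not None


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-mer followed by at least (min_copies - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_compound(motif):  # report an 'AC' run once, not also as 'ACAC'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


genome = "TT" + "AG" * 8 + "CCC" + "ATG" * 5 + "G"
print(find_ssrs(genome))  # [(2, 'AG', 8), (21, 'ATG', 5)]
```

Primer design and genome mapping, which WGSSAT also automates, are outside the scope of this sketch.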

  17. Informatics approaches in the Biological Characterization of ...

    EPA Pesticide Factsheets

    Adverse Outcome Pathways (AOPs) are a conceptual framework to characterize toxicity pathways by a series of mechanistic steps from a molecular initiating event to population outcomes. This framework helps to direct risk assessment research, for example by aiding in computational prioritization of chemicals, genes, and tissues relevant to an adverse health outcome. We have designed and implemented a computational workflow to access a wealth of public data relating genes, chemicals, diseases, pathways, and species, to provide a biological context for putative AOPs. We selected three AOP case studies: ER/Aromatase Antagonism Leading to Reproductive Dysfunction, AHR1 Activation Leading to Cardiotoxicity, and AChE Inhibition Leading to Acute Mortality, and deduced a taxonomic range of applicability for each AOP. We developed computational tools to automatically access and analyze the pathway activity of AOP-relevant protein orthologs, finding broad similarity among vertebrate species for the ER/Aromatase and AHR1 AOPs, and similarity extending to invertebrate animal species for AChE inhibition. Additionally, we used public gene expression data to find groups of highly co-expressed genes, and compared those groups across organisms. To interpret these findings at a higher level of biological organization, we created the AOPdb, a relational database that mines results from sources including NCBI, KEGG, Reactome, CTD, and OMIM. This multi-source database connects genes,

  18. Global Profiling of Rice and Poplar Transcriptomes Highlights Key Conserved Circadian-Controlled Pathways and cis-Regulatory Modules

    PubMed Central

    Filichkin, Sergei A.; Breton, Ghislain; Priest, Henry D.; Dharmawardhana, Palitha; Jaiswal, Pankaj; Fox, Samuel E.; Michael, Todd P.; Chory, Joanne; Kay, Steve A.; Mockler, Todd C.

    2011-01-01

    Background Circadian clocks provide an adaptive advantage through anticipation of daily and seasonal environmental changes. In plants, the central clock oscillator is regulated by several interlocking feedback loops. It has been shown that a substantial proportion of the Arabidopsis genome cycles with phases of peak expression covering the entire day. Synchronized transcriptome cycling is driven through an extensive network of diurnal and clock-regulated transcription factors and their target cis-regulatory elements. Study of the cycling transcriptome in other plant species could thus help elucidate the similarities and differences and identify hubs of regulation common to monocot and dicot plants. Methodology/Principal Findings Using a combination of oligonucleotide microarrays and data mining pipelines, we examined daily rhythms in gene expression in one monocotyledonous and one dicotyledonous plant, rice and poplar, respectively. Cycling transcriptomes were interrogated under different diurnal (driven) and circadian (free-running) light and temperature conditions. Collectively, photocycles and thermocycles regulated about 60% of the expressed nuclear genes in rice and poplar. Depending on the condition tested, up to one third of oscillating Arabidopsis-poplar-rice orthologs were phased within three hours of each other, suggesting a high degree of conservation in terms of rhythmic gene expression. We identified clusters of rhythmically co-expressed genes and searched their promoter sequences to identify phase-specific cis-elements, including elements that were conserved in the promoters of Arabidopsis, poplar, and rice. Conclusions/Significance Our results show that the cycling patterns of many circadian clock genes are highly conserved across poplar, rice, and Arabidopsis. The expression of many orthologous genes in key metabolic and regulatory pathways is diurnal and/or circadian regulated and phased to similar times of day. 
Our results confirm previous findings in Arabidopsis of three major classes of cis-regulatory modules within the plant circadian network: the morning (ME, GBOX), evening (EE, GATA), and midnight (PBX/TBX/SBX) modules. Identification of identical overrepresented motifs in the promoters of cycling genes from different species suggests that the core diurnal/circadian cis-regulatory network is deeply conserved between mono- and dicotyledonous species. PMID:21694767
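
Because peak phases wrap around the 24-h day, "within three hours of each other" has to be measured on a circle: a gene peaking at 23 h and one peaking at 1 h are 2 h apart, not 22 h. A minimal check, assuming each ortholog has a single estimated peak phase in hours (the phase values below are hypothetical, and this is not the authors' phase-estimation pipeline):

```python
def phase_difference(a, b, period=24.0):
    """Smallest separation in hours between two peak phases on a circular clock."""
    d = abs(a - b) % period
    return min(d, period - d)


def co_phased(phases, max_diff=3.0):
    """True if every pair of peak phases lies within max_diff hours."""
    return all(
        phase_difference(x, y) <= max_diff
        for i, x in enumerate(phases)
        for y in phases[i + 1:]
    )


# Hypothetical peak phases (hours) for one Arabidopsis-poplar-rice ortholog trio.
print(co_phased([23.0, 1.0, 22.0]))  # True: separations of 2, 1 and 3 h
print(co_phased([2.0, 6.0, 3.0]))    # False: 2 h and 6 h are 4 h apart
```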

  19. Identification of learning and memory genes in canine; promoter investigation and determining the selective pressure.

    PubMed

    Seifi Moroudi, Reihane; Masoudi, Ali Akbar; Vaez Torshizi, Rasoul; Zandi, Mohammad

    2014-12-01

    One of the important behaviors of dogs is trainability, which is affected by learning and memory genes. Such genes have not yet been identified in dogs. In the current research, these genes were found in animal models by mining biological data and the scientific literature. The proteins encoded by these genes were obtained from the UniProt database for dogs and humans. Because not all homologous proteins perform similar functions, these proteins were compared in terms of protein families, domains, biological processes, molecular functions, cellular location and metabolic pathways in the InterPro, KEGG, QuickGO and PSORT databases. The results showed that some of these proteins have the same function in rat or mouse, dog, and human. It is anticipated that the proteins encoded by these genes may be involved in learning and memory in dogs. The expression pattern of the identified genes was then investigated in the dog hippocampus using existing information in GEO Profiles. The results showed that the BDNF, TAC1 and CCK genes are expressed in the dog hippocampus; therefore, these genes could be strong candidates associated with learning and memory in dogs. Subsequently, because of the importance of promoter regions in gene function, this region was investigated in the above genes. Promoter analysis indicated that the HNF-4 site of the BDNF gene and the transcription start site of the CCK gene are susceptible to methylation. Phylogenetic analysis of the protein sequences of these genes showed high similarity for each of the three genes among the studied species. The dN/dS ratio for the BDNF, TAC1 and CCK genes indicates purifying selection during the evolution of these genes.
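
For reference, the dN/dS ratio compares nonsynonymous substitutions per nonsynonymous site (d_N) with synonymous substitutions per synonymous site (d_S). Its conventional interpretation is summarized below; a ratio below 1, as reported for BDNF, TAC1 and CCK, is what supports the purifying-selection conclusion.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega \approx 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```

Because ω is usually averaged over the whole coding sequence, ω < 1 does not by itself exclude positive selection acting on a few individual sites.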

  20. Finding novel relationships with integrated gene-gene association network analysis of Synechocystis sp. PCC 6803 using species-independent text-mining.

    PubMed

    Kreula, Sanna M; Kaewphan, Suwisa; Ginter, Filip; Jones, Patrik R

    2018-01-01

    The increasing move towards open-access full-text scientific literature enhances our ability to use advanced text-mining methods to construct information-rich networks that no human would be able to grasp simply from 'reading the literature'. The utility of text-mining for well-studied species is obvious, though its utility for less-studied species, or those with no prior track record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well-studied species, and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and the first case study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed to automate the search for novel candidate genes with a high likelihood of association with known target genes by (1) ignoring established relationships from the existing literature, as they are already 'known', and (2) demanding multiple independent lines of evidence for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource.
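
The two filter rules can be sketched as a single pass over an integrated network. This is a hypothetical illustration of the rules as stated in the abstract, not the authors' implementation; the edge representation, the evidence-source labels, the gene names and the threshold of two sources are assumptions.

```python
def novel_candidates(edges, known_edges, targets, min_sources=2):
    """Apply the two filter rules to an integrated gene-gene network.

    edges: frozenset({gene_a, gene_b}) -> set of independent evidence sources
    known_edges: set of pairs already established in the literature
    targets: set of genes of interest
    Returns sorted (candidate, target, n_sources) tuples.
    """
    hits = []
    for pair, sources in edges.items():
        if pair in known_edges:         # rule 1: ignore relationships already 'known'
            continue
        if len(sources) < min_sources:  # rule 2: require multiple lines of evidence
            continue
        for target in pair & targets:
            (partner,) = pair - {target}
            hits.append((partner, target, len(sources)))
    return sorted(hits)


# Hypothetical network: gene names and evidence labels are placeholders.
edges = {
    frozenset({"tgt", "g1"}): {"text", "coexpression"},
    frozenset({"tgt", "g2"}): {"text"},
    frozenset({"tgt", "g3"}): {"text", "coexpression", "ppi"},
    frozenset({"g1", "g2"}): {"text", "ppi"},
}
known = {frozenset({"tgt", "g3"})}
print(novel_candidates(edges, known, {"tgt"}))  # [('g1', 'tgt', 2)]
```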

  1. A systems biology approach to detect key pathways and interaction networks in gastric cancer on the basis of microarray analysis.

    PubMed

    Guo, Leilei; Song, Chunhua; Wang, Peng; Dai, Liping; Zhang, Jianying; Wang, Kaijuan

    2015-11-01

    The aim of the present study was to explore key molecular pathways contributing to gastric cancer (GC) and to construct an interaction network between significant pathways and potential biomarkers. Publicly available gene expression profiles of GSE29272 for GC, and data for the corresponding normal tissue, were downloaded from Gene Expression Omnibus. Pre-processing and differential analysis were performed with R statistical software packages, and a number of differentially expressed genes (DEGs) were obtained. A functional enrichment analysis was performed for all the DEGs with a BiNGO plug-in in Cytoscape. Their correlation was analyzed in order to construct a network. The modularity analysis and pathway identification operations were used to identify graph clusters and associated pathways. The underlying molecular mechanisms involving these DEGs were also assessed by data mining. A total of 249 DEGs, which were markedly upregulated and downregulated, were identified. The extracellular region contained the most significantly over-represented functional terms, with respect to upregulated and downregulated genes, and the closest topological matches were identified for taste transduction and regulation of autophagy. In addition, extracellular matrix-receptor interactions were identified as the most relevant pathway associated with the progression of GC. The genes for fibronectin 1, secreted phosphoprotein 1, collagen type 4 variant α-1/2 and thrombospondin 1, which are involved in the pathways, may be considered as potential therapeutic targets for GC. A series of associations between candidate genes and key pathways were also identified for GC, and their correlation may provide novel insights into the pathogenesis of GC.
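
Selecting DEGs typically combines a fold-change cut-off with a multiple-testing-adjusted p-value. The abstract does not state the thresholds or which R packages were used, so the sketch below is only an assumed illustration: Benjamini-Hochberg adjustment followed by |log2FC| ≥ 1 and adjusted p < 0.05, applied to made-up gene labels and values.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, log2fc, pvals, lfc_cut=1.0, fdr=0.05):
    """Split genes into up- and downregulated lists (hypothetical thresholds)."""
    padj = benjamini_hochberg(pvals)
    up = [g for g, l, q in zip(genes, log2fc, padj) if q < fdr and l >= lfc_cut]
    down = [g for g, l, q in zip(genes, log2fc, padj) if q < fdr and l <= -lfc_cut]
    return up, down


genes = ["geneA", "geneB", "geneC", "geneD"]
print(call_degs(genes, [2.5, -1.2, 0.3, 3.0], [0.001, 0.01, 0.04, 0.5]))
# (['geneA'], ['geneB'])
```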

  2. hSAGEing: an improved SAGE-based software for identification of human tissue-specific or common tumor markers and suppressors.

    PubMed

    Yang, Cheng-Hong; Chuang, Li-Yeh; Shih, Tsung-Mu; Chang, Hsueh-Wei

    2010-12-17

    SAGE (serial analysis of gene expression) is a powerful method for analyzing gene expression across the entire transcriptome. There are currently many well-developed SAGE tools. However, cross-comparison of different tissues is seldom addressed, which limits the identification of common and tissue-specific tumor markers. To improve SAGE mining methods, we propose a novel function for cross-tissue comparison of SAGE data that combines mathematical set theory and logic with a unique "multi-pool method" that analyzes multiple pools of pair-wise case controls individually. When all the settings are in "inclusion", the common SAGE tag sequences are mined. When one tissue type is in "inclusion" and the other tissue types are not, the selected tissue-specific SAGE tag sequences are generated. They are displayed as tags-per-million (TPM) and fold values, and visually in four kinds of scales in a color-gradient pattern. In the fold visualization display, the top-scoring SAGE tag sequences are provided, along with cluster plots. A user-defined matrix file is designed for cross-tissue comparison by selecting libraries from publicly available databases or user-defined libraries. For the first time, the hSAGEing tool combines user-friendly cross-tissue analysis with an interface for comparing SAGE libraries. Some up- or downregulated genes that are tissue-specific or common tumor markers and suppressors are identified computationally. The tool is useful and convenient for in silico cancer transcriptomic studies and is freely available at http://bio.kuas.edu.tw/hSAGEing.
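
The inclusion/exclusion logic described in this abstract reduces to set operations. The sketch below shows only that logic; it omits hSAGEing's multi-pool statistics and its TPM and fold-change displays, and the tissue names and tag sequences are hypothetical.

```python
def select_tags(tag_sets, included):
    """Inclusion/exclusion selection of SAGE tags across tissues.

    tag_sets: tissue -> set of tags flagged in that tissue's case-control pool
    included: tissues set to 'inclusion'; all other tissues are excluded.
    Returns tags flagged in every included tissue and in no excluded tissue.
    """
    if not included:
        raise ValueError("at least one tissue must be included")
    common = set.intersection(*(tag_sets[t] for t in included))
    excluded = [tag_sets[t] for t in tag_sets if t not in included]
    return common - set().union(*excluded)


# Hypothetical tag sets (made-up 10-nt tags).
tags = {
    "breast": {"GTGAAACCCC", "TTGGGGTTTC", "GATCCCAACT"},
    "colon": {"GTGAAACCCC", "AGGTCAGGAG"},
    "lung": {"GTGAAACCCC", "TTGGGGTTTC"},
}
print(select_tags(tags, {"breast", "colon", "lung"}))  # {'GTGAAACCCC'}: common
print(select_tags(tags, {"breast"}))                   # {'GATCCCAACT'}: breast-specific
```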

  3. dictyExpress: a web-based platform for sequence data management and analytics in Dictyostelium and beyond.

    PubMed

    Stajdohar, Miha; Rosengarten, Rafael D; Kokosar, Janez; Jeran, Luka; Blenkus, Domen; Shaulsky, Gad; Zupan, Blaz

    2017-06-02

    Dictyostelium discoideum, a soil-dwelling social amoeba, is a model for the study of numerous biological processes. Research in the field has benefited mightily from the adoption of next-generation sequencing for genomics and transcriptomics. Dictyostelium biologists now face the widespread challenges of analyzing and exploring high-dimensional data sets to generate hypotheses and discover novel insights. We present dictyExpress (2.0), a web application designed for exploratory analysis of gene expression data, as well as data from related experiments such as Chromatin Immunoprecipitation sequencing (ChIP-Seq). The application features visualization modules that include time course expression profiles, clustering, gene ontology enrichment analysis, differential expression analysis and comparison of experiments. All visualizations are interactive and interconnected, such that the selection of genes in one module propagates instantly to visualizations in other modules. dictyExpress currently stores the data from over 800 Dictyostelium experiments and is embedded within a general-purpose software framework for management of next-generation sequencing data. dictyExpress allows users to explore their data in a broader context by reciprocal linking with dictyBase, a repository of Dictyostelium genomic data. In addition, we introduce a companion application called GenBoard, an intuitive graphical user interface for data management and bioinformatics analysis. dictyExpress and GenBoard enable broad adoption of next-generation sequencing-based inquiries by the Dictyostelium research community. Labs without the means to undertake deep sequencing projects can mine the data available to the public. The entire information flow, from raw sequence data to hypothesis testing, can be accomplished in an efficient workspace. The software framework is generalizable and represents a useful approach for any research community. 
To encourage wider use, the backend is open source and available for extension and further development by bioinformaticians and data scientists.
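Linked views of the kind described for dictyExpress (a gene selection in one module propagating to every other module) are commonly built on a publish/subscribe pattern. The Python sketch below assumes that design purely for illustration; `SelectionBus`, `Module` and the gene IDs are hypothetical and are not part of dictyExpress.

```python
from typing import Callable, List, Set


class SelectionBus:
    """Hypothetical shared store for the current gene selection."""

    def __init__(self) -> None:
        self._selected: Set[str] = set()
        self._listeners: List[Callable[[Set[str]], None]] = []

    def subscribe(self, listener: Callable[[Set[str]], None]) -> None:
        self._listeners.append(listener)

    def select(self, genes: Set[str]) -> None:
        # Replace the selection, then push a copy to every subscribed module
        self._selected = set(genes)
        for listener in self._listeners:
            listener(set(self._selected))


class Module:
    """Stand-in for one visualization module (time course, clustering, GO, ...)."""

    def __init__(self, name: str, bus: SelectionBus) -> None:
        self.name = name
        self.bus = bus
        self.highlighted: Set[str] = set()
        bus.subscribe(self.on_selection)

    def on_selection(self, genes: Set[str]) -> None:
        # Called by the bus whenever any module changes the selection
        self.highlighted = genes

    def user_selects(self, genes: Set[str]) -> None:
        # Selections are routed through the bus so that every module updates
        self.bus.select(genes)


bus = SelectionBus()
profiles = Module("time course profiles", bus)
enrichment = Module("GO enrichment", bus)
profiles.user_selects({"geneA", "geneB"})  # placeholder gene IDs
print(sorted(enrichment.highlighted))  # ['geneA', 'geneB']
```

Routing every selection through one shared object keeps modules unaware of each other, so adding a new view only requires subscribing it.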

  4. Transcriptome Analysis of Invasive Plants in Response to Mineral Toxicity of Reclaimed Coal-Mine Soil in the Appalachian Region.

    PubMed

    Saminathan, Thangasamy; Malkaram, Sridhar A; Patel, Dharmesh; Taylor, Kaitlyn; Hass, Amir; Nimmakayala, Padma; Huber, David H; Reddy, Umesh K

    2015-09-01

    Efficient postmining reclamation requires successful revegetation. Using RNA sequencing, we evaluated the growth response of two invasive plants, goutweed (Aegopodium podagraria L.) and mugwort (Artemisia vulgaris), grown in two Appalachian acid-mine soils (MS-I and -II, pH ∼ 4.6). Although deficient in macronutrients, both soils contained high levels of plant-available Al, Fe and Mn. Both plant types showed toxicity tolerance, but metal accumulation differed by plant and site. With MS-I, Al accumulation was greater for mugwort than for goutweed (2151 ± 251 vs 385 ± 47 μg g⁻¹). Al concentration was similar between mine sites, but its accumulation in mugwort was greater with MS-I than with MS-II, with no difference in accumulation by site for goutweed. An in situ approach revealed deregulation of multiple factors, such as transporters, transcription factors, and metal chelators, involved in metal uptake or exclusion. The two plant systems showed common gene expression patterns for different pathways. Both plant systems appeared to have few common heavy-metal pathway regulators addressing mineral toxicity/deficiency at both mine sites, which implies that invasive plants can adapt for efficient growth at mine sites with toxic waste. Functional genomics can be used to screen for plant adaptability, especially for reclamation and phytoremediation of contaminated soils and waters.

  5. oneChannelGUI: a graphical interface to Bioconductor tools, designed for life scientists who are not familiar with R language.

    PubMed

    Sanges, Remo; Cordero, Francesca; Calogero, Raffaele A

    2007-12-15

    OneChannelGUI is an add-on Bioconductor package providing a new set of functions extending the capability of the affylmGUI package. This library provides a graphical user interface (GUI) for Bioconductor libraries to be used for quality control, normalization, filtering, statistical validation and data mining for single-channel microarrays. Affymetrix 3' expression (IVT) arrays as well as the new whole transcript expression arrays, i.e. gene/exon 1.0 ST, are currently supported. oneChannelGUI is available for most platforms on which R runs, i.e. Windows and Unix-like machines. http://www.bioconductor.org/packages/2.0/bioc/html/oneChannelGUI.html

  6. Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data

    PubMed Central

    Király, András; Abonyi, János

    2014-01-01

    During the last decade various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). The common limitation of both methodologies is their limited applicability to very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered in a fast manner. The proposed algorithm has been implemented in the commonly used MATLAB environment and is freely available for researchers. PMID:24616651
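The link between frequent closed itemsets and biclusters can be illustrated with a small brute-force sketch. It is not the authors' MATLAB implementation: support is computed as the elementwise product of binary item columns (an AND over bit vectors), and an itemset is kept only if it equals its closure, i.e. the set of items present in every supporting row. The function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def frequent_closed_itemsets(X, min_support):
    """Return {closed itemset: (support, supporting rows)} for a binary rows x items matrix."""
    X = np.asarray(X, dtype=np.int8)
    n_items = X.shape[1]
    found = {}
    for k in range(1, n_items + 1):
        any_frequent = False
        for items in combinations(range(n_items), k):
            # Elementwise product of the item columns = logical AND of their bit vectors
            rows_mask = X[:, list(items)].prod(axis=1).astype(bool)
            support = int(rows_mask.sum())
            if support < min_support:
                continue
            any_frequent = True
            # Closure: every item that is present in all supporting rows
            closure = frozenset(np.flatnonzero(X[rows_mask].all(axis=0)).tolist())
            if closure == frozenset(items):
                found[closure] = (support, np.flatnonzero(rows_mask).tolist())
        if not any_frequent:
            # Apriori property: no frequent k-itemset means no frequent (k+1)-itemset
            break
    return found
```

Each returned itemset, paired with its supporting rows, is an inclusion-maximal all-ones submatrix, which is exactly a bicluster in binary data. For the rows `[1, 1, 0]`, `[1, 1, 1]`, `[0, 1, 1]` and `min_support=2`, the closed itemsets are `{1}`, `{0, 1}` and `{1, 2}`. Enumerating all combinations is exponential; the published method is designed to avoid that cost.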

  7. Mining, identification and function analysis of microRNAs and target genes in peanut (Arachis hypogaea L.).

    PubMed

    Zhang, Tingting; Hu, Shuhao; Yan, Caixia; Li, Chunjuan; Zhao, Xiaobo; Wan, Shubo; Shan, Shihua

    2017-02-01

    In the present investigation, a total of 60 conserved peanut (Arachis hypogaea L.) microRNA (miRNA) sequences, belonging to 16 families, were identified using bioinformatics methods. A total of 392 target gene sequences were identified for 58 miRNAs with Target-align software and BLASTx analyses. Gene Ontology (GO) functional analysis suggested that these target genes were involved in mediating peanut growth and development, signal transduction and stress resistance. Fifty-five miRNA sequences were verified using a poly(A) tailing test, a success rate of 91.67%. Twenty peanut target gene sequences were randomly selected, and the 5' rapid amplification of cDNA ends (5'-RACE) method was used to validate the cleavage sites of these target genes. Of these, 14 (70%) peanut miRNA targets were verified by means of gel electrophoresis, cloning and sequencing. Furthermore, functional analysis and homologous sequence retrieval were conducted for target gene sequences, and 26 target genes were chosen for experimental study of stress resistance. Real-time fluorescence quantitative PCR (qRT-PCR) was applied to measure the expression levels of resistance-associated miRNAs and their target genes in peanut exposed to Aspergillus flavus (A. flavus) infection and drought stress, respectively. As a result, five groups of miRNAs and targets were found to be consistent with the mode of miRNA negatively controlling the expression of target genes. This study preliminarily determined the biological functions of some resistance-associated miRNAs and their target genes in peanut. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  8. Candidate genes and pathogenesis investigation for sepsis-related acute respiratory distress syndrome based on gene expression profile.

    PubMed

    Wang, Min; Yan, Jingjun; He, Xingxing; Zhong, Qiang; Zhan, Chengye; Li, Shusheng

    2016-04-18

    Acute respiratory distress syndrome (ARDS) is a potentially devastating form of acute inflammatory lung injury as well as a major cause of acute respiratory failure. Although researchers have made significant progress in elucidating the pathophysiology of this complex syndrome over the years, the lack of a universally accepted, detailed disease mechanism has so far posed practical obstacles to a definitive treatment. This study aimed to predict genes or pathways associated with sepsis-related ARDS based on a public microarray dataset and to further explore the molecular mechanism of ARDS. A total of 122 up-regulated and 91 down-regulated differentially expressed genes (DEGs) were obtained. The up- and down-regulated DEGs were mainly involved in functions such as the mitotic cell cycle and pathways such as the cell cycle. Protein-protein interaction network analysis of ARDS revealed 20 hub genes, including cyclin B1 (CCNB1), cyclin B2 (CCNB2) and topoisomerase II alpha (TOP2A). A total of seven transcription factors, including forkhead box protein M1 (FOXM1), and 30 target genes were revealed in the transcription factor-target gene regulation network. Furthermore, literature mining of the relations among ARDS-related genes revealed co-cited genes, including CCNB2-CCNB1. Pathways such as the mitotic cell cycle were closely related to the development of ARDS. Genes including CCNB1, CCNB2 and TOP2A, as well as transcription factors such as FOXM1, might be used as novel gene therapy targets for sepsis-related ARDS.

  9. Mining Gene Expression Signature for the Detection of Pre-Malignant Melanocytes and Early Melanomas with Risk for Metastasis

    PubMed Central

    de Souza, Camila Ferreira; Xander, Patrícia; Monteiro, Ana Carolina; Silva, Amanda Gonçalves dos Santos; da Silva, Débora Castanheira Pereira; Mai, Sabine; Bernardo, Viviane; Lopes, José Daniel; Jasiulionis, Miriam Galvonas

    2012-01-01

    Background Metastatic melanoma is a highly aggressive skin cancer and is currently resistant to systemic therapy. Melanomas may involve genetic, epigenetic and metabolic abnormalities. Evidence is emerging that epigenetic changes might play a significant role in tumor cell plasticity and the metastatic phenotype of melanoma cells. Principal findings In this study, we developed a systematic approach to identify genes implicated in melanoma progression. To do this, we used Affymetrix GeneChip Arrays to screen 34,000 mouse transcripts in melan-a melanocytes, 4C pre-malignant melanocytes, 4C11− non-metastatic and 4C11+ metastatic melanoma cell lines. The genome-wide association studies revealed pathways commonly over-represented in the transition from the immortalized to the pre-malignant stage, and under-represented in the transition from the non-metastatic to the metastatic stage. Additionally, treatment of cells with 10 µM 5-aza-2′-deoxycytidine (5AzaCdR) for 48 hours allowed us to identify genes differentially re-expressed at specific stages of melan-a malignant transformation. Treatment of human primary melanocytes with the demethylating agent 5AzaCdR in combination with the histone deacetylase inhibitor Trichostatin A (TSA) revealed changes in melanocyte morphology and gene expression, which could be an indicator of epigenetic flexibility in normal melanocytes. Moreover, when Mel-2 and/or Mel-3-derived patient metastases were exposed to 5AzaCdR plus TSA treatment, expression changes were found in genes known to affect melanocyte biology (NDRG2 and VDR), the phenotype of metastatic melanoma cells (HSPB1 and SERPINE1) and the response to cancer therapy (CTCF, NSD1 and SRC). Hierarchical clustering and network analyses in a panel of five patient-derived metastatic melanoma cells showed gene interactions that have never been described in melanomas. 
Significance Despite the heterogeneity observed in melanomas, this study demonstrates the utility of our murine melanoma progression model to identify molecular markers commonly perturbed in metastasis. Additionally, the novel gene expression signature identified here may in future be useful in models more closely related to translational research. PMID:22984562

  10. PlantAPA: A Portal for Visualization and Analysis of Alternative Polyadenylation in Plants

    PubMed Central

    Wu, Xiaohui; Zhang, Yumin; Li, Qingshun Q.

    2016-01-01

    Alternative polyadenylation (APA) is an important layer of gene regulation that produces mRNAs that have different 3′ ends and/or encode diverse protein isoforms. Up to 70% of annotated genes in plants undergo APA. Increasing numbers of poly(A) sites collected in various plant species demand new methods and tools to access and mine these data. We have created an open-access web service called PlantAPA (http://bmi.xmu.edu.cn/plantapa) to visualize and analyze genome-wide poly(A) sites in plants. PlantAPA provides various interactive and dynamic graphics and seamlessly integrates a genome browser that can profile heterogeneous cleavage sites and quantify expression patterns of poly(A) sites across different conditions. In particular, through PlantAPA, users can analyze poly(A) sites in extended 3′ UTR regions, intergenic regions, and ambiguous regions owing to alternative transcription or RNA processing. It also provides tools for analyzing poly(A) site selection, 3′ UTR lengthening or shortening, non-canonical APA site switching, and differential gene expression between conditions, making it more powerful for the study of APA-mediated gene expression regulation. More importantly, PlantAPA offers a bioinformatics pipeline that allows users to upload their own short reads or ESTs for poly(A) site extraction, enabling users to further explore poly(A) site selection using stored PlantAPA poly(A) sites together with their own poly(A) site datasets. To date, PlantAPA hosts the largest database of APA sites in plants, including Oryza sativa, Arabidopsis thaliana, Medicago truncatula, and Chlamydomonas reinhardtii. As a user-friendly web service, PlantAPA will be a valuable addition to the community of biologists studying APA mechanisms and gene expression regulation in plants. PMID:27446120
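One simple way to quantify the 3′ UTR lengthening or shortening mentioned above is to compare, between two conditions, the fraction of poly(A) reads falling at a gene's most distal site. The Python sketch below assumes that metric and a plus-strand gene (largest coordinate = most distal site); it is an illustration only, not PlantAPA's own method, and the function names are hypothetical.

```python
def distal_usage(site_counts):
    """Fraction of poly(A) reads at the most distal site of one gene.

    site_counts: {genomic coordinate: read count}. ASSUMPTION: plus-strand gene,
    so the largest coordinate is the most distal site.
    """
    total = sum(site_counts.values())
    if total == 0:
        raise ValueError("gene has no poly(A) reads")
    return site_counts[max(site_counts)] / total


def utr_shift(cond_a, cond_b):
    """Change in distal usage from A to B: > 0 suggests lengthening, < 0 shortening."""
    return distal_usage(cond_b) - distal_usage(cond_a)
```

For example, a gene with 80 proximal and 20 distal reads in condition A but 20 proximal and 80 distal reads in condition B gives a shift of about +0.6, i.e. apparent 3′ UTR lengthening. A real analysis would also need a significance test across replicates.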

  11. NanoString, a novel digital color-coded barcode technology: current and future applications in molecular diagnostics.

    PubMed

    Tsang, Hin-Fung; Xue, Vivian Weiwen; Koh, Su-Pin; Chiu, Ya-Ming; Ng, Lawrence Po-Wah; Wong, Sze-Chuen Cesar

    2017-01-01

    Formalin-fixed, paraffin-embedded (FFPE) tissue samples are a gold mine of resources for molecular diagnosis and retrospective clinical studies. Although molecular technologies have expanded the range of mutations identified in FFPE samples, the applications of existing technologies are limited by low nucleic acid yield and poor extraction quality. As a result, routine clinical applications of molecular diagnosis using FFPE samples have been associated with many practical challenges. NanoString technologies use a novel digital color-coded barcode technology based on direct multiplexed measurement of gene expression and offer high levels of precision and sensitivity. Each color-coded barcode is attached to a single target-specific probe corresponding to a single gene, which can be individually counted without amplification. Therefore, NanoString is especially useful for measuring gene expression in degraded clinical specimens. Areas covered: This article describes the applications of NanoString technologies in molecular diagnostics, the challenges associated with these applications, and future development. Expert commentary: Although NanoString technology is still in the early stages of clinical use, NanoString-based cancer expression panels are expected to play more important roles in classifying cancer patients and in predicting response to therapy for better personalized therapeutic care.

  12. A Biologically-Based Computational Approach to Drug Repurposing for Anthrax Infection.

    PubMed

    Bai, Jane P F; Sakellaropoulos, Theodore; Alexopoulos, Leonidas G

    2017-03-10

    Developing drugs to treat the toxic effects of lethal toxin (LT) and edema toxin (ET) produced by B. anthracis is of global interest. We utilized a computational approach to score 474 drugs/compounds for their ability to reverse the toxic effects of anthrax toxins. For each toxin or drug/compound, we constructed an activity network by using its differentially expressed genes, molecular targets, and protein interactions. Gene expression profiles of drugs were obtained from the Connectivity Map and those of anthrax toxins in human alveolar macrophages were obtained from the Gene Expression Omnibus. Drug rankings were based on the ability of a drug/compound's mode of action in the form of a signaling network to reverse the effects of anthrax toxins; literature reports were used to verify the top 10 and bottom 10 drugs/compounds identified. Simvastatin and bepridil, which have reported in vitro potency for protecting cells from LT and ET toxicities, were computationally ranked fourth and eighth. The other top 10 drugs were fenofibrate, dihydroergotamine, cotinine, amantadine, mephenytoin, sotalol, ifosfamide, and mefloquine; literature mining revealed their potential protective effects from LT and ET toxicities. These drugs are worthy of investigation for their therapeutic benefits and might be used in combination with antibiotics for treating B. anthracis infection.

  13. A Biologically-Based Computational Approach to Drug Repurposing for Anthrax Infection

    PubMed Central

    Bai, Jane P. F.; Sakellaropoulos, Theodore; Alexopoulos, Leonidas G.

    2017-01-01

    Developing drugs to treat the toxic effects of lethal toxin (LT) and edema toxin (ET) produced by B. anthracis is of global interest. We utilized a computational approach to score 474 drugs/compounds for their ability to reverse the toxic effects of anthrax toxins. For each toxin or drug/compound, we constructed an activity network by using its differentially expressed genes, molecular targets, and protein interactions. Gene expression profiles of drugs were obtained from the Connectivity Map and those of anthrax toxins in human alveolar macrophages were obtained from the Gene Expression Omnibus. Drug rankings were based on the ability of a drug/compound’s mode of action in the form of a signaling network to reverse the effects of anthrax toxins; literature reports were used to verify the top 10 and bottom 10 drugs/compounds identified. Simvastatin and bepridil, which have reported in vitro potency for protecting cells from LT and ET toxicities, were computationally ranked fourth and eighth. The other top 10 drugs were fenofibrate, dihydroergotamine, cotinine, amantadine, mephenytoin, sotalol, ifosfamide, and mefloquine; literature mining revealed their potential protective effects from LT and ET toxicities. These drugs are worthy of investigation for their therapeutic benefits and might be used in combination with antibiotics for treating B. anthracis infection. PMID:28287432
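The study ranks drugs by how well their signaling networks reverse the toxin effects. A much simpler, swapped-in technique that conveys the same idea is to score how strongly a drug's differential-expression signature anti-correlates with the toxin's signature over shared genes. The sketch below uses Pearson correlation for that purpose; it is not the authors' network method, and the input layout (dictionaries mapping gene names to log fold changes) and function names are assumptions.

```python
import numpy as np


def reversal_score(toxin_lfc, drug_lfc):
    """Negative Pearson r over shared genes: +1 means the drug fully opposes the toxin."""
    shared = sorted(set(toxin_lfc) & set(drug_lfc))
    if len(shared) < 3:
        raise ValueError("need at least three shared genes")
    t = np.array([toxin_lfc[g] for g in shared])
    d = np.array([drug_lfc[g] for g in shared])
    return float(-np.corrcoef(t, d)[0, 1])


def rank_drugs(toxin_lfc, drug_signatures):
    """Drug names ordered from strongest to weakest predicted reversal."""
    scores = {name: reversal_score(toxin_lfc, sig) for name, sig in drug_signatures.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A drug whose signature is the exact mirror image of the toxin's scores +1, and one that mimics the toxin scores -1. Network-based scoring, as in the study, can additionally credit drugs that act through targets and interactions that a plain expression correlation would miss.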

  14. Analysis of cellulose synthase genes from domesticated apple identifies collinear genes WDR53 and CesA8A: partial co-expression, bicistronic mRNA, and alternative splicing of CESA8A

    PubMed Central

    Guerriero, Gea; Spadiut, Oliver; Kerschbamer, Christine; Giorno, Filomena; Baric, Sanja; Ezcurra, Inés

    2016-01-01

    Cellulose synthase (CesA) genes constitute a complex multigene family with six major phylogenetic clades in angiosperms. The recently sequenced genome of domestic apple, Malus×domestica, was mined for CesA genes by blasting full-length cellulose synthase protein (CESA) sequences annotated in the apple genome against protein databases from the plant models Arabidopsis thaliana and Populus trichocarpa. Thirteen genes belonging to the six angiosperm CesA clades and coding for proteins with conserved residues typical of processive glycosyltransferases from family 2 were detected. Based on their phylogenetic relationship to Arabidopsis CESAs, as well as expression patterns, a nomenclature is proposed to facilitate further studies. Examination of their genomic organization revealed that MdCesA8-A is closely linked and co-oriented with WDR53, a gene coding for a WD40 repeat protein. The WDR53 and CesA8 genes display conserved collinearity in dicots and are partially co-expressed in the apple xylem. Interestingly, a bicistronic WDR53–CesA8A transcript was detected in phytoplasma-infected phloem tissues of apple. The bicistronic transcript contains a spliced intergenic sequence that is predicted to fold into hairpin structures typical of internal ribosome entry sites, suggesting its potential cap-independent translation. Surprisingly, the CesA8A cistron is alternatively spliced and lacks the zinc-binding domain. The possible roles of WDR53 and the alternatively spliced CESA8 variant during cellulose biosynthesis in M.×domestica are discussed. PMID:23048131

  15. Investigating the Control of Chlorophyll Degradation by Genomic Correlation Mining.

    PubMed

    Ghandchi, Frederick P; Caetano-Anolles, Gustavo; Clough, Steven J; Ort, Donald R

    2016-01-01

    Chlorophyll degradation is an intricate process that is critical in a variety of plant tissues at different times during the plant life cycle. Many of the photoactive chlorophyll degradation intermediates are exceptionally cytotoxic, necessitating that the pathway be carefully coordinated and regulated. The primary regulatory step in the chlorophyll degradation pathway involves the enzyme pheophorbide a oxygenase (PAO), which oxidizes the chlorophyll intermediate pheophorbide a; the product is eventually converted to non-fluorescent chlorophyll catabolites. There is evidence that PAO is differentially regulated across different environmental and developmental conditions with both transcriptional and post-transcriptional components, but the regulatory elements involved are uncertain or unknown. We hypothesized that transcription factors modulate PAO expression across different environmental conditions, such as cold and drought, as well as during developmental transitions to leaf senescence and maturation of green seeds. To test these hypotheses, several sets of Arabidopsis genomic and bioinformatic experiments were investigated and re-analyzed using computational approaches. PAO expression was compared across varied environmental conditions in the three separate datasets using regression modeling and correlation mining to identify gene elements co-expressed with PAO. Their functions were investigated as candidate upstream transcription factors or other regulatory elements that may regulate PAO expression. PAO transcript expression was found to be significantly up-regulated in warm conditions, during leaf senescence, and in drought conditions, and in all three conditions was significantly positively correlated with expression of the transcription factor Arabidopsis thaliana activating factor 1 (ATAF1), suggesting that ATAF1 is triggered in the plant response to these processes or abiotic stresses and in turn up-regulates PAO expression. 
The proposed regulatory network includes ATAF1, a factor modulated by freezing, senescence, and drought stresses, and various other transcription factors and pathways, which in turn act to regulate chlorophyll degradation by up-regulating PAO expression.
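Correlation mining of the kind described (finding genes co-expressed with PAO) can be sketched as ranking every gene by its Pearson correlation with the target across samples. The Python sketch below assumes a genes × samples expression matrix; the function name and the gene names other than PAO are hypothetical placeholders, and the regression-modeling step of the study is not shown.

```python
import numpy as np


def correlated_with(target, genes, expr, top_k=5):
    """Rank other genes by Pearson correlation with `target` across samples.

    genes: list of gene names; expr: genes x samples matrix in the same order.
    """
    expr = np.asarray(expr, dtype=float)
    idx = genes.index(target)
    # Centre each row and scale it to unit length: dot products are then Pearson r
    z = expr - expr.mean(axis=1, keepdims=True)
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # constant rows would give NaN
    r = z @ z[idx]
    order = [i for i in np.argsort(-r, kind="stable") if i != idx]
    return [(genes[i], float(r[i])) for i in order[:top_k]]
```

With toy rows PAO = [1, 2, 3, 4], geneA = [2, 4, 6, 8], geneB = [4, 3, 2, 1] and geneC = [1, 3, 2, 4], the top two hits are geneA (r = 1.0) and geneC (r = 0.8). Correlation alone cannot show direction of regulation, which is why candidates such as ATAF1 still need functional follow-up.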

  16. Identification of tissue-specific, abiotic stress-responsive gene expression patterns in wine grape (Vitis vinifera L.) based on curation and mining of large-scale EST data sets

    PubMed Central

    2011-01-01

    Background Abiotic stresses, such as water deficit and soil salinity, result in changes in physiology, nutrient use, and vegetative growth in vines, and ultimately, yield and flavor in berries of wine grape, Vitis vinifera L. Large-scale expressed sequence tags (ESTs) were generated, curated, and analyzed to identify major genetic determinants responsible for stress-adaptive responses. Although roots serve as the first site of perception and/or injury for many types of abiotic stress, EST sequencing in root tissues of wine grape exposed to abiotic stresses has been extremely limited to date. To overcome this limitation, large-scale EST sequencing was conducted from root tissues exposed to multiple abiotic stresses. Results A total of 62,236 expressed sequence tags (ESTs) were generated from leaf, berry, and root tissues from vines subjected to abiotic stresses and compared with 32,286 ESTs sequenced from 20 public cDNA libraries. Curation to correct annotation errors, clustering and assembly of the berry and leaf ESTs with currently available V. vinifera full-length transcripts and ESTs yielded a total of 13,278 unique sequences, with 2302 singletons and 10,976 mapped to V. vinifera gene models. Of these, 739 transcripts were found to have significant differential expression in stressed leaves and berries, including 250 genes not described previously as being abiotic stress responsive. In a second analysis of 16,452 ESTs from a normalized root cDNA library derived from roots exposed to multiple, short-term, abiotic stresses, 135 genes with root-enriched expression patterns were identified on the basis of their EST abundance in roots relative to other tissues. 
Conclusions The large-scale analysis of relative EST frequency counts among a diverse collection of 23 different cDNA libraries from leaf, berry, and root tissues of wine grape exposed to a variety of abiotic stress conditions revealed distinct, tissue-specific expression patterns, previously unrecognized stress-induced genes, and many novel genes with root-enriched mRNA expression for improving our understanding of root biology and manipulation of rootstock traits in wine grape. mRNA abundance estimates based on EST library-enriched expression patterns showed only modest correlations between microarray and quantitative, real-time reverse transcription-polymerase chain reaction (qRT-PCR) methods, highlighting the need for deep-sequencing expression profiling methods. PMID:21592389
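Identifying root-enriched genes from EST abundance in roots relative to other tissues amounts to comparing normalized EST frequencies between library groups. The sketch below computes a simple root/other frequency ratio with a pseudocount; the ratio, the pseudocount, the input layout and the function name are assumptions, and the study may have used a formal statistical test instead.

```python
def root_enrichment(counts, root_libs, pseudocount=1.0):
    """Ratio of each gene's EST frequency in root libraries to its frequency elsewhere.

    counts: {gene: {library name: EST count}}; root_libs: names of root libraries.
    """
    def split(per_lib):
        root = sum(c for lib, c in per_lib.items() if lib in root_libs)
        other = sum(c for lib, c in per_lib.items() if lib not in root_libs)
        return root, other

    per_gene = {gene: split(per_lib) for gene, per_lib in counts.items()}
    root_total = sum(r for r, _ in per_gene.values())
    other_total = sum(o for _, o in per_gene.values())
    ratios = {}
    for gene, (root, other) in per_gene.items():
        # Pseudocounts keep the ratio finite when a gene is absent from one group
        root_freq = (root + pseudocount) / (root_total + pseudocount)
        other_freq = (other + pseudocount) / (other_total + pseudocount)
        ratios[gene] = root_freq / other_freq
    return ratios
```

Normalizing by library-group totals matters because root and non-root collections differ in depth; raw counts alone would favour whichever group was sequenced more deeply.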

  17. Role of miR-452-5p in the tumorigenesis of prostate cancer: A study based on the Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO), and bioinformatics analysis.

    PubMed

    Gao, Li; Zhang, Li-Jie; Li, Sheng-Hua; Wei, Li-Li; Luo, Bin; He, Rong-Quan; Xia, Shuang

    2018-03-06

    MiR-452-5p has been reported to be down-regulated in prostate cancer, affecting the development of this type of cancer. However, the molecular mechanism of miR-452-5p in prostate cancer remains unclear. Therefore, we investigated the network of target genes of miR-452-5p in prostate cancer using bioinformatics analyses. We first analyzed the expression profiles and prognostic value of miR-452-5p in prostate cancer tissues from a public database. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), PANTHER pathway analyses, and a disease ontology (DG) analysis were performed to find the molecular functions of the target genes from GSE datasets and miRWalk. Finally, we validated hub genes from the protein-protein interaction (PPI) networks of the target genes in the Human Protein Atlas (HPA) database and Gene Expression Profiling Interactive Analysis (GEPIA). The optimal target genes were narrowed down by identifying the overlap among up-regulated genes from GEPIA, down-regulated genes from GSE datasets, and predicted genes in miRWalk. Based on mining of GEO and ArrayExpress microarray chips and miRNA-Seq data in the TCGA database, which include 1007 prostate cancer samples and 387 non-cancer samples, miR-452-5p was shown to be down-regulated in prostate cancer. GO, KEGG, and PANTHER pathway analyses suggested that the target genes might participate in important biological processes, such as transforming growth factor beta signaling and the positive regulation of brown fat cell differentiation and mesenchymal cell differentiation, as well as the Ras signaling pathway and pathways regulating the pluripotency of stem cells and arrhythmogenic right ventricular cardiomyopathy (ARVC). Nine genes (GABBR, PNISR, NTSR1, DOCK1, EREG, SFRP1, PTGS2, LEF1, and BMP2) were defined as hub genes in the PPI network. Three genes (FAM174B, SLC30A4, and SLIT1) were jointly shared by GEPIA, the GSE datasets, and miRWalk. 
Down-regulated miR-452-5p might play an essential role in the tumorigenesis of prostate cancer. Copyright © 2018. Published by Elsevier GmbH.

  18. Comprehensive analysis of alternative splicing and functionality in neuronal differentiation of P19 cells.

    PubMed

    Suzuki, Hitoshi; Osaki, Ken; Sano, Kaori; Alam, A H M Khurshid; Nakamura, Yuichiro; Ishigaki, Yasuhito; Kawahara, Kozo; Tsukahara, Toshifumi

    2011-02-18

    Alternative splicing, which produces multiple mRNAs from a single gene, occurs in most human genes and contributes to protein diversity. Many alternative isoforms are expressed in a spatio-temporal manner and function in diverse processes, including in the neural system. The purpose of the present study was to comprehensively investigate neural splicing using P19 cells. GeneChip Exon Array analysis was performed using total RNAs purified from cells during neuronal cell differentiation. To efficiently and readily extract the alternative exon candidates, 9 filtering conditions were prepared, yielding 262 candidate exons (236 genes). Semiquantitative RT-PCR of 30 randomly selected candidates suggested that 87% of the candidates were differentially alternatively spliced in neuronal cells compared to undifferentiated cells. Gene ontology and pathway analyses suggested that many of the candidate genes were associated with neural events. Together with 66 genes whose functions in neural cells or organs were reported previously, 47 candidate genes were found to be linked to 189 events in the gene-level profile of neural differentiation. Text mining for alternative isoforms confirmed distinct functions for the isoforms of 9 candidate genes indicated by the Exon Array results. Alternative exons were successfully extracted. Results from the informatics analyses suggested that neural events were primarily governed by genes whose expression was increased and whose transcripts were differentially alternatively spliced in the neuronal cells. In addition to known functions in neural cells or organs, the uninvestigated alternative splicing events of 11 genes among the 47 candidate genes suggested that cell cycle events are also potentially important. These genes may help researchers to differentiate the roles of alternative splicing in cell differentiation and cell proliferation.

  19. Mining gene link information for survival pathway hunting.

    PubMed

    Jing, Gao-Jian; Zhang, Zirui; Wang, Hong-Qiang; Zheng, Hong-Mei

    2015-08-01

    This study proposes a gene link-based method for hunting survival time-related pathways. In this method, the authors incorporate gene link information to estimate how a pathway is associated with cancer patients' survival time. Specifically, a gene link-based Cox proportional hazards model (Link-Cox) is established, in which two linked genes are considered together to represent a link variable and the association of the link with survival time is assessed using the Cox proportional hazards model. On the basis of the Link-Cox model, the authors formulate a new statistic for measuring the association of a pathway with survival time of cancer patients, referred to as pathway survival score (PSS), by summarising survival significance over all the gene links in the pathway, and devise a permutation test to test the significance of an observed PSS. To evaluate the proposed method, the authors applied it to simulation data and two publicly available real-world gene expression data sets. Extensive comparisons with previous methods show the effectiveness and efficiency of the proposed method for survival pathway hunting.
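The Link-Cox procedure combines three pieces: a per-link association statistic, a pathway score that sums it over the pathway's links, and a permutation test. The sketch below is illustrative only. It assumes the link variable is the product of the two genes' z-scores (the abstract does not say how linked genes are combined), uses the univariate Cox score statistic at β = 0 as a stand-in for "survival significance", and obtains a p-value by permuting survival times and event flags jointly across patients. All function names are hypothetical.

```python
import numpy as np


def cox_score_stat(x, time, event):
    """Univariate Cox score statistic at beta = 0 (Breslow handling of ties)."""
    x, time, event = (np.asarray(a, dtype=float) for a in (x, time, event))
    u = 0.0
    info = 0.0
    for i in np.flatnonzero(event):
        at_risk = x[time >= time[i]]  # patients still at risk at this event time
        u += x[i] - at_risk.mean()    # observed minus expected covariate value
        info += at_risk.var()         # variance of the covariate in the risk set
    return u * u / info if info > 0 else 0.0


def pathway_survival_score(expr, links, time, event):
    """Sum of per-link score statistics; expr is a genes x patients matrix."""
    expr = np.asarray(expr, dtype=float)
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    # ASSUMPTION: a link variable is the product of the two genes' z-scores
    return sum(cox_score_stat(z[a] * z[b], time, event) for a, b in links)


def permutation_pvalue(expr, links, time, event, n_perm=1000, seed=0):
    """Permute (time, event) pairs jointly across patients to build a null distribution."""
    rng = np.random.default_rng(seed)
    time, event = np.asarray(time), np.asarray(event)
    observed = pathway_survival_score(expr, links, time, event)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(time))
        if pathway_survival_score(expr, links, time[perm], event[perm]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from an impossible p = 0
    return observed, (exceed + 1) / (n_perm + 1)
```

Permuting patients' survival outcomes while keeping the expression matrix intact preserves gene-gene correlation within the pathway, which is why a permutation test is preferred here over treating the summed statistic as chi-square distributed.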

  20. GarlicESTdb: an online database and mining tool for garlic EST sequences.

    PubMed

    Kim, Dae-Won; Jung, Tae-Sung; Nam, Seong-Hyeuk; Kwon, Hyuk-Ryul; Kim, Aeri; Chae, Sung-Hwa; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Park, Hong-Seog

    2009-05-18

    Allium sativum., commonly known as garlic, is a species in the onion genus (Allium), which is a large and diverse one containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary, medicinal use and health benefits. Currently, the interest in garlic is highly increasing due to nutritional and pharmaceutical value including high blood pressure and cholesterol, atherosclerosis and cancer. For all that, there are no comprehensive databases available for Expressed Sequence Tags(EST) of garlic for gene discovery and future efforts of genome annotation. That is why we developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum) EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in JAVA and consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information into MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition) software technology (JSP/EJB/JavaServlet) for browsing and querying the database, for creation of dynamic web pages on the client side, and for mapping annotated enzymes to KEGG pathways, the AJAX framework was also used partially. The online resources, such as putative annotation, single nucleotide polymorphisms (SNP) and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. To archive more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation information for others to view. 
The GarlicESTdb web application is freely available at http://garlicdb.kribb.re.kr. GarlicESTdb is the first integrated online database of EST sequences isolated from garlic that can be freely accessed and downloaded. It has many useful features for interactive mining of EST contigs and datasets from each library, including curation of annotated information, expression profiling, information retrieval, and summary statistics of functional annotation. Consequently, GarlicESTdb should make a crucial contribution to data mining by biologists and to more efficient experimental studies.
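The first pipeline stage, automatic preprocessing of EST reads, is not specified in detail above. Below is a minimal Python sketch of what such a step commonly does. It assumes, hypothetically, that preprocessing strips ambiguous `N` bases from the read ends, removes a 3' poly-A tail of 12 or more bases, and discards reads shorter than 100 bases. The real GarlicESTdb pipeline is written in Java, and its thresholds are not stated.

```python
import re
from typing import Optional

# Hypothetical parameters; the GarlicESTdb thresholds are not given in the abstract.
MIN_LENGTH = 100
POLY_A_TAIL = re.compile(r"A{12,}$")  # run of 12 or more A's at the 3' end


def preprocess_est(read: str, min_length: int = MIN_LENGTH) -> Optional[str]:
    """Clean one EST read; return None if it is too short to keep."""
    seq = read.upper().strip()
    seq = seq.strip("N")              # drop ambiguous bases at both ends
    seq = POLY_A_TAIL.sub("", seq)    # drop the 3' poly-A tail, if any
    return seq if len(seq) >= min_length else None


# Made-up reads: one with a poly-A tail, one with N ends, one too short.
reads = ["ACGT" * 30 + "A" * 20, "NNN" + "ACGT" * 30 + "NN", "ACGT" * 10]
cleaned = [preprocess_est(r) for r in reads]
```

Reads that return `None` would be excluded before the assembly step.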

  1. StemTextSearch: Stem cell gene database with evidence from abstracts.

    PubMed

    Chen, Chou-Cheng; Ho, Chung-Liang

    2017-05-01

    Previous studies have used many methods to find biomarkers in stem cells, including text mining, experimental data and image storage. However, no text-mining method has yet been developed that can identify whether a gene plays a positive or negative role in stem cells. StemTextSearch identifies the role of a gene in stem cells by using text mining to find combinations of gene regulation, stem-cell regulation and cell processes in the same sentences of biomedical abstracts. The dataset includes 5797 genes: 1534 with positive roles in stem cells, 1335 with negative roles, 1654 with both positive and negative roles, and 1274 with an uncertain role. The precision of gene-role assignment in StemTextSearch is 0.66, and the recall is 0.78. StemTextSearch is a web-based search engine whose queries can specify (i) gene, (ii) category of stem cell, (iii) gene role, (iv) gene regulation, (v) cell process, (vi) stem-cell regulation, and (vii) species. StemTextSearch is available through http://bio.yungyun.com.tw/StemTextSearch.aspx. Copyright © 2017. Published by Elsevier Inc.
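The core idea can be sketched in a few lines: assign a gene role from sentences in which gene regulation, stem-cell regulation and a cell process co-occur. The keyword lists and the sign rule are assumptions for illustration. Under that rule, up-regulation of a gene combined with promotion of a stem-cell process counts as a positive role, and so does down-regulation combined with impairment. StemTextSearch's actual vocabularies and rules are not described in the abstract.

```python
import re
from typing import Optional

# Hypothetical keyword lists, not StemTextSearch's actual vocabularies.
GENE_UP = {"overexpression", "upregulation", "activation"}
GENE_DOWN = {"knockdown", "knockout", "silencing", "inhibition"}
STEM_UP = {"promotes", "maintains", "enhances", "increases"}
STEM_DOWN = {"impairs", "reduces", "suppresses", "decreases"}
PROCESSES = {"self-renewal", "pluripotency", "proliferation", "stemness"}


def sentence_role(sentence: str, gene: str) -> Optional[str]:
    """Return 'positive'/'negative' if the sentence links the gene to a stem-cell process."""
    words = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
    if gene.lower() not in words or not (words & PROCESSES):
        return None
    gene_dir = 1 if words & GENE_UP else -1 if words & GENE_DOWN else 0
    stem_dir = 1 if words & STEM_UP else -1 if words & STEM_DOWN else 0
    if gene_dir == 0 or stem_dir == 0:
        return None
    # Same sign (e.g. overexpression promotes self-renewal) means a positive role.
    return "positive" if gene_dir * stem_dir > 0 else "negative"


def gene_role(abstracts: list[str], gene: str) -> str:
    """Aggregate sentence-level roles into positive, negative, both or uncertain."""
    roles = set()
    for text in abstracts:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            role = sentence_role(sentence, gene)
            if role:
                roles.add(role)
    if roles == {"positive", "negative"}:
        return "both"
    return roles.pop() if roles else "uncertain"
```

The four possible return values of `gene_role` mirror the four categories reported in the dataset.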

  2. Phylogeny-guided (meta)genome mining approach for the targeted discovery of new microbial natural products.

    PubMed

    Kang, Hahk-Soo

    2017-02-01

    Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes for new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for characterization of their metabolites. Use of this approach has increased over the last couple of years, along with the emergence of low-cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach and to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.
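Phylogeny-guided prioritization places unknown marker sequences among characterized ones and favours those that fall outside known clades. The sketch below is a deliberately simplified, tree-free proxy, not the approach used in the reviewed studies. It ranks unknown, pre-aligned marker sequences by their uncorrected p-distance to the nearest characterized reference, so the most divergent candidates come first. Real analyses build a phylogenetic tree from a proper multiple alignment. The sequence names and sequences below are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0
    return sum(x != y for x, y in sites) / len(sites)


def prioritize(unknown: dict[str, str], known: dict[str, str]) -> list[tuple[str, float]]:
    """Rank unknown marker sequences by distance to their nearest characterized reference."""
    scores = {
        name: min(p_distance(seq, ref) for ref in known.values())
        for name, seq in unknown.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical aligned marker fragments.
known = {"ref1": "AAAAAAAAAA", "ref2": "CCCCCCCCCC"}
unknown = {"env1": "AAAAAAAAAC", "env2": "GGGGGAAAAA"}
ranking = prioritize(unknown, known)
```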

  3. Finding novel relationships with integrated gene-gene association network analysis of Synechocystis sp. PCC 6803 using species-independent text-mining

    PubMed Central

    Kreula, Sanna M.; Kaewphan, Suwisa; Ginter, Filip

    2018-01-01

    The increasing move towards open-access full-text scientific literature enhances our ability to use advanced text-mining methods to construct information-rich networks that no human could grasp simply from ‘reading the literature’. The utility of text mining for well-studied species is obvious, but its utility for less-studied species, or those with no prior track record at all, is unclear. Here we present a concept for how advanced text mining can be used to create information-rich networks even for less well-studied species, and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and the first case study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to improve the accuracy of predicting novel, biologically relevant interactions. A rule-based algorithm (filter) was constructed to automate the search for novel candidate genes with a high likelihood of association with known target genes by (1) ignoring relationships already established in the existing literature, as they are already ‘known’, and (2) demanding multiple independent lines of evidence for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any particular gene or protein. The full network is provided as an open-source resource. PMID:29844966
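The filter's two rules translate directly into code: ignore relationships already in the literature, and require several independent lines of evidence. This is a minimal sketch that assumes edges arrive as (gene, gene, evidence source) tuples. The gene names, source labels and the threshold of two sources are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def novel_candidates(
    edges: Iterable[tuple[str, str, str]],
    targets: set[str],
    known_pairs: set[frozenset[str]],
    min_sources: int = 2,
) -> list[tuple[str, str, int]]:
    """Return (target, candidate, n_sources) for unreported links backed by enough evidence."""
    evidence: dict[frozenset[str], set[str]] = defaultdict(set)
    for a, b, source in edges:
        evidence[frozenset((a, b))].add(source)

    results = []
    for pair, sources in evidence.items():
        # Rule 1: skip links already in the literature. Rule 2: require independent sources.
        if len(pair) != 2 or pair in known_pairs or len(sources) < min_sources:
            continue
        hits = pair & targets
        if len(hits) != 1:  # keep links between exactly one known target and one other gene
            continue
        (target,) = hits
        (candidate,) = pair - hits
        results.append((target, candidate, len(sources)))
    return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


# Hypothetical evidence: text mining plus co-expression for some pairs.
edges = [
    ("targetA", "geneX", "text"), ("geneX", "targetA", "coexpression"),
    ("targetA", "geneY", "text"),
    ("targetA", "geneK", "text"), ("targetA", "geneK", "coexpression"),
    ("geneZ", "targetA", "text"), ("geneZ", "targetA", "coexpression"),
]
hits = novel_candidates(edges, {"targetA"}, {frozenset(("targetA", "geneK"))})
```

Here `geneY` is dropped for having a single source, and `geneK` is dropped because the pair is already known.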

  4. Systematic analysis of molecular mechanisms for HCC metastasis via text mining approach.

    PubMed

    Zhen, Cheng; Zhu, Caizhong; Chen, Haoyang; Xiong, Yiru; Tan, Junyuan; Chen, Dong; Li, Jin

    2017-02-21

    This study aimed to systematically explore the molecular mechanisms of hepatocellular carcinoma (HCC) metastasis and to identify regulatory genes using text-mining methods. Genes with the highest frequencies and significant pathways related to HCC metastasis were listed. A handful of proteins, such as EGFR, MDM2, TP53 and APP, were identified as hub nodes in the PPI (protein-protein interaction) network. Fewer genes were specific to HCV-HCCs than to HBV-HCCs, but these genes may participate in more extensive signaling processes. VEGFA, PIK3CA, MAPK1, MMP9 and other genes may play important roles in multiple metastasis phenotypes. Genes in abstracts of the HCC-metastasis literature were identified. Word frequency analysis, KEGG pathway analysis and PPI network analysis were performed. Co-occurrence analysis between genes and metastasis-related phenotypes was then carried out. Text mining is effective for revealing potential regulators or pathways, but its purpose should be specific, and a combination of various methods will be more useful.
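Abstract-level co-occurrence between genes and metastasis-related phenotypes can be counted as in this minimal sketch. It assumes exact-match gene symbols and phenotype terms. A real pipeline would also normalize gene synonyms, which the abstract does not describe. The example abstracts are made up.

```python
import re
from collections import Counter
from itertools import product


def cooccurrence(abstracts: list[str], genes: list[str], phenotypes: list[str]) -> Counter:
    """Count, for each (gene, phenotype) pair, the abstracts that mention both."""
    counts: Counter = Counter()
    for text in abstracts:
        lower = text.lower()
        # Whole-word match for gene symbols; simple substring match for phenotype terms.
        found_genes = {g for g in genes if re.search(rf"\b{re.escape(g.lower())}\b", lower)}
        found_phenotypes = {p for p in phenotypes if p.lower() in lower}
        counts.update(product(sorted(found_genes), sorted(found_phenotypes)))
    return counts


abstracts = [
    "VEGFA promotes invasion and angiogenesis in HCC.",
    "MMP9 and VEGFA are linked to invasion.",
    "TP53 mutation is common in HCC.",
]
counts = cooccurrence(abstracts, ["VEGFA", "MMP9", "TP53"], ["invasion", "angiogenesis"])
top_pair = counts.most_common(1)
```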

  5. Comprehensive QTL mapping survey dissects the complex fruit texture physiology in apple (Malus x domestica Borkh.).

    PubMed

    Longhi, Sara; Moretto, Marco; Viola, Roberto; Velasco, Riccardo; Costa, Fabrizio

    2012-02-01

    Fruit ripening is a complex physiological process in plants in which programmed cell wall changes occur, mainly to promote seed dispersal. Cell wall modification also directly regulates textural properties, a fundamental aspect of fruit quality. In this study, two full-sib apple populations with 'Fuji' as the common maternal parent, crossed with 'Delearly' and 'Pink Lady', were used to understand the control of fruit texture by QTL mapping and in silico gene mining. Texture was dissected with a novel high-resolution phenomics strategy that simultaneously profiles both mechanical and acoustic fruit texture components. In 'Fuji × Delearly', nine linkage groups were associated with QTLs accounting for 15.6% to 49% of the total variance, and a highly significant QTL cluster for both textural components was mapped on chromosome 10, co-located with Md-PG1, a polygalacturonase gene known to be involved in cell wall metabolism in apple. In addition, other candidate genes related to the Md-NOR and Md-RIN transcription factors, Md-Pel (pectate lyase) and Md-ACS1 were mapped within the statistical intervals. In 'Fuji × Pink Lady', a smaller set of linkage groups was associated with the QTLs identified for fruit texture (15.9-34.6% of the variance). Analysis of the phenotypic variance on a two-dimensional PCA plot highlighted transgressive segregation in this progeny, revealing two QTL sets distinctly related to the mechanical and acoustic texture components. Mining of the apple genome revealed the gene inventory underlying each QTL, and functional profile assessment uncovered specific expression patterns of these candidate genes.
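Figures such as "15.6% to 49% of the total variance" are the percentage of phenotypic variance explained (PVE) by a QTL. Interval-mapping software estimates PVE from its own model. The sketch below shows only the simpler single-marker version, in which PVE is the between-genotype sum of squares divided by the total sum of squares (the R² of a one-way ANOVA). The genotype labels and phenotype values are made up.

```python
from collections import defaultdict
from statistics import fmean


def percent_variance_explained(genotypes: list[str], phenotypes: list[float]) -> float:
    """Single-marker PVE: between-genotype sum of squares / total sum of squares, in %."""
    groups: dict[str, list[float]] = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    grand = fmean(phenotypes)
    ss_total = sum((y - grand) ** 2 for y in phenotypes)
    ss_between = sum(len(v) * (fmean(v) - grand) ** 2 for v in groups.values())
    return 100.0 * ss_between / ss_total


# Hypothetical texture scores for four progeny with two marker genotypes.
pve = percent_variance_explained(["AA", "AA", "AB", "AB"], [1.0, 3.0, 5.0, 7.0])
```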

  6. Genes Involved in the Evolution of Herbivory by a Leaf-Mining, Drosophilid Fly

    PubMed Central

    Whiteman, Noah K.; Gloss, Andrew D.; Sackton, Timothy B.; Groen, Simon C.; Humphrey, Parris T.; Lapoint, Richard T.; Sønderby, Ida E.; Halkier, Barbara A.; Kocks, Christine; Ausubel, Frederick M.; Pierce, Naomi E.

    2012-01-01

    Herbivorous insects are among the most successful radiations of life. However, we know little about the processes underpinning the evolution of herbivory. We examined the evolution of herbivory in the fly, Scaptomyza flava, whose larvae are leaf miners on species of Brassicaceae, including the widely studied reference plant, Arabidopsis thaliana (Arabidopsis). Scaptomyza flava is phylogenetically nested within the paraphyletic genus Drosophila, and the whole genome sequences available for 12 species of Drosophila facilitated phylogenetic analysis and assembly of a transcriptome for S. flava. A time-calibrated phylogeny indicated that leaf mining in Scaptomyza evolved between 6 and 16 million years ago. Feeding assays showed that biosynthesis of glucosinolates, the major class of antiherbivore chemical defense compounds in mustard leaves, was upregulated by S. flava larval feeding. The presence of glucosinolates in wild-type (WT) Arabidopsis plants reduced S. flava larval weight gain and increased egg–adult development time relative to flies reared in glucosinolate knockout (GKO) plants. An analysis of gene expression differences in 5-day-old larvae reared on WT versus GKO plants showed a total of 341 transcripts that were differentially regulated by glucosinolate uptake in larval S. flava. Of these, approximately a third corresponded to homologs of Drosophila melanogaster genes associated with starvation, dietary toxin-, heat-, oxidation-, and aging-related stress. The upregulated transcripts exhibited elevated rates of protein evolution compared with unregulated transcripts. The remaining differentially regulated transcripts also contained a higher proportion of novel genes than the unregulated transcripts. Thus, the transition to herbivory in Scaptomyza appears to be coupled with the evolution of novel genes and the co-option of conserved stress-related genes. PMID:22813779

  7. Internal controls for quantitative polymerase chain reaction of swine mammary glands during pregnancy and lactation.

    PubMed

    Tramontana, S; Bionaz, M; Sharma, A; Graugnard, D E; Cutler, E A; Ajmone-Marsan, P; Hurley, W L; Loor, J J

    2008-08-01

    High-throughput microarray analysis is an efficient means of obtaining a genome-wide view of transcript profiles across physiological states. However, quantitative PCR (qPCR) remains the method of choice for high-precision mRNA abundance analysis. Normalization with appropriate internal control genes (ICG) is essential for reliable qPCR data and is now, more than ever, a fundamental step for accurate gene expression profiling. We mined mammary tissue microarray data on >13,000 genes at -34, -14, 0, 7, 14, 21, and 28 d relative to parturition in 27 crossbred primiparous gilts to identify suitable ICG. Initial analysis revealed TBK1, PCSK2, PTBP1, API5, VAPB, QTRT1, TRIM41, TMEM24, PPP2R5B, and AP1S1 as the most stable genes (sample/reference = 1 ± 0.2). We also included 9 genes previously identified as ICG in bovine mammary tissue. Gene network analysis of these 19 genes identified AP1S1, API5, MTG1, VAPB, TRIM41, MRPL39, and RPS15A as having no known co-regulation. UXT and ACTB were then added to this list, and the mRNA abundance of these 9 genes was measured by qPCR. Expression of all 9 genes decreased markedly during lactation. In a previous study with bovine mammary tissue, mRNA of stably expressed genes decreased during lactation because of a dilution effect caused by large increases in the expression of highly abundant genes. To verify this effect, highly abundant mammary genes such as CSN1S2, SCD, FABP3, and LTF were evaluated by qPCR. The tested ICG were negatively correlated with these genes, demonstrating a dilution effect in porcine mammary tissue. Gene stability analysis identified API5, VAPB, and MRPL39 as the most stable ICG in porcine mammary tissue and indicated that these 3 genes were the most appropriate for calculating a normalization factor. 
Overall, the results underscore the importance of proper validation of internal controls for qPCR and highlight the limitations of using the absence of time effects as the criterion for selecting appropriate ICG. Further, we showed that ICG suitable in one species might not be suitable for qPCR normalization in another.
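The abstract does not name the stability algorithm used. One widely used measure is the geNorm M value. For each candidate gene, it takes the standard deviation across samples of the log2 expression ratio with each other candidate, then averages these standard deviations; lower M means more stable. Below is a minimal sketch under that assumption, using made-up relative quantities. geNorm then iteratively excludes the least stable gene and recomputes M, and that loop is omitted here.

```python
import math
from statistics import stdev


def genorm_m(expr: dict[str, list[float]]) -> dict[str, float]:
    """Return a geNorm-style stability value M for each candidate gene.

    expr maps gene -> relative quantities (linear scale), one value per sample,
    with samples in the same order for every gene.
    """
    m_values = {}
    for j in expr:
        sds = [
            stdev([math.log2(a / b) for a, b in zip(expr[j], expr[k])])
            for k in expr
            if k != j
        ]
        m_values[j] = sum(sds) / len(sds)
    return m_values


# Made-up quantities for three candidates across four samples.
candidates = {"A": [1, 2, 4, 8], "B": [2, 4, 8, 16], "C": [1, 1, 1, 1]}
m = genorm_m(candidates)
ranking = sorted(m.items(), key=lambda kv: kv[1])  # most stable first
```

Genes A and B keep a constant ratio to each other, so they rank as the most stable pair. Gene C ranks last.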

  8. A cross-species analysis method to analyze animal models' similarity to human's disease state

    PubMed Central

    2012-01-01

    Background: Animal models are indispensable tools for studying the causes of human diseases and searching for treatments. The scientific value of an animal model depends on how accurately it mimics the human disease. The primary goal of the current study was to develop a cross-species method that uses animal models' expression data to evaluate their similarity to human diseases and to assess the efficacy of drug molecules in drug research. We also aimed to show that comparing gene expression profiles across species is feasible and useful in studies of pathology, toxicology, drug repositioning, and drug mechanisms of action. Results: We developed a cross-species analysis method to analyze animal models' similarity to human diseases and their effectiveness in drug research by utilizing existing animal gene expression data in public databases, and we mined meaningful information to aid drug research, such as potential drug candidates, possible drug repositioning, side effects and pharmacological analysis. New animal models could be evaluated with our method before they are used in drug discovery. We applied the method to several cases of known animal model expression profiles and obtained useful information for drug research. We found that trichostatin A and some other HDAC inhibitors could elicit very similar responses across cell lines and species at the gene expression level. The mouse hypoxia model could accurately mimic human hypoxia, whereas the mouse diabetes drug model might have some limitations. The transgenic Alzheimer's disease mouse was a useful model, and we analyzed the biological mechanisms of some drugs in this case in depth. In addition, all the cases could provide ideas for drug discovery and drug repositioning. Conclusions: We developed a new cross-species gene expression module comparison method that uses animal models' expression data to analyze the effectiveness of animal models in drug research. 
Moreover, through data integration, our method could be applied to drug research, for example to identify potential drug candidates, possible drug repositioning, side effects and pharmacological information. PMID:23282076
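The authors compare gene expression modules across species. The sketch below is a simpler, gene-wise stand-in and not their method. It maps mouse genes to human orthologs, assuming a one-to-one mapping, and computes the Pearson correlation of log2 fold changes over the shared orthologs. A model whose response tracks the human disease profile scores close to 1. All gene values below are hypothetical.

```python
import math
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either profile is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_species_similarity(
    model_lfc: dict[str, float], human_lfc: dict[str, float], orthologs: dict[str, str]
) -> tuple[float, int]:
    """Correlate model vs human log2 fold changes over orthologs present in both profiles."""
    pairs = [
        (model_lfc[m], human_lfc[h])
        for m, h in orthologs.items()
        if m in model_lfc and h in human_lfc
    ]
    if len(pairs) < 3:
        raise ValueError("too few shared orthologs")
    xs, ys = zip(*pairs)
    return pearson(list(xs), list(ys)), len(pairs)


# Hypothetical log2 fold changes (disease vs control) and ortholog map.
mouse = {"Vegfa": 2.0, "Hif1a": 1.0, "Ldha": 1.5, "Actb": 0.0}
human = {"VEGFA": 4.0, "HIF1A": 2.0, "LDHA": 3.0, "ACTB": 0.0}
orthologs = {"Vegfa": "VEGFA", "Hif1a": "HIF1A", "Ldha": "LDHA", "Actb": "ACTB", "Epo": "EPO"}
r, n_shared = cross_species_similarity(mouse, human, orthologs)
```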

  9. A cross-species analysis method to analyze animal models' similarity to human's disease state.

    PubMed

    Yu, Shuhao; Zheng, Lulu; Li, Yun; Li, Chunyan; Ma, Chenchen; Li, Yixue; Li, Xuan; Hao, Pei

    2012-01-01

    Animal models are indispensable tools for studying the causes of human diseases and searching for treatments. The scientific value of an animal model depends on how accurately it mimics the human disease. The primary goal of the current study was to develop a cross-species method that uses animal models' expression data to evaluate their similarity to human diseases and to assess the efficacy of drug molecules in drug research. We also aimed to show that comparing gene expression profiles across species is feasible and useful in studies of pathology, toxicology, drug repositioning, and drug mechanisms of action. We developed a cross-species analysis method to analyze animal models' similarity to human diseases and their effectiveness in drug research by utilizing existing animal gene expression data in public databases, and we mined meaningful information to aid drug research, such as potential drug candidates, possible drug repositioning, side effects and pharmacological analysis. New animal models could be evaluated with our method before they are used in drug discovery. We applied the method to several cases of known animal model expression profiles and obtained useful information for drug research. We found that trichostatin A and some other HDAC inhibitors could elicit very similar responses across cell lines and species at the gene expression level. The mouse hypoxia model could accurately mimic human hypoxia, whereas the mouse diabetes drug model might have some limitations. The transgenic Alzheimer's disease mouse was a useful model, and we analyzed the biological mechanisms of some drugs in this case in depth. In addition, all the cases could provide ideas for drug discovery and drug repositioning. We developed a new cross-species gene expression module comparison method that uses animal models' expression data to analyze the effectiveness of animal models in drug research. 
Moreover, through data integration, our method could be applied to drug research, for example to identify potential drug candidates, possible drug repositioning, side effects and pharmacological information.

  10. Comparative and Evolutionary Analysis of Grass Pollen Allergens Using Brachypodium distachyon as a Model System

    PubMed Central

    Sharma, Akanksha; Sharma, Niharika; Bhalla, Prem; Singh, Mohan

    2017-01-01

    Comparative genomics has facilitated the mining of biological information from genome sequences through the detection of similarities and differences with the genomes of closely or more distantly related species. With such comparative approaches, knowledge can be transferred from model to non-model organisms, and insights can be gained into the structural and evolutionary patterns of specific genes. In the absence of sequenced genomes for allergenic grasses, this study aimed to understand the structure, organisation and expression profiles of grass pollen allergens using genomic data from Brachypodium distachyon, which is phylogenetically related to the allergenic grasses. Combining genomic data with an anther RNA-Seq dataset revealed 24 pollen allergen genes belonging to eight allergen groups, mapping to the five chromosomes of B. distachyon. High levels of anther-specific expression were observed for the 24 identified putative allergen-encoding genes in Brachypodium. The genomic evidence suggests that the gene encoding the group 5 allergen, the most potent trigger of hay fever and allergic asthma, originated as a pollen-specific orphan gene in a common grass ancestor of the Brachypodium and Triticeae clades. Gene structure analysis showed that the putative allergen-encoding genes in Brachypodium either lack introns or contain a reduced number of them. Promoter analysis of the identified Brachypodium genes revealed specific cis-regulatory sequences likely responsible for high anther/pollen-specific expression. Along with the putative allergen-encoding genes, this study also describes some important plant gene families (e.g. the expansin superfamily, EF-hand family and profilins) for the first time in the model plant Brachypodium. 
Altogether, the present study provides new insights into the structural characterization and evolution of pollen allergens and will serve as a basis for their functional characterization in related grass species. PMID:28103252
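Promoter analysis of this kind usually starts by scanning upstream sequences for known cis-regulatory motifs on both strands. The sketch below is minimal. The motif list is an assumption (AGAAA and GTGA are elements often cited as pollen-associated in plant cis-element databases), and the promoter strings are made up. A real analysis would use a curated motif collection and test for over-representation against background sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Illustrative motifs only (assumed); substitute a curated cis-element list.
MOTIFS = {"POLLEN1-like": "AGAAA", "GTGA-like": "GTGA"}


def count_occurrences(seq: str, motif: str) -> int:
    """Count overlapping exact matches of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def motif_counts(promoter: str, motifs: dict[str, str]) -> dict[str, int]:
    """Count motif matches on the forward strand plus the reverse-complement strand.

    A match that is its own reverse complement is counted once on each strand.
    """
    forward = promoter.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    return {
        name: count_occurrences(forward, m.upper()) + count_occurrences(reverse, m.upper())
        for name, m in motifs.items()
    }


counts = motif_counts("GTGAC", MOTIFS)  # hypothetical promoter fragment
```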

  11. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    PubMed

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies are an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques, such as association rule mining, that are tailored to mine multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL), that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures, Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence), customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods in two applications: (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
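MOSupport and MOConfidence are the authors' adaptations of the standard association-rule measures, and their exact definitions are not given in the abstract. For orientation, this sketch computes only the standard support and confidence of a rule X -> Y over genes' GO annotation sets. The annotation sets are toy data.

```python
def support(itemset: set[str], transactions: list[set[str]]) -> float:
    """Fraction of genes whose annotation set contains every term in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent: set[str], consequent: set[str], transactions: list[set[str]]) -> float:
    """Estimated P(consequent | antecedent) = support(X ∪ Y) / support(X); 0.0 if X never occurs."""
    s_x = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / s_x if s_x else 0.0


# Toy annotation sets: one set of GO term IDs per gene (made up for illustration).
genes = [
    {"GO:0003677", "GO:0005634"},  # DNA binding (MF), nucleus (CC)
    {"GO:0003677", "GO:0005634"},
    {"GO:0003677"},
    {"GO:0006915", "GO:0005739"},  # apoptotic process (BP), mitochondrion (CC)
]
rule_conf = confidence({"GO:0003677"}, {"GO:0005634"}, genes)  # cross-ontology rule MF -> CC
```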

  12. Gene expression ratio stability evaluation in prepubertal bovine mammary tissue from calves fed different milk replacers reveals novel internal controls for quantitative polymerase chain reaction.

    PubMed

    Piantoni, Paola; Bionaz, Massimo; Graugnard, Daniel E; Daniels, Kristy M; Akers, R Michael; Loor, Juan J

    2008-06-01

    Prepubertal mammary development can be affected by nutrition, partly through alterations in gene network expression. Quantitative PCR (qPCR) remains the most accurate method for measuring mRNA expression but is subject to analytical errors that introduce variation. Normalization of qPCR data with internal control genes (ICG) is therefore required. The objective of this study was to mine microarray data (>10,000 genes) from prepubertal mammary parenchyma and stroma to identify the most suitable ICG for qPCR normalization. Tissue for RNA extraction was obtained from calves (approximately 63 d old; n = 5/diet) fed a control milk replacer (200 g/kg crude protein, 210 g/kg crude fat, fed at 441 g/d dry matter) or a high-protein milk replacer (280 g/kg crude protein, 200 g/kg crude fat, fed at 951 g/d dry matter). ICG were selected based on the absence of both expression variation across treatments and co-regulation (gene network analysis). Genes evaluated were ubiquitously expressed transcript (UXT), protein phosphatase 1 regulatory (inhibitor) subunit 11 (PPP1R11), matrix metallopeptidase 14 (MMP14), ClpB caseinolytic peptidase B (CLPB), SAPS domain family member 1 (SAPS1), mitochondrial GTPase 1 (MTG1), mitochondrial ribosomal protein L39 (MRPL39), ribosomal protein S15a (RPS15A), and actin beta (ACTB). Network analysis demonstrated that MMP14 and ACTB are co-regulated by v-myc myelocytomatosis viral oncogene, tumor protein p53, and potentially insulin-like growth factor 1. Pairwise comparison of expression ratios showed that ACTB, MMP14, and SAPS1 had the lowest stability and were unsuitable as ICG. PPP1R11, RPS15A, and MTG1 were the most stable of the ICG tested. We conclude that the geometric mean of PPP1R11, RPS15A, and MTG1 is ideal for normalization of qPCR data in prepubertal bovine mammary tissue. This study provides a list of candidate ICG that could be used by researchers working on bovine mammary development and allied fields.
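The recommended normalization factor is the geometric mean of the three ICG quantities in each sample, and target-gene quantities are divided by it. The conversion from quantification cycle (Cq) to relative quantity below assumes 100% amplification efficiency (a factor of 2 per cycle), which the abstract does not state. The values in the example are made up.

```python
import math


def relative_quantity(cq: float, cq_reference: float, efficiency: float = 2.0) -> float:
    """Convert a Cq value to a quantity relative to a reference Cq (efficiency 2.0 = 100%)."""
    return efficiency ** (cq_reference - cq)


def normalization_factor(icg_quantities: list[float]) -> float:
    """Geometric mean of the internal control gene quantities for one sample."""
    return math.exp(sum(math.log(q) for q in icg_quantities) / len(icg_quantities))


def normalize(target_quantity: float, icg_quantities: list[float]) -> float:
    """Divide a target-gene quantity by the sample's ICG normalization factor."""
    return target_quantity / normalization_factor(icg_quantities)


# Hypothetical sample: relative quantities for PPP1R11, RPS15A and MTG1.
icg = [1.0, 2.0, 4.0]
nf = normalization_factor(icg)
target_normalized = normalize(10.0, icg)
```

A geometric rather than arithmetic mean keeps one highly expressed control gene from dominating the factor.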

  13. Gene gymnastics

    PubMed Central

    Vijayachandran, Lakshmi S; Thimiri Govinda Raj, Deepak B; Edelweiss, Evelina; Gupta, Kapil; Maier, Josef; Gordeliy, Valentin; Fitzgerald, Daniel J; Berger, Imre

    2013-01-01

    Most essential activities in eukaryotic cells are catalyzed by large multiprotein assemblies containing up to ten or more interlocking subunits. The vast majority of these protein complexes are not easily accessible for high-resolution studies aimed at unlocking their mechanisms, owing to their low cellular abundance and high heterogeneity. Recombinant overproduction can resolve this bottleneck, and baculovirus expression vector systems (BEVS) have emerged as particularly powerful tools for providing eukaryotic multiprotein complexes in high quality and quantity. Recently, synthetic biology approaches have begun to make their mark by improving existing BEVS reagents through de novo design of streamlined transfer plasmids and engineering of the baculovirus genome. Here we present OmniBac, comprising new custom-designed reagents that further facilitate the integration of heterologous genes into the baculovirus genome for multiprotein expression. Based on comparative genome analysis and data mining, we also present a blueprint for custom-designing and engineering the entire baculovirus genome for optimized production properties using a bottom-up synthetic biology approach. PMID:23328086

  14. Exploring single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes in the jellyfish (Rhopilema esculentum) by transcriptome sequencing.

    PubMed

    Li, Yunfeng; Zhou, Zunchun; Tian, Meilin; Tian, Yi; Dong, Ying; Li, Shilei; Liu, Weidong; He, Chongbo

    2017-08-01

    In this study, single nucleotide polymorphisms (SNPs), microsatellites (SSRs) and differentially expressed genes (DEGs) in the oral parts, gonads, and umbrella parts of the jellyfish Rhopilema esculentum were analyzed by RNA-Seq. A total of 76.4 million raw reads and 72.1 million clean reads were generated by deep sequencing. Approximately 119,874 tentative unigenes and 149,239 transcripts were obtained. A total of 1,034,708 SNP markers were detected in the three tissues. Microsatellite mining identified 5088 SSRs in the unigene sequences. The most frequent repeat motifs were mononucleotide repeats, which accounted for 61.93% of the SSRs. Transcriptome comparison of the three tissues yielded a total of 8841 DEGs, of which 3560 were up-regulated and 5281 were down-regulated. This study represents the largest sequencing effort carried out for a jellyfish and provides the first high-throughput transcriptomic resource for jellyfish. Copyright © 2017 Elsevier B.V. All rights reserved.
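Microsatellite mining of unigenes amounts to finding perfect tandem repeats of 1 to 6 bp motifs above a minimum repeat number. This is a minimal regex-based sketch. The thresholds (10 repeats for mononucleotides, 6 for dinucleotides, 5 otherwise) are assumed values, not the study's settings. Compound and imperfect repeats are ignored.

```python
import re

# Assumed minimum repeat numbers per motif length (not the study's settings).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('AT' yes, 'ATAT' no)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect SSRs, sorted by position."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA', already reported as a mononucleotide
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


ssrs = find_ssrs("G" + "A" * 12 + "G" + "CA" * 7 + "GG")  # made-up unigene fragment
```

Grouping the hits by `len(motif)` gives a breakdown by repeat class, like the proportion of mononucleotide repeats reported above.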

  15. Gene network and pathway analysis of mice with conditional ablation of Dicer in post-mitotic neurons.

    PubMed

    Dorval, Véronique; Smith, Pascal Y; Delay, Charlotte; Calvo, Ezequiel; Planel, Emmanuel; Zommer, Nadège; Buée, Luc; Hébert, Sébastien S

    2012-01-01

    The small non-protein-coding microRNAs (miRNAs) have emerged as critical regulators of neuronal differentiation, identity and survival. To date, however, little is known about the genes and molecular networks regulated by neuronal miRNAs in vivo, particularly in the adult mammalian brain. We analyzed whole-genome microarrays from mice lacking Dicer, the enzyme responsible for miRNA production, specifically in postnatal forebrain neurons. A total of 755 mRNA transcripts were significantly (P<0.05, FDR<0.25) misregulated in the conditional Dicer knockout mice. Ten genes, including Tnrc6c, Dnmt3a, and Limk1, were validated by real-time quantitative RT-PCR. Upregulated transcripts were enriched in non-neuronal genes, which is consistent with previous in vitro studies. Microarray data mining showed that upregulated genes were enriched in biological processes related to gene expression regulation, while downregulated genes were associated with neuronal functions. Molecular pathways associated with neurological disorders, cellular organization and cellular maintenance were altered in the Dicer mutant mice. Numerous miRNA target sites were enriched in the 3' untranslated regions (3' UTRs) of upregulated genes, the most significant corresponding to the miR-124 seed sequence. Interestingly, our results suggest that, in addition to miR-124, a large fraction of the neuronal miRNome participates, in order of abundance, in coordinated gene expression regulation and neuronal maintenance. Taken together, these results provide new clues to the role of specific miRNA pathways in the regulation of brain identity and maintenance in adult mice.
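The FDR < 0.25 cut-off requires adjusting the raw P values for the many transcripts tested. The abstract does not state the procedure used. The Benjamini-Hochberg step-up method is a common choice and is sketched below with made-up P values.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping q-values monotonic and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.20]  # made-up P values for four transcripts
q = benjamini_hochberg(raw_p)
# Transcripts passing both thresholds used in the abstract (P < 0.05 and FDR < 0.25).
significant = [i for i, (p, qv) in enumerate(zip(raw_p, q)) if p < 0.05 and qv < 0.25]
```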

  16. MicroRNA-218 functions as a tumor suppressor in lung cancer by targeting IL-6/STAT3 and negatively correlates with poor prognosis.

    PubMed

    Yang, Yan; Ding, Lili; Hu, Qun; Xia, Jia; Sun, Junjie; Wang, Xudong; Xiong, Hua; Gurbani, Deepak; Li, Lianbo; Liu, Yan; Liu, Aiguo

    2017-08-22

    Aberrant expression of microRNAs has been widely reported in different human cancer types. MiR-218 acts as a tumor suppressor in diverse human cancer types, regulating multiple genes in oncogenic pathways. Here, we evaluated the expression and function of miR-218 in human lung cancer and in ALDH-positive lung cancer cells to understand the potential mechanisms responsible for disease pathology. The associations of its host genes and target genes with outcome could also improve the understanding of prognosis in clinical settings. Publicly available data from The Cancer Genome Atlas (TCGA) were mined to compare the levels of miR-218 and its host genes SLIT2/3 between lung cancer tissues and normal lung tissues. MiR-218 was transfected into lung cancer cells to investigate its function, and in vivo effects were determined using miR-218-expressing lentiviruses. The Aldefluor assay and flow cytometry were used to quantify and enrich ALDH-positive lung cancer cells. Levels of miR-218, IL-6R, JAK3 and phosphorylated STAT3 were compared in ALDH1A1-positive and ALDH1A1-negative cells. MiR-218 was overexpressed in ALDH-positive cells, and their survival was tested by tumorsphere culture. Finally, using TCGA data, we studied the association of miR-218 target genes with lung cancer prognosis. We observed that miR-218 expression was significantly down-regulated in lung cancer tissues compared with normal lung tissues. Overexpression of miR-218 decreased cell proliferation, invasion, colony formation, and tumorsphere formation in vitro and repressed tumor growth in vivo. We further found that miR-218 negatively regulated IL-6 receptor and JAK3 gene expression by directly targeting the 3'-UTRs of their mRNAs. In addition, the levels of both miR-218 host genes and components of the IL-6/STAT3 pathway correlated with the prognosis of lung cancer patients. 
MiR-218 acts as a tumor suppressor in lung cancer via IL-6/STAT3 signaling pathway regulation.
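Candidate direct targets are commonly found by scanning a 3'-UTR for sequences complementary to the miRNA seed (nucleotides 2-8). Below is a minimal sketch of a 7mer-m8 site scan. It ignores other site types, site context and conservation, and it is not the authors' method; they validated targeting experimentally. The miRNA and UTR sequences in the example are made up. Real mature miRNA sequences should be taken from a curated database such as miRBase.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def seed_site(mirna: str) -> str:
    """Return the DNA 7mer-m8 target site: reverse complement of miRNA nucleotides 2-8."""
    seed = mirna.upper()[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of 7mer-m8 seed matches in a 3'-UTR sequence."""
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA input
    k = len(site)
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == site]


# Hypothetical miRNA and UTR sequences for illustration only.
toy_mirna = "UAGCUUAGGAAAAAAAAAAA"
toy_utr = "GGG" + "CTAAGCT" + "AAA" + "CTAAGCT"
sites = find_seed_sites(toy_mirna, toy_utr)
```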

  17. Warehousing re-annotated cancer genes for biomarker meta-analysis.

    PubMed

    Orsini, M; Travaglione, A; Capobianco, E

    2013-07-01

    Translational research in cancer genomics assigns a fundamental role to bioinformatics in supporting candidate gene prioritization, for both biomarker discovery and target identification for drug development. Efforts in both directions rely on the existence and constant updating of large repositories of gene expression data and omics records obtained from a variety of experiments. Users who interactively query such repositories may have problems retrieving sample fields with limited associated information, due for instance to incomplete entries or sometimes unusable files. Cancer-specific data sources present similar problems. Given that source integration usually improves data quality, one objective is to keep computational complexity low enough to allow optimal assimilation and mining of all the information. In particular, integrating intra-omics data can improve the exploration of gene co-expression landscapes, while integrating inter-omics sources can help establish genotype-phenotype associations. Both integrations are relevant to cancer biomarker meta-analysis, as the proposed study demonstrates. Our approach is based on re-annotating cancer-specific data available at the EBI's ArrayExpress repository and building a data warehouse aimed at biomarker discovery and validation studies. Cancer genes are organized by tissue, with biomedical and clinical evidence combined to increase the reproducibility and consistency of results. For better comparative evaluation, multiple queries have been designed to efficiently address all types of experiments and platforms and to allow retrieval of sample-related information, such as cell line, disease state and clinical aspects. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  18. From 20th century metabolic wall charts to 21st century systems biology: database of mammalian metabolic enzymes.

    PubMed

    Corcoran, Callan C; Grady, Cameron R; Pisitkun, Trairak; Parulekar, Jaya; Knepper, Mark A

    2017-03-01

    The organization of the mammalian genome into gene subsets corresponding to specific functional classes has provided key tools for systems biology research. Here, we have created a web-accessible resource called the Mammalian Metabolic Enzyme Database (https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/MetabolicEnzymeDatabase.html), keyed to the biochemical reactions represented on iconic metabolic pathway wall charts created in the previous century. Overall, we have mapped 1,647 genes to these pathways, representing ~7 percent of the protein-coding genome. To illustrate the use of the database, we apply it to the area of kidney physiology. In so doing, we have created an additional database (Database of Metabolic Enzymes in Kidney Tubule Segments: https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/), mapping mRNA abundance measurements (mined from RNA-Seq studies) for all metabolic enzymes to each of 14 renal tubule segments. We carry out bioinformatics analysis of the enzyme expression pattern among renal tubule segments and mine various data sources to identify vasopressin-regulated metabolic enzymes in the renal collecting duct. Copyright © 2017 the American Physiological Society.

  19. Bioinformatic detection of E47, E2F1 and SREBP1 transcription factors as potential regulators of genes associated to acquisition of endometrial receptivity

    PubMed Central

    2011-01-01

    Background: The endometrium is a dynamic tissue whose changes are driven by the ovarian steroid hormones. Its main function is to provide an adequate substrate for embryo implantation. Using microarray technology, several reports have described the gene expression patterns of human endometrial tissue during the window of implantation. However, biological connections must be made across these genomic datasets to take full advantage of them. The objective of this work was to perform a research synthesis of available gene expression profiles related to the acquisition of endometrial receptivity for embryo implantation, in order to gain insights into its molecular basis and regulation. Methods: Gene expression datasets were intersected to determine a consensus endometrial receptivity transcript list (CERTL). For this set of genes, we determined functional annotations using available web-based databases. In addition, promoter sequences were analyzed with bioinformatics tools to identify putative transcription factor binding sites and over-represented features. Results: We found 40 up- and 21 down-regulated transcripts in the CERTL. The most consistently increased were C4BPA, SPP1, APOD, CD55, CFD, CLDN4, DKK1, ID4, IL15 and MAP3K5, whereas the most consistently decreased were OLFM1, CCNB1, CRABP2, EDN3, FGFR1, MSX1 and MSX2. Functional annotation showed that the CERTL was enriched in transcripts related to the immune response, complement activation and cell cycle regulation. Promoter sequence analysis revealed that DNA binding sites for the E47, E2F1 and SREBP1 transcription factors were the most consistently over-represented, in both up- and down-regulated genes, during the window of implantation. Conclusions: Our research synthesis allowed us to organize and mine high-throughput data to explore endometrial receptivity and to focus future research efforts on specific genes and pathways. The discovery of possible new transcription factors orchestrating the CERTL opens new avenues for understanding gene expression regulation in uterine function. PMID:21272326
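    The abstract does not state the exact intersection rule used to build the CERTL. One simple possibility, assumed here purely for illustration, is a vote count: keep a gene if at least a chosen number of datasets report it as regulated in the same direction. The gene symbols and the input format are hypothetical.

```python
from collections import Counter

def consensus_list(datasets, min_support):
    """Return (up, down) genes reported in the same direction by >= min_support datasets.

    datasets: list of dicts mapping gene symbol -> "up" or "down"
    (a hypothetical input format chosen for illustration).
    """
    votes = Counter()
    for ds in datasets:
        for gene, direction in ds.items():
            votes[(gene, direction)] += 1  # one vote per dataset and direction
    up = sorted(g for (g, d), n in votes.items() if d == "up" and n >= min_support)
    down = sorted(g for (g, d), n in votes.items() if d == "down" and n >= min_support)
    return up, down

studies = [
    {"GENE_A": "up", "GENE_B": "up", "GENE_C": "down"},
    {"GENE_A": "up", "GENE_C": "down", "GENE_D": "up"},
    {"GENE_A": "up", "GENE_B": "down", "GENE_C": "down"},
]
up, down = consensus_list(studies, min_support=2)
print(up, down)  # ['GENE_A'] ['GENE_C']
```

    Raising min_support trades list length for consistency across studies; GENE_B is dropped here because its direction disagrees between datasets.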

  20. MAGIC database and interfaces: an integrated package for gene discovery and expression.

    PubMed

    Cordonnier-Pratt, Marie-Michèle; Liang, Chun; Wang, Haiming; Kolychev, Dmitri S; Sun, Feng; Freeman, Robert; Sullivan, Robert; Pratt, Lee H

    2004-01-01

    The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.

  1. Deciphering psoriasis. A bioinformatic approach.

    PubMed

    Melero, Juan L; Andrades, Sergi; Arola, Lluís; Romeu, Antoni

    2018-02-01

    Psoriasis is an immune-mediated, inflammatory and hyperproliferative disease of the skin and joints. The cause of psoriasis is still unknown. The fundamental feature of the disease is the hyperproliferation of keratinocytes and the recruitment of immune cells to the affected skin, which leads to the deregulated expression of many well-known genes. Based on data mining and bioinformatic scripting, here we show a new dimension of the effect of psoriasis at the genomic level. Using our own pipeline of scripts in Perl and MySQL, and based on the freely available NCBI Gene Expression Omnibus (GEO) database: DataSet Record GDS4602 (Series GSE13355), we explore the extent of the effect of psoriasis on gene expression in the affected tissue. We give greater insight into the effects of psoriasis on the up-regulation of some genes in the cell cycle (CCNB1, CCNA2, CCNE2, CDK1) or the dynamin system (GBPs, MXs, MFN1), as well as the down-regulation of typical antioxidant genes (catalase, CAT; superoxide dismutases, SOD1-3; and glutathione reductase, GSR). We also provide a complete list of the human genes and how they respond in a state of psoriasis. Our results show that psoriasis affects all chromosomes and many biological functions. If we further consider the stable and mitotically inheritable character of the psoriasis phenotype, and the influence of environmental factors, then it seems that psoriasis has an epigenetic origin. This fits well with the strong hereditary character of the disease as well as its complex genetic background. Copyright © 2017 Japanese Society for Investigative Dermatology. Published by Elsevier B.V. All rights reserved.

  2. Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine

    PubMed Central

    Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud

    2018-01-01

    Background: Gene expression data are characteristically high-dimensional, with a small number of samples relative to the number of features, and the variability inherent in biological processes adds to the difficulty of analysis. Selecting highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability in predicting the class of new samples. Methods: The present study used a hybrid of particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase, reducing the system's sensitivity to outliers and improving the classifier's ability to generalize. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the algorithm by finding the best parameters for the classifier during the training phase without the need for trial and error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results were obtained with the proposed algorithm. The classification accuracy was 100% for leukemia data, 96.67% for colon cancer and 98% for breast cancer. The results show that the best kernel for training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can reduce the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919
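    The abstract does not give the fuzzy membership function. A common choice in the fuzzy SVM literature, assumed here and not necessarily the authors' method, weights each training sample by its distance to its class centre, so that outliers receive small weights. The sketch computes only these weights; the gene-selection step (particle swarm plus genetic algorithm) is not shown, and the data are toy values.

```python
import math

def fuzzy_memberships(X, y, delta=1e-3):
    """Per-sample weights in (0, 1] from the distance to the sample's class centre.

    X: list of feature vectors (lists of floats); y: list of class labels.
    The sample farthest from its class centre gets a weight close to 0, so an
    outlier contributes little when the weights scale the SVM penalty C.
    """
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    weights = [0.0] * len(X)
    for idx in by_class.values():
        dim = len(X[idx[0]])
        centre = [sum(X[i][j] for i in idx) / len(idx) for j in range(dim)]
        dist = {i: math.dist(X[i], centre) for i in idx}
        radius = max(dist.values())
        for i in idx:
            # delta > 0 keeps every weight strictly positive
            weights[i] = 1.0 - dist[i] / (radius + delta)
    return weights

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [20.0, 20.0], [21.0, 21.0]]
y = ["a", "a", "a", "a", "b", "b"]
w = fuzzy_memberships(X, y)
print([round(v, 3) for v in w])
```

    In scikit-learn, for example, such weights can be passed as the sample_weight argument of SVC(kernel="rbf").fit(X, y, sample_weight=w), which rescales the penalty C for each sample.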

  3. Toolbox for Antibiotics Discovery from Microorganisms.

    PubMed

    Fisch, Katja M; Schäberle, Till F

    2016-09-01

    Microorganisms produce a vast array of biologically active metabolites. Humans use such compounds to improve their health, and natural products therefore serve as drug leads for pharmaceutical and medicinal chemistry. In this minireview, tools for the discovery and production of potential drug leads are explained. A snapshot is provided, from the isolation of new producer strains, through genome mining of (meta)genomes to identify biosynthetic gene clusters corresponding to natural products, to heterologous expression to produce potential drug leads. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.

    PubMed

    Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C

    2018-08-01

    Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.

  5. The old 3-oxoadipate pathway revisited: new insights in the catabolism of aromatics in the saprophytic fungus Aspergillus nidulans.

    PubMed

    Martins, Tiago M; Hartmann, Diego O; Planchon, Sébastien; Martins, Isabel; Renaut, Jenny; Silva Pereira, Cristina

    2015-01-01

    Aspergilli play major roles in the natural turnover of elements, especially through the decomposition of plant litter, but the end catabolism of lignin aromatic hydrocarbons remains largely unresolved. The 3-oxoadipate pathway for their degradation combines the catechol and the protocatechuate branches, each using a set of specific genes. However, annotation for most of these genes is lacking or attributed to poorly characterised or uncharacterised families. Aspergillus nidulans can use either benzoate or salicylate (upstream aromatic metabolites of the protocatechuate and catechol branches, respectively) as its sole carbon and energy source. Using this cultivation strategy and combined analyses of comparative proteomics, gene mining, gene expression and characterisation of particular gene-replacement mutants, we precisely assigned most of the steps of the 3-oxoadipate pathway to specific genes in this fungus. Our findings disclose the genetically encoded potential of saprophytic Ascomycota fungi to utilise this pathway and provide means to unravel the associated regulatory networks, which are vital to heightening their ecological significance. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Elucidating mechanisms of toxic action of dissolved organic chemicals in oil sands process-affected water (OSPW).

    PubMed

    Morandi, Garrett D; Wiseman, Steve B; Guan, Miao; Zhang, Xiaowei W; Martin, Jonathan W; Giesy, John P

    2017-11-01

    Oil sands process-affected water (OSPW) is generated during extraction of bitumen in the surface-mining oil sands industry in Alberta, Canada, and is acutely and chronically toxic to aquatic organisms. Dissolved organic compounds in OSPW are known to be responsible for most toxic effects, but knowledge of the specific mechanism(s) of toxicity is limited. Using bioassay-based effects-directed analysis, the dissolved organic fraction of OSPW has previously been fractionated, ultimately producing refined samples of dissolved organic chemicals in OSPW, each with a distinct chemical profile. Using the Escherichia coli K-12 strain MG1655 gene reporter live cell array, the present study investigated relationships between the toxic potency of each fraction, gene expression and the chemical characterization of each of five acutely toxic extracts and one non-toxic extract of OSPW derived by effects-directed analysis. Effects on the expression of genes related to oxidative stress, protein stress and DNA damage responses were indicative of exposure to acutely toxic extracts of OSPW. Additionally, six genes were uniquely responsive to acutely toxic extracts of OSPW. The evidence presented supports a role for sulphur- and nitrogen-containing chemical classes in the toxicity of extracts of OSPW. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Diverse activities of viral cis-acting RNA regulatory elements revealed using multicolor, long-term, single-cell imaging

    PubMed Central

    Pocock, Ginger M.; Zimdars, Laraine L.; Yuan, Ming; Eliceiri, Kevin W.; Ahlquist, Paul; Sherer, Nathan M.

    2017-01-01

    Cis-acting RNA structural elements govern crucial aspects of viral gene expression. How these structures and other posttranscriptional signals affect RNA trafficking and translation in the context of single cells is poorly understood. Herein we describe a multicolor, long-term (>24 h) imaging strategy for measuring integrated aspects of viral RNA regulatory control in individual cells. We apply this strategy to demonstrate differential mRNA trafficking behaviors governed by RNA elements derived from three retroviruses (HIV-1, murine leukemia virus, and Mason-Pfizer monkey virus), two hepadnaviruses (hepatitis B virus and woodchuck hepatitis virus), and an intron-retaining transcript encoded by the cellular NXF1 gene. Striking behaviors include “burst” RNA nuclear export dynamics regulated by HIV-1’s Rev response element and the viral Rev protein; transient aggregations of RNAs into discrete foci at or near the nuclear membrane triggered by multiple elements; and a novel, pulsiform RNA export activity regulated by the hepadnaviral posttranscriptional regulatory element. We incorporate single-cell tracking and a data-mining algorithm into our approach to obtain RNA element–specific, high-resolution gene expression signatures. Together these imaging assays constitute a tractable, systems-based platform for studying otherwise difficult to access spatiotemporal features of viral and cellular gene regulation. PMID:27903772

  8. Multi-objective evolutionary optimization for constructing neural networks for virtual reality visual data mining: application to geophysical prospecting.

    PubMed

    Valdés, Julio J; Barton, Alan J

    2007-05-01

    A method for the construction of virtual reality spaces for visual data mining, using multi-objective optimization with genetic algorithms on nonlinear discriminant (NDA) neural networks, is presented. Two neural network layers (the output and the last hidden layer) are used to construct simultaneous solutions for: (i) a supervised classification of data patterns and (ii) an unsupervised preservation of the similarity structure between the original data matrix and its image in the new space. A set of spaces is constructed from selected solutions along the Pareto front. This strategy represents a conceptual improvement over spaces computed by single-objective optimization. In addition, genetic programming (in particular gene expression programming) is used to find analytic representations of the complex mappings generating the spaces (a composition of NDA and orthogonal principal components). The presented approach is domain independent and is illustrated via application to the geophysical prospecting of caves.
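    Selecting solutions along a Pareto front starts from identifying the non-dominated solutions. A minimal sketch, assuming both objectives (for example, classification error and loss of similarity structure) are to be minimized; the objective values are made up for illustration and this is not the authors' genetic algorithm.

```python
def pareto_front(solutions):
    """Return the non-dominated solutions, assuming every objective is minimized.

    Each solution is a tuple of objective values; input order is preserved.
    """
    def dominates(a, b):
        # a dominates b if it is no worse in every objective and strictly better in one
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

candidates = [(0.10, 0.40), (0.20, 0.20), (0.15, 0.50), (0.30, 0.10), (0.25, 0.25)]
print(pareto_front(candidates))  # [(0.1, 0.4), (0.2, 0.2), (0.3, 0.1)]
```

    This pairwise check is quadratic in the number of solutions, which is adequate for the modest population sizes typical of a genetic algorithm generation.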

  9. Identification and expression analysis of WRKY transcription factor genes in canola (Brassica napus L.) in response to fungal pathogens and hormone treatments.

    PubMed

    Yang, Bo; Jiang, Yuanqing; Rahman, Muhammad H; Deyholos, Michael K; Kav, Nat N V

    2009-06-03

    Members of plant WRKY transcription factor families are widely implicated in defense responses and various other physiological processes. For canola (Brassica napus L.), no WRKY genes have been described in detail. Because of the economic importance of this crop and its evolutionary relationship to Arabidopsis thaliana, we sought to characterize a subset of canola WRKY genes in the context of pathogen and hormone responses. In this study, we identified 46 WRKY genes from canola by mining the expressed sequence tag (EST) database and cloned cDNA sequences of 38 BnWRKYs. A phylogenetic tree constructed from the conserved WRKY domain amino acid sequences showed that BnWRKYs can be divided into three major groups. We further compared BnWRKYs to the 72 WRKY genes from Arabidopsis and 91 WRKY genes from rice, and we identified 46 presumptive orthologs of AtWRKY genes. We examined the subcellular localization of four BnWRKY proteins using green fluorescent protein (GFP) and observed fluorescent green signals in the nucleus only. The responses of 16 selected BnWRKY genes to two fungal pathogens, Sclerotinia sclerotiorum and Alternaria brassicae, were analyzed by quantitative real-time PCR (qRT-PCR). Transcript abundance of 13 BnWRKY genes changed significantly following pathogen challenge: transcripts of 10 WRKYs increased in abundance, two WRKY transcripts decreased after infection, and one decreased at 12 h post-infection but increased later (72 h). We also observed that transcript abundance of 13/16 BnWRKY genes was responsive to one or more hormones, including abscisic acid (ABA) and cytokinin (6-benzylaminopurine, BAP), and to the defense signaling molecules jasmonic acid (JA), salicylic acid (SA) and ethylene (ET). We compared these transcript expression patterns to those previously described for presumptive orthologs of these genes in Arabidopsis and rice, and observed both similarities and differences in expression patterns. Of the 16 BnWRKY genes assayed, we identified a set of 13 that are responsive to both fungal pathogens and hormone treatments, suggesting shared signaling mechanisms for these responses. This study suggests that a large number of BnWRKY proteins are involved in the transcriptional regulation of defense-related genes in response to fungal pathogens and hormone stimuli.

  10. Transposable elements in TDP-43-mediated neurodegenerative disorders.

    PubMed

    Li, Wanhe; Jin, Ying; Prazak, Lisa; Hammell, Molly; Dubnau, Josh

    2012-01-01

    Elevated expression of specific transposable elements (TEs) has been observed in several neurodegenerative disorders. TEs can also be active during normal neurogenesis. First, by mining a series of deep sequencing datasets of protein-RNA interactions and of gene expression profiles, we uncovered extensive binding of TE transcripts to TDP-43, an RNA-binding protein central to amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD). Second, we found that the association between TDP-43 and many of its TE targets is reduced in FTLD patients. Third, we discovered that a large fraction of the TEs to which TDP-43 binds become de-repressed in mouse TDP-43 disease models. We propose the hypothesis that TE mis-regulation contributes to TDP-43-related neurodegenerative diseases.

  11. Developmental transcriptome analysis and identification of genes involved in formation of intestinal air-breathing function of Dojo loach, Misgurnus anguillicaudatus

    PubMed Central

    Luo, Weiwei; Cao, Xiaojuan; Xu, Xiuwen; Huang, Songqian; Liu, Chuanshu; Tomljanovic, Tea

    2016-01-01

    Dojo loach (Misgurnus anguillicaudatus) is a freshwater fish species of the loach family Cobitidae that uses its posterior intestine as an accessory air-breathing organ. Little is known about the molecular regulatory mechanisms underlying the formation of intestinal air-breathing function in M. anguillicaudatus. Here, high-throughput sequencing of mRNAs was performed on the posterior intestine of M. anguillicaudatus at six developmental stages: 4 Dph (days post hatch), 8 Dph, 12 Dph, 20 Dph, 40 Dph and Oyd (one year old). These six libraries were assembled into 81,300 unigenes, of which 40,757 were annotated. Subsequently, 35,291 differentially expressed genes (DEGs) were identified among the developmental stages and clustered into 20 gene expression profiles. Finally, 15 key pathways and 25 key genes were mined, providing potential targets for candidate gene selection involved in the formation of intestinal air-breathing function in M. anguillicaudatus. This is the first report of a developmental transcriptome of the posterior intestine in M. anguillicaudatus; it makes a substantial contribution to the sequence resources for this species and provides deep insight into the formation mechanism of its intestinal air-breathing function. This report demonstrates that M. anguillicaudatus is a good model for studies to identify and characterize the molecular basis of accessory air-breathing organ development in fish. PMID:27545457

  12. Genetic networks and soft computing.

    PubMed

    Mitra, Sushmita; Das, Ranajit; Hayashi, Yoichi

    2011-01-01

    The analysis of gene regulatory networks provides extensive information on various fundamental cellular processes, including growth, development, hormone secretion and cellular communication. Extracting these networks from available gene expression profiles is a challenging problem. Such reverse engineering of genetic networks offers insight into cellular activity, toward the prediction of adverse effects of new drugs or the identification of new drug targets. Tasks such as classification, clustering and feature selection enable efficient mining of knowledge about gene interactions in the form of networks. Biological data are known to be prone to different kinds of noise and ambiguity. Soft computing tools, such as fuzzy sets, evolutionary strategies and neurocomputing, have been found helpful in providing low-cost, acceptable solutions in the presence of various types of uncertainty. In this paper, we survey the role of these soft methodologies, and their hybridizations, in generating genetic networks.

  13. Genomic analyses of primitive, wild and cultivated citrus provide insights into asexual reproduction.

    PubMed

    Wang, Xia; Xu, Yuantao; Zhang, Siqi; Cao, Li; Huang, Yue; Cheng, Junfeng; Wu, Guizhi; Tian, Shilin; Chen, Chunli; Liu, Yan; Yu, Huiwen; Yang, Xiaoming; Lan, Hong; Wang, Nan; Wang, Lun; Xu, Jidi; Jiang, Xiaolin; Xie, Zongzhou; Tan, Meilian; Larkin, Robert M; Chen, Ling-Ling; Ma, Bin-Guang; Ruan, Yijun; Deng, Xiuxin; Xu, Qiang

    2017-05-01

    The emergence of apomixis, the transition from sexual to asexual reproduction, is a prominent feature of modern citrus. Here we de novo sequenced and comprehensively studied the genomes of four representative citrus species. Additionally, we sequenced 100 accessions of primitive, wild and cultivated citrus. Comparative population analysis suggested that genomic regions harboring energy- and reproduction-associated genes are probably under selection in cultivated citrus. We also narrowed the genetic locus responsible for citrus polyembryony, a form of apomixis, to an 80-kb region containing 11 candidate genes. One of these, CitRWP, is expressed at higher levels in ovules of polyembryonic cultivars. We found a miniature inverted-repeat transposable element insertion in the promoter region of CitRWP that cosegregated with polyembryony. This study provides new insights into citrus apomixis and constitutes a promising resource for the mining of agriculturally important genes.

  14. Informed walks: whispering hints to gene hunters inside networks' jungle.

    PubMed

    Bourdakou, Marilena M; Spyrou, George M

    2017-10-11

    Systemic approaches offer a different point of view on the analysis of several types of molecular associations, as well as on the identification of specific gene communities in several cancer types. However, owing to a lack of sufficient data to construct networks based on experimental evidence, statistical gene co-expression networks are widely used instead. Many efforts have been made to exploit the information hidden in these networks. However, these approaches still need to capitalize more comprehensively on the prior knowledge encoded in molecular pathway associations and to improve their efficiency in discovering both exclusive subnetworks, as candidate biomarkers, and conserved subnetworks that may uncover common origins of several cancer types. In this study, we present the development of the Informed Walks model, based on random walks that incorporate information from molecular pathways to mine candidate genes and gene-gene links. The proposed model has been applied to TCGA (The Cancer Genome Atlas) datasets from seven different cancer types, exploring the reconstructed co-expression networks of the whole set of genes and leading to highlighted subnetworks for each cancer type. Subsequently, we elucidated the impact of each subnetwork on the indication of underlying exclusive and common molecular mechanisms, as well as on the short-listing of drugs that have the potential to suppress the corresponding cancer type through a drug-repurposing pipeline. We have developed a method of gene subnetwork highlighting based on prior knowledge, capable of giving fruitful insights into the underlying molecular mechanisms and valuable input to drug-repurposing pipelines for a variety of cancer types.
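    The abstract does not detail how pathway knowledge enters the walk. As a generic point of reference only, the sketch below implements a standard random walk with restart on a weighted co-expression graph, where prior knowledge enters solely through the restart (seed) set; the gene names, seed choice and restart probability are hypothetical, and this is not the published Informed Walks model.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seeds`.

    adj: dict node -> dict neighbour -> non-negative edge weight (for example
         absolute co-expression correlations); every neighbour must also be a key.
    seeds: set of nodes where the walker restarts (here: hypothetical pathway genes).
    """
    nodes = sorted(adj)
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(restart_vec)
    for _ in range(max_iter):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            total = sum(adj[n].values())
            if total == 0:
                # dangling node: send its probability mass back to the seeds
                for m in nodes:
                    nxt[m] += (1 - restart) * p[n] * restart_vec[m]
                continue
            for m, w in adj[n].items():
                # move along edge n -> m in proportion to its weight
                nxt[m] += (1 - restart) * p[n] * w / total
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p

# Toy co-expression network: hypothetical genes A-D on a chain.
adj = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(adj, seeds={"A"})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

    Genes with high scores are those most reachable from the seed set; a subnetwork could then be highlighted by keeping the top-scoring genes and the edges between them.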

  15. Unsupervised text mining for assessing and augmenting GWAS results.

    PubMed

    Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence

    2016-04-01

    Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has helped identify many genes associated with multifactorial diseases. These studies identify groups of genes associated with the same phenotype but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques, using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework, which we used to characterize the relationships between 10 genes reported as associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value < 0.01). The clustering of observed and randomly selected genes also allowed us to generate hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
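    The comparison against random gene vectors can be sketched as an empirical (permutation-style) test. The sketch assumes each gene is represented by a sparse term-weight vector built from the literature (for example TF-IDF weights, an assumption), scores a gene set by its mean pairwise cosine similarity, and estimates a one-sided p-value from random gene sets of the same size. The clustering step is not shown, and the toy vectors are hypothetical.

```python
import math
import random
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_pairwise_similarity(genes, vectors):
    """Average cosine similarity over all pairs of genes (needs at least two genes)."""
    pairs = list(combinations(genes, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

def one_sided_p(candidates, vectors, n_random=1000, seed=0):
    """Empirical one-sided p-value: the fraction of random gene sets of the same
    size whose mean pairwise similarity is at least that of the candidates."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(candidates, vectors)
    pool = sorted(vectors)
    hits = sum(
        mean_pairwise_similarity(rng.sample(pool, len(candidates)), vectors) >= observed
        for _ in range(n_random)
    )
    # the +1 terms count the observed set itself, so p is never exactly 0
    return (hits + 1) / (n_random + 1)

# Hypothetical toy corpus: three candidate genes share literature terms,
# twenty background genes each have a unique term.
vectors = {f"CAND{i}": {"asthma": 1.0, "airway": 1.0} for i in range(3)}
vectors.update({f"BG{i}": {f"term{i}": 1.0} for i in range(20)})
p = one_sided_p(["CAND0", "CAND1", "CAND2"], vectors)
print(p)
```

    Fixing the random seed makes the p-value reproducible; with n_random draws the smallest attainable p-value is 1 / (n_random + 1).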

  16. Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends.

    PubMed

    Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda

    2016-04-26

    Breast cancer is a serious disease that affects many women and may lead to death. It has received considerable attention from the research community, and biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature; however, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework that investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We used PubMed for testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country and abstract-country, to find out how the discoveries varied over time and how much the discoveries and interests of research groups in different countries overlap or diverge. Interesting trends have been identified and discussed; e.g., different genes are highlighted in relation to different countries, although the various genes were found to share functionality. Some text-analysis-based results have been validated against results from other tools that predict gene-gene relations and gene functions.

  17. Activity-dependent neuroprotective protein (ADNP): a case study for highly conserved chordata-specific genes shaping the brain and mutated in cancer.

    PubMed

    Gozes, Illana; Yeheskel, Adva; Pasmanik-Chor, Metsada

    2015-01-01

    The recent finding of activity-dependent neuroprotective protein (ADNP) as a protein decreased in the serum of patients with Alzheimer's disease (AD) compared with controls, alongside the discovery of ADNP mutations in autism and the original description of cancer mutations, ignited interest in a comparative analysis of ADNP with other AD/autism/cancer-associated genes. We strive toward a better understanding of the molecular structure of key players in psychiatric/neurodegenerative diseases, including autism, schizophrenia and AD. This article includes data mining and bioinformatics analysis of the ADNP gene and protein, in addition to other related genes, with emphasis on recent literature. ADNP is shown here to be unique to chordata, with specific autism mutations that differ from cancer-associated mutations. Furthermore, ADNP exhibits similarities to other cancer/autism-associated genes. We suggest that key genes, which shape and maintain our brain and are prone to mutations, are by and large unique to chordata. Furthermore, these brain-controlling genes, like ADNP, are linked to cell growth and differentiation, and under different stress conditions may mutate or exhibit expression changes leading to cancer propagation. Better understanding of these genes could lead to better therapeutics.

  18. The genomic response of skeletal muscle to methylprednisolone using microarrays: tailoring data mining to the structure of the pharmacogenomic time series

    PubMed Central

    DuBois, Debra C; Piel, William H; Jusko, William J

    2008-01-01

    High-throughput data collection using gene microarrays has great potential as a method for addressing the pharmacogenomics of complex biological systems. Similarly, mechanism-based pharmacokinetic/pharmacodynamic modeling provides a tool for formulating quantitative testable hypotheses concerning the responses of complex biological systems. As the response of such systems to drugs generally entails cascades of molecular events in time, a time series design provides the best approach to capturing the full scope of drug effects. A major problem in using microarrays for high-throughput data collection is sorting through the massive amount of data in order to identify probe sets and genes of interest. Due to its inherent redundancy, a rich time series containing many time points and multiple samples per time point allows for the use of less stringent criteria for expression, expression change and data quality for initial filtering of unwanted probe sets. The remaining probe sets can then become the focus of more intense scrutiny by other methods, including temporal clustering, functional clustering and pharmacokinetic/pharmacodynamic modeling, which provide additional ways of identifying the probes and genes of pharmacological interest. PMID:15212590

  19. Data-Driven Discovery of Extravasation Pathway in Circulating Tumor Cells

    PubMed Central

    Yadavalli, S.; Jayaram, S.; Manda, S. S.; Madugundu, A. K.; Nayakanti, D. S.; Tan, T. Z.; Bhat, R.; Rangarajan, A.; Chatterjee, A.; Gowda, H.; Thiery, J. P.; Kumar, P.

    2017-01-01

    Circulating tumor cells (CTCs) play a crucial role in cancer dissemination and provide a promising source of blood-based markers. Understanding the spectrum of transcriptional profiles of CTCs and their corresponding regulatory mechanisms will allow for a more robust analysis of CTC phenotypes. The current challenge in CTC research is the acquisition of useful clinical information from the multitude of high-throughput studies. To gain a deeper understanding of CTC heterogeneity and identify genes, pathways and processes that are consistently affected across tumors, we mined the literature for gene expression profiles in CTCs. Through in silico analysis and the integration of CTC-specific genes, we found highly significant biological mechanisms and regulatory processes acting in CTCs across various cancers, with a particular enrichment of the leukocyte extravasation pathway. This pathway appears to play a pivotal role in the migration of CTCs to distant metastatic sites. We find that CTCs from multiple cancers express both epithelial and mesenchymal markers in varying amounts, which is suggestive of dynamic and hybrid states along the epithelial-mesenchymal transition (EMT) spectrum. Targeting the specific molecular nodes to monitor disease and therapeutic control of CTCs in real time will likely improve the clinical management of cancer progression and metastases. PMID:28262832

  20. Direct Capture Technologies for Genomics-Guided Discovery of Natural Products.

    PubMed

    Chan, Andrew N; Santa Maria, Kevin C; Li, Bo

    2016-01-01

    Microbes are important producers of natural products, which have played key roles in understanding biology and treating disease. However, the full potential of microbes to produce natural products has yet to be realized; the overwhelming majority of natural product gene clusters encoded in microbial genomes remain "cryptic", and have not been expressed or characterized. In contrast to the fast-growing number of genomic sequences and bioinformatic tools, methods to connect these genes to natural product molecules are still limited, creating a bottleneck in genome-mining efforts to discover novel natural products. Here we review developing technologies that leverage the power of homologous recombination to directly capture natural product gene clusters and express them in model hosts for isolation and structural characterization. Although direct capture is still in its early stages of development, it has been successfully utilized in several different classes of natural products. These early successes will be reviewed, and the methods will be compared and contrasted with existing traditional technologies. Lastly, we will discuss the opportunities for the development of direct capture in other organisms, and possibilities to integrate direct capture with emerging genome-editing techniques to accelerate future study of natural products.

  1. An integrated and comparative approach towards identification, characterization and functional annotation of candidate genes for drought tolerance in sorghum (Sorghum bicolor (L.) Moench).

    PubMed

    Woldesemayat, Adugna Abdi; Van Heusden, Peter; Ndimba, Bongani K; Christoffels, Alan

    2017-12-22

    Drought is the most disastrous abiotic stress that severely affects agricultural productivity worldwide. Understanding the biological basis of drought-regulated traits requires identification and in-depth characterization of genetic determinants using model organisms and high-throughput technologies. However, studies on drought tolerance have generally been limited to the traditional candidate gene approach, which targets only a single gene in a pathway related to a trait. In this study, we used sorghum, one of the model crops that is well adapted to arid regions, to mine genes and define determinants of drought tolerance using drought expression libraries and RNA-seq data. We provide an integrated and comparative in silico approach to candidate gene identification, characterization and annotation, with an emphasis on genes playing a prominent role in conferring drought tolerance in sorghum. A total of 470 non-redundant, functionally annotated drought responsive genes (DRGs) were identified from experimental drought-response data by employing pairwise sequence similarity searches, pathway and InterPro domain analysis, expression profiling and orthology relations. Comparison of the genomic locations of these genes with sorghum quantitative trait loci (QTLs) showed that 40% of these genes were co-localized with QTLs known for drought tolerance. The genome reannotation conducted using the Program to Assemble Spliced Alignments (PASA) resulted in 9.6% of existing single gene models being updated. In addition, 210 putative novel genes were identified using AUGUSTUS- and PASA-based analysis of the expression dataset. Among these, 50% were single-exon, 69.5% were drought responsive and 5.7% were complete gene structure models. Analysis of biochemical metabolism revealed 14 metabolic pathways that are related to drought tolerance and that also formed a strong biological network among the categories of genes involved.
Identification of these pathways signifies the interplay of biochemical reactions that make up the metabolic network, constituting a fundamental interface for the sorghum defence mechanism against drought stress. This study suggests untapped natural variability in sorghum that could be used for developing drought tolerance. The data presented here may be regarded as an initial reference point in functional and comparative genomics in the Gramineae family.

  2. Identification and evaluation of cyp1a transcript expression in fish as molecular biomarker for petroleum contamination in tropical fresh water ecosystems.

    PubMed

    dos Anjos, Nislanha Ana; Schulze, Tobias; Brack, Werner; Val, Adalberto Luis; Schirmer, Kristin; Scholz, Stefan

    2011-05-01

    In order to monitor potential contamination deriving from exploration and transport of oil in the Urucu region (Brazil), there is a need to establish suitable biomarkers for native Amazonian fish. Therefore, the transcript expression of various potentially sensitive genes (ahr2(1), cyp1a, hmox1, hsp70, maft, mt, nfe2l2, gstp1 and nqo1) in fish exposed to water soluble fractions of oil (WSF) was compared. The analysis was first performed in an established laboratory model, the zebrafish embryo. The cyp1a gene proved to be the most sensitive and robust marker for oil contamination and, hence, was selected to study the effect of oil-derived contaminants in the Amazonian cichlid Astronotus ocellatus. Induction of cyp1a transcript expression was observed for ≥0.0061% (v/v) WSFs. In liver samples of fish collected from different lakes in the Urucu oil mining area, no elevated expression of cyp1a transcripts was observed. The data demonstrate the high sensitivity of cyp1a as an indicator of oil exposure; further studies should be considered to test its usefulness at known contaminated sites and to evaluate influential factors by, e.g., mesocosm experiments. Copyright © 2011 Elsevier B.V. All rights reserved.

  3. TCF21 is related to testis growth and development in broiler chickens.

    PubMed

    Zhang, Hui; Na, Wei; Zhang, Hong-Li; Wang, Ning; Du, Zhi-Qiang; Wang, Shou-Zhi; Wang, Zhi-Peng; Zhang, Zhiwu; Li, Hui

    2017-02-24

    Large amounts of fat deposition often lead to loss of reproductive efficiency in humans and animals. We used broiler chickens as a model species to conduct a two-directional selection for and against abdominal fat over 19 generations, which resulted in a lean and a fat line. Direct selection for abdominal fat content also indirectly resulted in significant differences (P < 0.05) in testis weight (TeW) and in TeW as a percentage of total body weight (TeP) between the lean and fat lines. A total of 475 individuals from generation 11 (G11) were genotyped. Genome-wide association studies revealed two regions on chicken chromosomes 3 and 10 that were associated with TeW and TeP. Forty G16 individuals (20 from each line) were further profiled, focusing on these two chromosomal regions, to identify candidate genes with functions potentially related to testis growth and development. Of the nine candidate genes identified by database mining, a significant association was confirmed for one gene, TCF21, based on mRNA expression analysis. Expression analysis of the TCF21 gene was repeated in 30 G19 individuals (15 from each line), and the results confirmed the findings in the G16 animals. This study revealed that the TCF21 gene is related to testis growth and development in male broilers. This finding will be useful to guide future studies to understand the genetic mechanisms that underlie reproductive efficiency.

  4. Coordinated regulation of IFITM1, 2 and 3 genes by an IFN-responsive enhancer through long-range chromatin interactions.

    PubMed

    Li, Ping; Shi, Ming-Lei; Shen, Wen-Long; Zhang, Zhang; Xie, De-Jian; Zhang, Xiang-Yuan; He, Chao; Zhang, Yan; Zhao, Zhi-Hu

    2017-08-01

    Interferon-induced transmembrane protein (IFITM) 1, 2 and 3 genes encode a family of interferon (IFN)-induced transmembrane proteins that block entry of a broad spectrum of pathogens. However, the transcriptional regulation of these genes, in particular whether any enhancers exist and what roles they play during IFN induction, remains elusive. Here, through public data mining, episomal luciferase reporter assay and in vivo CRISPR-Cas9 genome editing, we identified an IFN-responsive enhancer located 35 kb upstream of the IFITM3 gene promoter that upregulates the IFN-induced expression of the IFITM1, 2 and 3 genes. Chromatin immunoprecipitation (ChIP), electrophoretic mobility shift assay (EMSA) and luciferase reporter assay demonstrated that signal transducer and activator of transcription (STAT) 1 bound to the enhancer upon IFN treatment and was indispensable for the enhancer activity. Furthermore, using chromosome conformation capture, we revealed that the IFITM1, 2 and 3 genes physically cluster together and constitutively loop to the distal enhancer through long-range interactions in both HEK293 and A549 cells, providing a structural basis for coordinated regulation of IFITM1, 2 and 3 by the enhancer. Finally, we showed that in vivo truncation of the enhancer impaired IFN-induced resistance to influenza A virus (IAV) infection. These findings expand our understanding of the mechanisms underlying the transcriptional regulation of IFITM1, 2 and 3 expression and its ability to mediate IFN signaling. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Discovery of Gene Cluster for Mycosporine-Like Amino Acid Biosynthesis from Actinomycetales Microorganisms and Production of a Novel Mycosporine-Like Amino Acid by Heterologous Expression

    PubMed Central

    Miyamoto, Kiyoko T.; Komatsu, Mamoru

    2014-01-01

    Mycosporines and mycosporine-like amino acids (MAAs), including shinorine (mycosporine-glycine-serine) and porphyra-334 (mycosporine-glycine-threonine), are UV-absorbing compounds produced by cyanobacteria, fungi, and marine micro- and macroalgae. These MAAs have the ability to protect these organisms from damage by environmental UV radiation. Although no reports have described the production of MAAs and the corresponding genes involved in MAA biosynthesis from Gram-positive bacteria to date, genome mining of the Gram-positive bacterial database revealed that two microorganisms belonging to the order Actinomycetales, Actinosynnema mirum DSM 43827 and Pseudonocardia sp. strain P1, possess a gene cluster homologous to the biosynthetic gene clusters identified from cyanobacteria. When the two strains were grown in liquid culture, Pseudonocardia sp. accumulated a very small amount of MAA-like compound in a medium-dependent manner, whereas A. mirum did not produce MAAs under any culture conditions, indicating that the biosynthetic gene cluster of A. mirum was in a cryptic state in this microorganism. In order to characterize these biosynthetic gene clusters, each biosynthetic gene cluster was heterologously expressed in an engineered host, Streptomyces avermitilis SUKA22. Since the resultant transformants carrying the entire biosynthetic gene cluster controlled by an alternative promoter produced mainly shinorine, this is the first confirmation of a biosynthetic gene cluster for MAA from Gram-positive bacteria. Furthermore, S. avermitilis SUKA22 transformants carrying the biosynthetic gene cluster for MAA of A. mirum accumulated not only shinorine and porphyra-334 but also a novel MAA. Structure elucidation revealed that the novel MAA is mycosporine-glycine-alanine, which substitutes L-alanine for the L-serine of shinorine. PMID:24907338

  6. Discovery of gene cluster for mycosporine-like amino acid biosynthesis from Actinomycetales microorganisms and production of a novel mycosporine-like amino acid by heterologous expression.

    PubMed

    Miyamoto, Kiyoko T; Komatsu, Mamoru; Ikeda, Haruo

    2014-08-01

    Mycosporines and mycosporine-like amino acids (MAAs), including shinorine (mycosporine-glycine-serine) and porphyra-334 (mycosporine-glycine-threonine), are UV-absorbing compounds produced by cyanobacteria, fungi, and marine micro- and macroalgae. These MAAs have the ability to protect these organisms from damage by environmental UV radiation. Although no reports have described the production of MAAs and the corresponding genes involved in MAA biosynthesis from Gram-positive bacteria to date, genome mining of the Gram-positive bacterial database revealed that two microorganisms belonging to the order Actinomycetales, Actinosynnema mirum DSM 43827 and Pseudonocardia sp. strain P1, possess a gene cluster homologous to the biosynthetic gene clusters identified from cyanobacteria. When the two strains were grown in liquid culture, Pseudonocardia sp. accumulated a very small amount of MAA-like compound in a medium-dependent manner, whereas A. mirum did not produce MAAs under any culture conditions, indicating that the biosynthetic gene cluster of A. mirum was in a cryptic state in this microorganism. In order to characterize these biosynthetic gene clusters, each biosynthetic gene cluster was heterologously expressed in an engineered host, Streptomyces avermitilis SUKA22. Since the resultant transformants carrying the entire biosynthetic gene cluster controlled by an alternative promoter produced mainly shinorine, this is the first confirmation of a biosynthetic gene cluster for MAA from Gram-positive bacteria. Furthermore, S. avermitilis SUKA22 transformants carrying the biosynthetic gene cluster for MAA of A. mirum accumulated not only shinorine and porphyra-334 but also a novel MAA. Structure elucidation revealed that the novel MAA is mycosporine-glycine-alanine, which substitutes L-alanine for the L-serine of shinorine. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  7. Use of transcriptome sequencing to understand the pistillate flowering in hickory (Carya cathayensis Sarg.).

    PubMed

    Huang, You-Jun; Liu, Li-Li; Huang, Jian-Qin; Wang, Zheng-Jia; Chen, Fang-Fang; Zhang, Qi-Xiang; Zheng, Bing-Song; Chen, Ming

    2013-10-10

    Unlike herbaceous plants, woody plants undergo a long vegetative stage before achieving floral transition. They then become seasonal plants, flowering annually. In this study, a preliminary model of gene regulation for seasonal pistillate flowering in hickory (Carya cathayensis) was proposed. The genome-wide dynamic transcriptome was characterized via a joint approach of RNA sequencing and microarray analysis. Differential transcript abundance analysis uncovered the dynamic transcript abundance patterns of flowering-correlated genes and their major functions based on Gene Ontology (GO) analysis. To explore the pistillate flowering mechanism in hickory, a comprehensive flowering gene regulatory network based on Arabidopsis thaliana was constructed through additional literature mining. A total of 114 putative flowering or floral genes, including 31 with differential transcript abundance, were identified in hickory. The locations, functions and dynamic transcript abundances were analyzed in the gene regulatory networks. A genome-wide co-expression network for the putative flowering or floral genes shows three flowering regulatory modules corresponding to response to light abiotic stimulus, cold stress, and reproductive development process, respectively. In total, 27 potential flowering or floral genes were recruited, which are meaningful for better understanding the hickory-specific seasonal flowering mechanism. The flowering event of the pistillate flower bud in hickory is triggered by several pathways synchronously, including the photoperiod, autonomous, vernalization, gibberellin, and sucrose pathways. In total, 27 potential flowering or floral genes were recruited from the genome-wide co-expression network function module analysis. Moreover, the analysis provides a potential FLC-like gene-based vernalization pathway and an 'AC' model for pistillate flower development in hickory.
This work provides a framework for pistillate flower development in hickory, which is significant for insight into the regulation of flowering and floral development of woody plants.

  8. Use of transcriptome sequencing to understand the pistillate flowering in hickory (Carya cathayensis Sarg.)

    PubMed Central

    2013-01-01

    Background: Unlike herbaceous plants, woody plants undergo a long vegetative stage before achieving floral transition. They then become seasonal plants, flowering annually. In this study, a preliminary model of gene regulation for seasonal pistillate flowering in hickory (Carya cathayensis) was proposed. The genome-wide dynamic transcriptome was characterized via a joint approach of RNA sequencing and microarray analysis. Results: Differential transcript abundance analysis uncovered the dynamic transcript abundance patterns of flowering-correlated genes and their major functions based on Gene Ontology (GO) analysis. To explore the pistillate flowering mechanism in hickory, a comprehensive flowering gene regulatory network based on Arabidopsis thaliana was constructed through additional literature mining. A total of 114 putative flowering or floral genes, including 31 with differential transcript abundance, were identified in hickory. The locations, functions and dynamic transcript abundances were analyzed in the gene regulatory networks. A genome-wide co-expression network for the putative flowering or floral genes shows three flowering regulatory modules corresponding to response to light abiotic stimulus, cold stress, and reproductive development process, respectively. In total, 27 potential flowering or floral genes were recruited, which are meaningful for better understanding the hickory-specific seasonal flowering mechanism. Conclusions: The flowering event of the pistillate flower bud in hickory is triggered by several pathways synchronously, including the photoperiod, autonomous, vernalization, gibberellin, and sucrose pathways. In total, 27 potential flowering or floral genes were recruited from the genome-wide co-expression network function module analysis. Moreover, the analysis provides a potential FLC-like gene-based vernalization pathway and an 'AC' model for pistillate flower development in hickory.
This work provides a framework for pistillate flower development in hickory, which is significant for insight into the regulation of flowering and floral development of woody plants. PMID:24106755

  9. Selection on oxidative phosphorylation and ribosomal structure as a multigenerational response to ocean acidification in the common copepod Pseudocalanus acuspes.

    PubMed

    De Wit, Pierre; Dupont, Sam; Thor, Peter

    2016-10-01

    Ocean acidification is expected to have dramatic impacts on oceanic ecosystems, yet surprisingly few studies currently examine long-term adaptive and plastic responses of marine invertebrates to pCO2 stress. Here, we exposed populations of the common copepod Pseudocalanus acuspes to three pCO2 regimes (400, 900, and 1550 μatm) for two generations, after which we conducted a reciprocal transplant experiment. A de novo transcriptome was assembled and annotated, and gene expression data revealed that genes involved in RNA transcription were strongly down-regulated in populations with long-term exposure to a high-pCO2 environment, even after transplantation back to control levels. In addition, 747 000 SNPs were identified, of which 1513 showed consistent changes in nucleotide frequency between replicates of control and high-pCO2 populations. Functions involving RNA transcription and ribosomal function, as well as ion transport and oxidative phosphorylation, were highly overrepresented. We thus conclude that pCO2 stress appears to impose selection in copepods on RNA synthesis and translation, possibly modulated by helicase expression. Using a physiological hypothesis-testing strategy to mine gene expression data, we herein increase the power to detect cellular targets of ocean acidification. This novel approach seems promising for future studies of effects of environmental changes in ecologically important nonmodel organisms.

  10. Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes.

    PubMed

    Roche, Kimberly E; Weinstein, Marvin; Dunwoodie, Leland J; Poehlman, William L; Feltus, Frank A

    2018-05-25

    We applied two state-of-the-art, knowledge-independent data-mining methods, Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE), to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis. Even though the most informative transcripts were removed from the cluster analysis, the sorting ability of the remaining transcripts remained strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that was not detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential, in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with the tumor type.

  11. Isoprenoid-Based Biofuels: Homologous Expression and Heterologous Expression in Prokaryotes.

    PubMed

    Phulara, Suresh Chandra; Chaturvedi, Preeti; Gupta, Pratima

    2016-10-01

    Enthusiasm for mining advanced biofuels from microbial hosts has increased remarkably in recent years. Isoprenoids are a highly diverse group of secondary metabolites and are foreseen as an alternative to petroleum-based fuels. Most prokaryotes synthesize their isoprenoid backbone via the deoxyxylulose-5-phosphate pathway from glyceraldehyde-3-phosphate and pyruvate, whereas eukaryotes synthesize isoprenoids via the mevalonate pathway from acetyl coenzyme A (acetyl-CoA). Microorganisms do not naturally accumulate isoprenoids in large quantities, which restricts their application for fuel purposes. Various metabolic engineering efforts have been used to overcome the limitations associated with their natural and nonnatural production. The introduction of heterologous pathways/genes and overexpression of endogenous/homologous genes have led to remarkable increases in isoprenoid yield and substrate utilization in microbial hosts. Such modifications of the hosts' genomes have enabled researchers to develop commercially competent microbial strains for isoprenoid-based biofuel production from a vast array of substrates. The present minireview briefly discusses recent advances in metabolic engineering of prokaryotic hosts for the production of isoprenoid-based biofuels, with an emphasis on endogenous, homologous, and heterologous expression strategies. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  12. Isoprenoid-Based Biofuels: Homologous Expression and Heterologous Expression in Prokaryotes

    PubMed Central

    Phulara, Suresh Chandra; Chaturvedi, Preeti

    2016-01-01

    Enthusiasm for mining advanced biofuels from microbial hosts has increased remarkably in recent years. Isoprenoids are a highly diverse group of secondary metabolites and are foreseen as an alternative to petroleum-based fuels. Most prokaryotes synthesize their isoprenoid backbone via the deoxyxylulose-5-phosphate pathway from glyceraldehyde-3-phosphate and pyruvate, whereas eukaryotes synthesize isoprenoids via the mevalonate pathway from acetyl coenzyme A (acetyl-CoA). Microorganisms do not naturally accumulate isoprenoids in large quantities, which restricts their application for fuel purposes. Various metabolic engineering efforts have been used to overcome the limitations associated with their natural and nonnatural production. The introduction of heterologous pathways/genes and overexpression of endogenous/homologous genes have led to remarkable increases in isoprenoid yield and substrate utilization in microbial hosts. Such modifications of the hosts' genomes have enabled researchers to develop commercially competent microbial strains for isoprenoid-based biofuel production from a vast array of substrates. The present minireview briefly discusses recent advances in metabolic engineering of prokaryotic hosts for the production of isoprenoid-based biofuels, with an emphasis on endogenous, homologous, and heterologous expression strategies. PMID:27422837

  13. Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO).

    PubMed

    Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

    2017-08-01

    A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3 million GEO records. We examined the quality of well-supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm, outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithms increases as the dimensionality (number of unique values) of these elements decreases (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as those present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  14. Interleukin-27 is a novel candidate diagnostic biomarker for bacterial infection in critically ill children.

    PubMed

    Wong, Hector R; Cvijanovich, Natalie Z; Hall, Mark; Allen, Geoffrey L; Thomas, Neal J; Freishtat, Robert J; Anas, Nick; Meyer, Keith; Checchia, Paul A; Lin, Richard; Bigham, Michael T; Sen, Anita; Nowak, Jeffrey; Quasney, Michael; Henricksen, Jared W; Chopra, Arun; Banschbach, Sharon; Beckman, Eileen; Harmon, Kelli; Lahni, Patrick; Shanley, Thomas P

    2012-10-29

    Differentiating between sterile inflammation and bacterial infection in critically ill patients with fever and other signs of the systemic inflammatory response syndrome (SIRS) remains a clinical challenge. The objective of our study was to mine an existing genome-wide expression database for the discovery of candidate diagnostic biomarkers to predict the presence of bacterial infection in critically ill children. Genome-wide expression data were compared between patients with SIRS having negative bacterial cultures (n = 21) and patients with sepsis having positive bacterial cultures (n = 60). Differentially expressed genes were subjected to a leave-one-out cross-validation (LOOCV) procedure to predict SIRS or sepsis classes. Serum concentrations of interleukin-27 (IL-27) and procalcitonin (PCT) were compared between 101 patients with SIRS and 130 patients with sepsis. All data represent the first 24 hours of meeting criteria for either SIRS or sepsis. Two hundred twenty one gene probes were differentially regulated between patients with SIRS and patients with sepsis. The LOOCV procedure correctly predicted 86% of the SIRS and sepsis classes, and Epstein-Barr virus-induced gene 3 (EBI3) had the highest predictive strength. Computer-assisted image analyses of gene-expression mosaics were able to predict infection with a specificity of 90% and a positive predictive value of 94%. Because EBI3 is a subunit of the heterodimeric cytokine, IL-27, we tested the ability of serum IL-27 protein concentrations to predict infection. At a cut-point value of ≥5 ng/ml, serum IL-27 protein concentrations predicted infection with a specificity and a positive predictive value of >90%, and the overall performance of IL-27 was generally better than that of PCT. A decision tree combining IL-27 and PCT improved overall predictive capacity compared with that of either biomarker alone. 
Genome-wide expression analysis has provided the foundation for the identification of IL-27 as a novel candidate diagnostic biomarker for predicting bacterial infection in critically ill children. Additional studies will be required to test further the diagnostic performance of IL-27. The microarray data reported in this article have been deposited in the Gene Expression Omnibus under accession number GSE4607.
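
The LOOCV procedure and the diagnostic metrics in the abstract above can be illustrated with a short sketch. The abstract does not name the classifier used inside the LOOCV loop, so the nearest-centroid rule here is an assumption chosen for simplicity, and the toy expression values and confusion-matrix counts are hypothetical, not study data.

```python
import math
from typing import Hashable, List, Sequence, Tuple


def nearest_centroid_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    x: Sequence[float],
) -> Hashable:
    """Assign x to the class whose mean expression profile is closest (Euclidean)."""
    sums = {}
    counts = {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = math.dist(centroid, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs: List[List[float]], ys: List[Hashable]) -> float:
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


def specificity_ppv(tp: int, fp: int, tn: int) -> Tuple[float, float]:
    """Specificity = TN / (TN + FP); positive predictive value = TP / (TP + FP)."""
    return tn / (tn + fp), tp / (tp + fp)
```

With two well-separated hypothetical clusters (e.g. `xs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]` labelled three "SIRS" and three "sepsis"), every held-out sample is recovered and `loocv_accuracy` returns 1.0. Real expression data would of course need feature selection performed inside each fold to avoid optimistic estimates.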

  15. Genomic and transcriptomic analyses reveal adaptation mechanisms of an Acidithiobacillus ferrivorans strain YL15 to alpine acid mine drainage.

    PubMed

    Peng, Tangjian; Ma, Liyuan; Feng, Xue; Tao, Jiemeng; Nan, Meihua; Liu, Yuandong; Li, Jiaokun; Shen, Li; Wu, Xueling; Yu, Runlan; Liu, Xueduan; Qiu, Guanzhou; Zeng, Weimin

    2017-01-01

    Acidithiobacillus ferrivorans is an acidophile that often occurs in low-temperature acid mine drainage, e.g., drainage located at high altitude. To inhabit such an extreme environment, the bacterium must possess strategies to cope with survival stress. Nonetheless, information on these strategies is lacking. Here, genomic and transcriptomic assays were performed to illuminate the adaptation mechanisms of A. ferrivorans strain YL15 to the alpine acid mine drainage environment of the Yulong copper mine in southwest China. Genomic analysis revealed that the strain has a gene repertoire for metal resistance, e.g., genes coding for the mer operon and a variety of transporters/efflux proteins, and for low-pH adaptation, such as genes for hopanoid synthesis and the sodium:proton antiporter. Genes for various DNA repair enzymes and for synthesis of a precursor of UV-absorbing mycosporine-like amino acids indicated hypothetical UV radiation-resistance mechanisms in strain YL15. In addition, it has two types of acquired immune system: type III-B and type I-F CRISPR/Cas modules against invasion of foreign genetic elements. RNA-seq-based analysis uncovered that strain YL15 uses a set of mechanisms to adapt to low temperature. Genes involved in protein synthesis, transmembrane transport, energy metabolism and chemotaxis showed increased levels of RNA transcripts. Furthermore, a bacterioferritin Dps gene had higher RNA transcript counts at 6°C, possibly implicated in protecting DNA against oxidative stress at low temperature. This is the first study to comprehensively unveil the adaptation mechanisms of an acidophilic bacterium to acid mine drainage in alpine regions.

  16. Genomic and transcriptomic analyses reveal adaptation mechanisms of an Acidithiobacillus ferrivorans strain YL15 to alpine acid mine drainage

    PubMed Central

    Ma, Liyuan; Feng, Xue; Tao, Jiemeng; Nan, Meihua; Liu, Yuandong; Li, Jiaokun; Shen, Li; Wu, Xueling; Yu, Runlan; Liu, Xueduan; Qiu, Guanzhou; Zeng, Weimin

    2017-01-01

    Acidithiobacillus ferrivorans is an acidophile that often occurs in low-temperature acid mine drainage, e.g., drainage located at high altitude. To inhabit such an extreme environment, the bacterium must possess strategies to cope with survival stress. Nonetheless, information on these strategies is lacking. Here, genomic and transcriptomic assays were performed to illuminate the adaptation mechanisms of A. ferrivorans strain YL15 to the alpine acid mine drainage environment of the Yulong copper mine in southwest China. Genomic analysis revealed that the strain has a gene repertoire for metal resistance, e.g., genes coding for the mer operon and a variety of transporters/efflux proteins, and for low-pH adaptation, such as genes for hopanoid synthesis and the sodium:proton antiporter. Genes for various DNA repair enzymes and for synthesis of a precursor of UV-absorbing mycosporine-like amino acids indicated hypothetical UV radiation-resistance mechanisms in strain YL15. In addition, it has two types of acquired immune system: type III-B and type I-F CRISPR/Cas modules against invasion of foreign genetic elements. RNA-seq-based analysis uncovered that strain YL15 uses a set of mechanisms to adapt to low temperature. Genes involved in protein synthesis, transmembrane transport, energy metabolism and chemotaxis showed increased levels of RNA transcripts. Furthermore, a bacterioferritin Dps gene had higher RNA transcript counts at 6°C, possibly implicated in protecting DNA against oxidative stress at low temperature. This is the first study to comprehensively unveil the adaptation mechanisms of an acidophilic bacterium to acid mine drainage in alpine regions. PMID:28542527

  17. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    PubMed

    Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms, and little is known about the diversity of nitrogen-fixing genes and the structure of nif gene clusters in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses identified 742 sequences as nif genes, including the structural subunit genes nifH, nifD and nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms in the AMD community is much higher than previously thought. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster differed significantly from that of H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement not previously reported in other diazotrophs. Together, these results indicate that additional novel diazotrophs survive in the AMD community.

  18. Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

    PubMed Central

    Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms, and little is known about the diversity of nitrogen-fixing genes and the structure of nif gene clusters in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses identified 742 sequences as nif genes, including the structural subunit genes nifH, nifD and nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms in the AMD community is much higher than previously thought. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster differed significantly from that of H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement not previously reported in other diazotrophs. Together, these results indicate that additional novel diazotrophs survive in the AMD community. PMID:24498417

  19. Multi-Dimensional Prioritization of Dental Caries Candidate Genes and Its Enriched Dense Network Modules

    PubMed Central

    Wang, Quan; Jia, Peilin; Cuenco, Karen T.; Feingold, Eleanor; Marazita, Mary L.; Wang, Lily; Zhao, Zhongming

    2013-01-01

    Over the past decade, numerous genetic studies have suggested susceptibility genes for dental caries, with few definite conclusions. The rapid accumulation of relevant information, along with the complex architecture of the disease, provides a challenging but also unique opportunity to review and integrate the heterogeneous data for follow-up validation and exploration. In this study, we collected and curated candidate genes from four major categories: association studies, linkage scans, gene expression analyses, and literature mining. Candidate genes were prioritized according to the magnitude of evidence related to dental caries. We then searched for dense modules enriched with the prioritized candidate genes through their protein-protein interactions (PPIs). We identified 23 modules comprising 53 genes. Functional analyses of these 53 genes revealed three major clusters: cytokine network relevant genes, the matrix metalloproteinase (MMP) family, and the transforming growth factor-beta (TGF-β) family, all of which have previously been implicated as playing important roles in tooth development and carious lesions. Through our extensive data collection and an integrative application of gene prioritization and PPI network analyses, we built a dental caries-specific sub-network for the first time. Our study provides insights into the molecular mechanisms underlying dental caries. The framework we propose in this work can be applied to other complex diseases. PMID:24146904

  20. A massive incorporation of microbial genes into the genome of Tetranychus urticae, a polyphagous arthropod herbivore.

    PubMed

    Wybouw, N; Van Leeuwen, T; Dermauw, W

    2018-06-01

    A number of horizontal gene transfers (HGTs) have been identified in the spider mite Tetranychus urticae, a chelicerate herbivore. However, the genome of this mite species has at present not been thoroughly mined for the presence of HGT genes. Here, we performed a systematic screen for HGT genes in the T. urticae genome using the h-index metric. Our results not only validated previously identified HGT genes but also uncovered 25 novel HGT genes. In addition to HGT genes with a predicted biochemical function in carbohydrate, lipid and folate metabolism, we also identified the horizontal transfer of a ketopantoate hydroxymethyltransferase and a pantoate β-alanine ligase gene. In plants and bacteria, both genes are essential for vitamin B5 biosynthesis and their presence in the mite genome strongly suggests that spider mites, similar to Bemisia tabaci and nematodes, can synthesize their own vitamin B5. We further show that HGT genes were physically embedded within the mite genome and were expressed in different life stages. By screening chelicerate genomes and transcriptomes, we were able to estimate the evolutionary histories of these HGTs during chelicerate evolution. Our study suggests that HGT has made a significant and underestimated impact on the metabolic repertoire of plant-feeding spider mites. © 2018 The Royal Entomological Society.
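
The h-index screen mentioned above can be sketched as follows. We assume the formulation commonly used in HGT screens, where h is the bitscore of the best hit outside the recipient's lineage minus the bitscore of the best hit inside it (here assumed to be Metazoa), and h ≥ 30 flags a candidate; both the ingroup label and the threshold are assumptions, not values stated in this abstract. Real pipelines also exclude hits to the query's own close relatives, which this sketch omits.

```python
from typing import Iterable, Tuple

# A hit is (lineage label of the subject sequence, BLAST bitscore).
Hit = Tuple[str, float]


def hgt_index(hits: Iterable[Hit], ingroup: str = "Metazoa") -> float:
    """h = best out-of-lineage bitscore minus best in-lineage bitscore (missing side counts as 0)."""
    hits = list(hits)
    best_in = max((score for lineage, score in hits if lineage == ingroup), default=0.0)
    best_out = max((score for lineage, score in hits if lineage != ingroup), default=0.0)
    return best_out - best_in


def is_hgt_candidate(hits: Iterable[Hit], threshold: float = 30.0, ingroup: str = "Metazoa") -> bool:
    """Flag a gene whose best foreign hit clearly outscores its best metazoan hit."""
    return hgt_index(hits, ingroup) >= threshold
```

For a hypothetical gene with hits `[("Metazoa", 50.0), ("Bacteria", 210.0), ("Fungi", 120.0)]`, h is 160, so the gene would be passed on for the kind of genomic-embedding and expression checks the authors describe.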

  1. miRWalk--database: prediction of possible miRNA binding sites by "walking" the genes of three genomes.

    PubMed

    Dweep, Harsh; Sticht, Carsten; Pandey, Priyanka; Gretz, Norbert

    2011-10-01

    MicroRNAs are small, non-coding RNA molecules that can bind complementarily to the mRNA 3'-UTR region to regulate gene expression by translational repression or induction of mRNA degradation. Increasing evidence suggests a new mechanism by which miRNAs may regulate target gene expression by binding in promoter and amino acid coding regions. Most of the existing databases on miRNAs are restricted to the mRNA 3'-UTR region. To address this issue, we present miRWalk, a comprehensive database on miRNAs, which hosts predicted as well as validated miRNA binding sites for all known genes of human, mouse and rat. All mRNAs, mitochondrial genes and 10 kb upstream flanking regions of all known genes of human, mouse and rat were analyzed for putative miRNA binding sites using a newly developed algorithm named 'miRWalk' as well as eight already established programs. An automated and extensive text-mining search was performed on the PubMed database to extract validated information on miRNAs. The combined information was stored in a MySQL database. miRWalk presents predicted and validated information on miRNA-target interactions. Such a resource enables researchers to validate new targets of miRNAs not only in the 3'-UTR, but also in the other regions of all known genes. The 'Validated Target module' is updated every month and the 'Predicted Target module' is updated every 6 months. miRWalk is freely available at http://mirwalk.uni-hd.de/. Copyright © 2011 Elsevier Inc. All rights reserved.
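
The abstract does not describe how the miRWalk algorithm "walks" a gene, so the following is only a generic illustration of seed-based site scanning, not miRWalk itself. It assumes the conventional seed definition (miRNA nucleotides 2 to 8) and reports every position in a target region (3'-UTR, coding sequence or promoter) that is perfectly complementary to that seed. The function names are hypothetical.

```python
from typing import List

_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Watson-Crick reverse complement of an RNA sequence."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def seed_sites(mirna: str, target: str, seed_start: int = 2, seed_end: int = 8) -> List[int]:
    """0-based positions in target matching the reverse complement of the miRNA seed.

    seed_start/seed_end are 1-based, inclusive miRNA positions (2..8 assumed by default).
    DNA input (T instead of U) is accepted for the target, e.g. promoter sequence.
    """
    seed = mirna.upper()[seed_start - 1:seed_end]
    site = reverse_complement_rna(seed)
    target = target.upper().replace("T", "U")
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)  # allow overlapping matches
    return positions
```

For the let-7a sequence `UGAGGUAGUAGGUUGUAUAGUU`, the seed `GAGGUAG` pairs with the site `CUACCUC`, so a target such as `AAACUACCUCAAACUACCUCGG` yields sites at positions 3 and 13. Established predictors add further criteria (site type, conservation, free energy), which this sketch leaves out.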

  2. Comparative genomics of duplicate γ-glutamyl transferase genes in teleosts: medaka (Oryzias latipes), stickleback (Gasterosteus aculeatus), green spotted pufferfish (Tetraodon nigroviridis), fugu (Takifugu rubripes), and zebrafish (Danio rerio).

    PubMed

    Law, Sheran Hiu Wan; Redelings, Benjamin David; Kullman, Seth William

    2012-01-15

    The availability of multiple teleost (bony fish) genomes is providing unprecedented opportunities to understand the diversity and function of gene duplication events using comparative genomics. Here we examine multiple paralogous genes of γ-glutamyl transferase (GGT) in several distantly related teleost species including medaka, stickleback, green spotted pufferfish, fugu, and zebrafish. Through mining genome databases, we have identified multiple GGT orthologs. Duplicate (paralogous) GGT sequences for GGT1 (GGT1 a and b), GGTL1 (GGTL1 a and b), and GGTL3 (GGTL3 a and b) were identified for each species. Phylogenetic analysis suggests that GGTs are ancient proteins conserved across most metazoan phyla and that the paralogous GGTs in teleosts likely arose from the serial 3R genome duplication events. A third GGTL1 gene (GGTL1c) was found in green spotted pufferfish; however, this gene is not present in medaka, stickleback, or fugu. Similarly, one or both paralogs of GGTL3 appear to have been lost in green spotted pufferfish, fugu, and zebrafish. Syntenic relationships were highly maintained between duplicated teleost chromosomes, among teleosts and across ray-finned (Actinopterygii) and lobe-finned (Sarcopterygii) species. To assess subfunction partitioning, six medaka GGT genes were cloned and assessed for developmental and tissue-specific expression. On the basis of these data, we propose a modification of the "duplication-degeneration-complementation" model of subfunction partitioning in which quantitative rather than absolute differences in gene expression are observed between gene paralogs. Our results demonstrate that multiple GGT genes have been retained within teleost genomes. Questions remain, however, regarding the functional roles of multiple GGTs in these species. Copyright © 2011 Wiley Periodicals, Inc., A Wiley Company.

  3. Spatio-Temporal Detection of the Thiomonas Population and the Thiomonas Arsenite Oxidase Involved in Natural Arsenite Attenuation Processes in the Carnoulès Acid Mine Drainage

    PubMed Central

    Hovasse, Agnès; Bruneel, Odile; Casiot, Corinne; Desoeuvre, Angélique; Farasin, Julien; Hery, Marina; Van Dorsselaer, Alain; Carapito, Christine; Arsène-Ploetze, Florence

    2016-01-01

    The acid mine drainage (AMD) impacted creek of the Carnoulès mine (Southern France) is characterized by acid waters with a high heavy metal content. The microbial community inhabiting this AMD was extensively studied using isolation, metagenomic and metaproteomic methods, and the results showed that a natural arsenic (and iron) attenuation process involving the arsenite oxidase activity of several Thiomonas strains occurs at this site. A sensitive quantitative Selected Reaction Monitoring (SRM)-based proteomic approach was developed for detecting and quantifying the two subunits of the arsenite oxidase and RpoA of two different Thiomonas groups. Using this approach combined with FISH and pyrosequencing-based 16S rRNA gene sequence analysis, it was established here for the first time that these Thiomonas strains are ubiquitously present in minor proportions in this AMD and that they express the key enzymes involved in natural remediation processes at various locations and time points. In addition to these findings, this study also confirms that targeted proteomics applied at the community level can be used to detect weakly abundant proteins in situ. PMID:26870729

  4. Identification of Antimony- and Arsenic-Oxidizing Bacteria Associated with Antimony Mine Tailing

    PubMed Central

    Hamamura, Natsuko; Fukushima, Koh; Itai, Takaaki

    2013-01-01

    Antimony (Sb) is a naturally occurring toxic element commonly associated with arsenic (As) in the environment, and both elements have similar chemistry and toxicity. Increasing numbers of studies have focused on microbial As transformations, while microbial Sb interactions are still not well understood. To gain insight into microbial roles in the geochemical cycling of Sb and As, soils from Sb mine tailing were examined for the presence of Sb- and As-oxidizing bacteria. After aerobic enrichment culturing with As(III) (10 mM) or Sb(III) (100 μM), pure cultures of Pseudomonas- and Stenotrophomonas-related isolates with Sb(III) oxidation activities and a Sinorhizobium-related isolate capable of As(III) oxidation were obtained. The As(III)-oxidizing Sinorhizobium isolate possessed the aerobic arsenite oxidase gene (aioA), the expression of which was induced in the presence of As(III) or Sb(III). However, no Sb(III) oxidation activity was detected from the Sinorhizobium-related isolate, suggesting the involvement of different mechanisms for Sb and As oxidation. These results demonstrate that indigenous microorganisms associated with Sb mine soils are capable of Sb and As oxidation, and potentially contribute to the speciation and mobility of Sb and As in situ. PMID:23666539

  5. Transcriptional responses of zebrafish to complex metal mixtures in laboratory studies overestimates the responses observed with environmental water.

    PubMed

    Pradhan, Ajay; Ivarsson, Per; Ragnvaldsson, Daniel; Berg, Håkan; Jass, Jana; Olsson, Per-Erik

    2017-04-15

    Metals released into the environment continue to be of concern for human health. However, risk assessment of metal exposure is often based on total metal levels and usually does not take bioavailability data, metal speciation or matrix effects into consideration. The continued development of biological endpoint analyses is therefore of high importance for improved eco-toxicological risk analyses. While there is an ongoing debate concerning synergistic or additive effects of low-level mixed exposures, there are few environmental data confirming the observations obtained from laboratory experiments. In the present study we utilized qRT-PCR analysis to identify key metal response genes to develop a method for biomonitoring and risk assessment of metal pollution. The gene expression patterns were determined for juvenile zebrafish exposed to waters from sites downstream of a closed mining operation. Genes representing different physiological processes, including stress response, inflammation, apoptosis, drug metabolism, ion channels and receptors, and genotoxicity, were analyzed. The gene expression patterns of zebrafish exposed to laboratory-prepared metal mixes were compared to the patterns obtained with fish exposed to the environmental samples with the same metal composition and concentrations. Exposure to environmental samples resulted in fewer alterations in gene expression compared to laboratory mixes. A biotic ligand model (BLM) was used to approximate the bioavailability of the metals in the environmental setting. However, the BLM results were not in agreement with the experimental data, suggesting that the BLM may be overestimating the risk in the environment. The present study therefore supports the inclusion of site-specific biological analyses to complement the present chemical-based assays used for environmental risk assessment. Copyright © 2017 Elsevier B.V. All rights reserved.
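
Relative expression from qRT-PCR data of the kind described above is often reported with the comparative Ct (2^-ΔΔCt) method. The abstract does not state which quantification model or reference gene the authors used, so the following is a minimal sketch assuming that method, a single reference gene, and an amplification efficiency of 2 (perfect doubling per cycle). The function name and the Ct values are hypothetical.

```python
def fold_change_ddct(
    ct_target_treated: float,
    ct_ref_treated: float,
    ct_target_control: float,
    ct_ref_control: float,
    efficiency: float = 2.0,
) -> float:
    """Relative expression of a target gene, treated vs control, by the comparative Ct method."""
    # Normalize the target gene to the reference gene within each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # A lower delta Ct in the treated sample means more target transcript.
    delta_delta = delta_treated - delta_control
    return efficiency ** (-delta_delta)
```

For example, a target at Ct 22 (reference 18) in exposed fish versus Ct 24 (reference 18) in controls gives ΔΔCt = -2 and a 4-fold induction. If primer efficiencies differ markedly from 2, an efficiency-corrected model is the safer choice.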

  6. Mining the LIPG Allelic Spectrum Reveals the Contribution of Rare and Common Regulatory Variants to HDL Cholesterol

    PubMed Central

    Raghavan, Avanthi; Neeli, Hemanth; Jin, Weijun; Badellino, Karen O.; Demissie, Serkalem; Manning, Alisa K.; DerOhannessian, Stephanie L.; Wolfe, Megan L.; Cupples, L. Adrienne; Li, Mingyao; Kathiresan, Sekar; Rader, Daniel J.

    2011-01-01

    Genome-wide association studies (GWAS) have successfully identified loci associated with quantitative traits, such as blood lipids. Deep resequencing studies are being utilized to catalogue the allelic spectrum at GWAS loci. The goal of these studies is to identify causative variants and missing heritability, including heritability due to low-frequency and rare alleles with large phenotypic impact. Whereas rare variant efforts have primarily focused on nonsynonymous coding variants, we hypothesized that noncoding variants in these loci are also functionally important. Using the HDL-C gene LIPG as an example, we explored the effect of regulatory variants, identified through resequencing of subjects at HDL-C extremes, on gene expression, protein levels, and phenotype. Resequencing a portion of the LIPG promoter and 5′ UTR in human subjects with extreme HDL-C, we identified several rare variants in individuals from both extremes. Luciferase reporter assays were used to measure the effect of these rare variants on LIPG expression. Variants conferring opposing effects on gene expression were enriched in opposite extremes of the phenotypic distribution. Minor alleles of a common regulatory haplotype and noncoding GWAS SNPs were associated with reduced plasma levels of the LIPG gene product endothelial lipase (EL), consistent with its role in HDL-C catabolism. Additionally, we found that a common nonfunctional coding variant associated with HDL-C (rs2000813) is in linkage disequilibrium with a 5′ UTR variant (rs34474737) that decreases LIPG promoter activity. We attribute the observed association of the coding variant with plasma EL levels and HDL-C to the gene regulatory role of rs34474737. Taken together, the findings show that both rare and common noncoding regulatory variants are important contributors to the allelic spectrum in complex trait loci. PMID:22174694
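
The argument above rests on linkage disequilibrium (LD) between a coding SNP and a regulatory SNP. As an illustration of what that measure is, here is a minimal sketch of the standard r² statistic computed from phased two-locus haplotypes. It assumes haplotypes are already phased and alleles coded 0/1; the abstract does not say how LD was estimated, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Squared correlation r^2 = D^2 / (pA(1-pA) pB(1-pB)) between two biallelic loci.

    Each haplotype is (allele at locus 1, allele at locus 2), coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / denom
```

When the two minor alleles always travel together, r² is 1, so an association detected at one SNP cannot be separated statistically from the other; that is why functional assays, such as the promoter-activity experiments described in the abstract, are needed to assign the causal role.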

  7. Ontology-based meta-analysis of global collections of high-throughput public data.

    PubMed

    Kupershmidt, Ilya; Su, Qiaojuan Jane; Grewal, Anoop; Sundaresh, Suman; Halperin, Inbal; Flynn, James; Shekar, Mamatha; Wang, Helen; Park, Jenny; Cui, Wenwu; Wall, Gregory D; Wisotzkey, Robert; Alag, Satnam; Akhtari, Saeid; Ronaghi, Mostafa

    2010-09-29

    The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the rapidly growing quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today. We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
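
The abstract mentions "rank-based enrichment statistics" without specifying them. One common choice is the Wilcoxon-Mann-Whitney rank-sum test, sketched here as an assumed stand-in rather than the framework's actual statistic. It asks whether genes of interest sit nearer the top of a single dataset's ranked gene list than other genes do, using the normal approximation without tie correction.

```python
import math
from typing import Iterable, Sequence, Tuple


def rank_sum_enrichment(ranked_genes: Sequence[str], gene_set: Iterable[str]) -> Tuple[int, float]:
    """Mann-Whitney U and z-score for a gene set in a ranked list (index 0 = most up-regulated).

    U counts (in-set, out-of-set) pairs in which the in-set gene ranks higher;
    z > 0 indicates enrichment toward the top of the list. Assumes no tied ranks.
    """
    in_set = set(gene_set)
    n = len(ranked_genes)
    positions = [i for i, gene in enumerate(ranked_genes) if gene in in_set]
    n1 = len(positions)
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("gene set must be a non-empty proper subset of the list")
    u = 0
    for k, pos in enumerate(positions):
        in_set_below = n1 - 1 - k            # in-set genes ranked after this one
        u += (n - 1 - pos) - in_set_below    # out-of-set genes ranked after this one
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n + 1) / 12)
    return u, (u - mean) / sd
```

Repeating such a test across many datasets yields one z-score per dataset, which can then be combined or replicated in the meta-analytic spirit the authors describe.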

  8. Literature mining, gene-set enrichment and pathway analysis for target identification in Behçet's disease.

    PubMed

    Wilson, Paul; Larminie, Christopher; Smith, Rona

    2016-01-01

    Objectives: To use literature mining to catalogue Behçet's-associated genes, and advanced computational methods to improve the understanding of the pathways and signalling mechanisms that lead to the typical clinical characteristics of Behçet's patients, and to extend this technique to identify potential treatment targets for further experimental validation. Methods: Text mining methods combined with gene enrichment tools, pathway analysis and causal analysis algorithms. Results: This approach identified 247 human genes associated with Behçet's disease, and the resulting disease map, comprising 644 nodes and 19,220 edges, captured important details of the relationships between these genes and their associated pathways, as described in diverse data repositories. Pathway analysis has identified how Behçet's-associated genes are likely to participate in innate and adaptive immune responses. Causal analysis algorithms have identified a number of potential therapeutic strategies for further investigation. Conclusions: Computational methods have captured pertinent features of the prominent disease characteristics presented in Behçet's disease and have highlighted NOD2, ICOS and IL18 signalling as potential therapeutic targets.
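
Gene-set enrichment tools of the kind mentioned above frequently score the overlap between a disease gene list and a pathway with a one-sided hypergeometric test. The abstract does not name the specific tools, so this is a minimal sketch of that standard test only; the function name and any counts used with it are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes in the background (e.g. all annotated human genes)
    successes:  background genes belonging to the pathway
    draws:      genes in the disease list (e.g. literature-mined genes)
    k:          observed overlap between the disease list and the pathway
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total
```

In practice the p-values from many pathways would then be corrected for multiple testing (for example with a false discovery rate procedure) before a pathway is reported as enriched.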

  9. Microarray Meta-Analysis Focused on the Response of Genes Involved in Redox Homeostasis to Diverse Abiotic Stresses in Rice

    PubMed Central

    de Abreu Neto, Joao B.; Frei, Michael

    2016-01-01

    Plants are exposed to a wide range of abiotic stresses (AS), which often occur in combination. Because physiological investigations typically focus on one stress, our understanding of unspecific stress responses remains limited. The plant redox homeostasis, i.e., the production and removal of reactive oxygen species (ROS), may be involved in many environmental stress conditions. Therefore, this study aimed to identify genes that are activated in diverse AS, focusing on ROS-related pathways. We conducted a meta-analysis (MA) of microarray experiments, focusing on rice. Transcriptome data were mined from public databases and obtained from fellow researchers; they represented 36 different experiments and investigated diverse AS, including ozone stress, drought, heat, cold, salinity, and mineral deficiencies/toxicities. To overcome the inherent artifacts of different MA methods, data were processed using Fisher, rOP, REM, and product of rank (GeneSelector), and genes identified by most approaches were considered shared differentially expressed genes (DEGs). Two MA strategies were adopted: first, datasets were separated into shoot, root, and seedling experiments, and these tissues were analyzed separately to identify shared DEGs. Second, shoot and seedling experiments were classed into oxidative stress (OS), i.e., ozone and hydrogen peroxide treatments directly producing ROS in plant tissue, and other AS, in which ROS production is indirect. In all tissues and stress conditions, genes a priori considered ROS-related were overrepresented among the DEGs, as they represented 4% of all expressed genes but 7–10% of the DEGs. The combined MA approach was substantially more conservative than individual MA methods and identified 1001 shared DEGs in shoots, 837 shared DEGs in roots, and 1172 shared DEGs in seedlings. Within the OS and AS groups, 990 and 1727 shared DEGs were identified, respectively. 
In total, 311 genes were shared between OS and AS, including many regulatory genes. Combined co-expression analysis identified among those a cluster of 42 genes, many involved in the photosynthetic apparatus and responsive to drought, iron deficiency, arsenic toxicity, and ozone. Our data demonstrate the importance of redox homeostasis in plant stress responses and the power of MA to identify candidate genes underlying unspecific signaling pathways. PMID:26793229
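
Fisher's method, one of the meta-analysis approaches listed above, combines k independent p-values for a gene through X = -2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. Because the degrees of freedom are even, the upper tail has a closed form, so this sketch needs only the standard library. It illustrates Fisher's method alone; the rOP, REM and product-of-rank procedures are not shown, and the function name is hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combined p-value by Fisher's method; each p must lie in (0, 1].

    Uses P(chi2_{2k} > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half / j  # build (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total
```

With a single study the method returns that study's p-value unchanged, and two studies each at p = 0.05 combine to roughly 0.017, showing how consistent moderate evidence across experiments is strengthened.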

  10. OntoGene web services for biomedical text mining.

    PubMed

    Rinaldi, Fabio; Clematide, Simon; Marques, Hernani; Ellendorff, Tilia; Romacker, Martin; Rodriguez-Esteban, Raul

    2014-01-01

    Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top ranked results in several of them.

  11. OntoGene web services for biomedical text mining

    PubMed Central

    2014-01-01

    Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top ranked results in several of them. PMID:25472638
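
Because the OntoGene services exchange data in BioC, a client needs to read BioC XML. The sketch below assumes the usual BioC layout (collection, document, passage and annotation elements, with key/value `infon` elements and `location` offsets) and extracts annotations with Python's standard `xml.etree.ElementTree`. It does not call any OntoGene endpoint, and the sample document, including its ID, is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (document id, annotation type, covered text, offset, length)
Annotation = Tuple[Optional[str], Optional[str], Optional[str], int, int]


def bioc_annotations(xml_text: str) -> List[Annotation]:
    """Collect annotations from a BioC XML collection."""
    root = ET.fromstring(xml_text)
    results: List[Annotation] = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for ann in doc.iter("annotation"):
            # BioC stores metadata as <infon key="...">value</infon>.
            ann_type = next(
                (inf.text for inf in ann.findall("infon") if inf.get("key") == "type"),
                None,
            )
            # An annotation may span several (discontinuous) locations.
            for loc in ann.findall("location"):
                results.append(
                    (doc_id, ann_type, ann.findtext("text"),
                     int(loc.get("offset")), int(loc.get("length")))
                )
    return results


# Hypothetical input; the document ID "12345" is made up for illustration.
SAMPLE = """<collection>
  <source>PubMed</source><date>20140101</date><key>example.key</key>
  <document>
    <id>12345</id>
    <passage>
      <infon key="type">abstract</infon>
      <offset>0</offset>
      <text>EBI3 encodes a subunit of IL-27.</text>
      <annotation id="T1">
        <infon key="type">Gene</infon>
        <location offset="0" length="4"/>
        <text>EBI3</text>
      </annotation>
    </passage>
  </document>
</collection>"""
```

Calling `bioc_annotations(SAMPLE)` yields one tuple for the gene mention "EBI3" at offset 0. Keeping a shared interchange format like this is what lets separate annotators be chained as web services.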

  12. MINE: Module Identification in Networks

    PubMed Central

    2011-01-01

    Background: Graphical models of network associations are useful for both visualizing and integrating multiple types of association data. Identifying modules, or groups of functionally related gene products, is an important challenge in analyzing biological networks. However, existing tools to identify modules are insufficient when applied to dense networks of experimentally derived interaction data. To address this problem, we have developed an agglomerative clustering method that is able to identify highly modular sets of gene products within highly interconnected molecular interaction networks. Results: MINE outperforms MCODE, CFinder, NEMO, SPICi, and MCL in identifying non-exclusive, high-modularity clusters when applied to the C. elegans protein-protein interaction network. The algorithm generally achieves superior geometric accuracy and modularity for annotated functional categories. In comparison with the most closely related algorithm, MCODE, the top clusters identified by MINE are consistently of higher density, and MINE is less likely to designate overlapping modules as a single unit. MINE offers a high level of granularity with a small number of adjustable parameters, enabling users to fine-tune cluster results for input networks with differing topological properties. Conclusions: MINE was created in response to the challenge of discovering high-quality modules of gene products within highly interconnected biological networks. The algorithm allows a high degree of flexibility and user customisation of results with few adjustable parameters. MINE outperforms several popular clustering algorithms in identifying modules with high modularity and obtains good overall recall and precision of functional annotations in protein-protein interaction networks from both S. cerevisiae and C. elegans. PMID:21605434
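
The abstract evaluates clusters by their modularity. It does not define MINE's own scoring function, and MINE allows overlapping modules, so the following is only a sketch of the standard Newman-Girvan modularity Q for a non-overlapping partition of an undirected, simple graph: Q = Σ_c [L_c / m - (d_c / 2m)²], where m is the number of edges, L_c the edges inside module c and d_c the summed degree of its nodes. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]], community: Dict[Hashable, int]) -> float:
    """Newman-Girvan modularity of a node -> module assignment (undirected, no self-loops)."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c: edges with both ends in module c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in module c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

Two disconnected triangles split into their natural modules give Q = 0.5, whereas lumping all six nodes into one module gives Q = 0, which illustrates why merging distinct modules into a single unit, as the abstract says MCODE tends to do, lowers modularity.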

  13. Novel strategies to mine alcoholism-related haplotypes and genes by combining existing knowledge framework.

    PubMed

    Zhang, RuiJie; Li, Xia; Jiang, YongShuai; Liu, GuiYou; Li, ChuanXing; Zhang, Fan; Xiao, Yun; Gong, BinSheng

    2009-02-01

    High-throughput single nucleotide polymorphism (SNP) detection technology and existing knowledge provide strong support for mining disease-related haplotypes and genes. In this study, we first apply four haplotype block identification methods (Confidence Intervals, Four Gamete Tests, Solid Spine of LD, and a fusion method of haplotype blocks) to high-throughput SNP genotype data to identify blocks, then use cluster analysis to verify the effectiveness of the four methods, and select alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and Gene databases to locate the blocks and identify candidate genes. Finally, we annotate gene function using the KEGG, BioCarta, and GO databases. We find 159 haplotype blocks on chromosomes 1 to 22 that are most likely related to alcoholism, including 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We identify 121 alcoholism-related genes and verify their reliability through biological functional annotation. In summary, by combining our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework, we can not only handle SNP data easily but also locate disease-related genes precisely.
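    One of the block-definition methods named above is the Four Gamete Test. A minimal sketch, assuming phased biallelic haplotypes coded 0/1: for a pair of SNPs, if all four two-locus haplotypes (gametes) occur above a frequency cutoff, historical recombination is inferred between them, and a block is extended only while no SNP pair within it shows all four gametes. The greedy left-to-right scan, the function names, and the 1% default cutoff are assumptions for illustration, not the procedure used in the study.

```python
from collections import Counter


def four_gametes_present(col_a, col_b, min_freq=0.01):
    """True if all four two-locus haplotypes occur at frequency >= min_freq.

    col_a, col_b: alleles (0/1) at two SNPs, one entry per phased chromosome.
    """
    counts = Counter(zip(col_a, col_b))
    total = len(col_a)
    return sum(1 for c in counts.values() if c / total >= min_freq) == 4


def four_gamete_blocks(haplotypes, min_freq=0.01):
    """Greedily split consecutive SNP indices into blocks.

    haplotypes: list of phased chromosomes, each a list of 0/1 alleles.
    A new SNP joins the current block only if it shows fewer than four
    gametes with every SNP already in that block.
    """
    n_snps = len(haplotypes[0])
    columns = [[h[i] for h in haplotypes] for i in range(n_snps)]
    blocks, current = [], [0]
    for j in range(1, n_snps):
        if any(four_gametes_present(columns[i], columns[j], min_freq)
               for i in current):
            blocks.append(current)  # recombination inferred: close the block
            current = [j]
        else:
            current.append(j)
    blocks.append(current)
    return blocks


# Hypothetical phased haplotypes for four chromosomes at four SNPs
haps = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]
print(four_gamete_blocks(haps))  # [[0, 1], [2, 3]]
```

    In the toy data, SNPs 0 and 2 show all four gametes (00, 01, 10, 11), so a block boundary is placed before SNP 2.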

  14. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    PubMed Central

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background: The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research, but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information, we have generated an expressed sequence tag (EST) database for A. mexicanum. Results: Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions: Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source for molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  15. 43 CFR 3420.1-4 - General requirements for land use planning.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... mining by other than underground mining techniques. (ii) For the purposes of this paragraph, any surface... techniques shall be deemed to have expressed a preference in favor of mining. Where a significant number of... underground mining techniques, that area shall be considered acceptable for further consideration only for...

  16. 43 CFR 3420.1-4 - General requirements for land use planning.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... mining by other than underground mining techniques. (ii) For the purposes of this paragraph, any surface... techniques shall be deemed to have expressed a preference in favor of mining. Where a significant number of... underground mining techniques, that area shall be considered acceptable for further consideration only for...

  17. 43 CFR 3420.1-4 - General requirements for land use planning.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... mining by other than underground mining techniques. (ii) For the purposes of this paragraph, any surface... techniques shall be deemed to have expressed a preference in favor of mining. Where a significant number of... underground mining techniques, that area shall be considered acceptable for further consideration only for...

  18. 43 CFR 3420.1-4 - General requirements for land use planning.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... mining by other than underground mining techniques. (ii) For the purposes of this paragraph, any surface... techniques shall be deemed to have expressed a preference in favor of mining. Where a significant number of... underground mining techniques, that area shall be considered acceptable for further consideration only for...

  19. Integrative genetic analysis of transcription modules: towards filling the gap between genetic loci and inherited traits

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Hongqiang; Chen, Hao; Bao, Lei

    2005-01-01

    Genetic loci that regulate inherited traits are routinely identified using quantitative trait locus (QTL) mapping methods. However, the genotype-phenotype associations do not provide information on the gene expression program through which the genetic loci regulate the traits. Transcription modules are 'self-consistent regulatory units' and are closely related to the modular components of gene regulatory networks [Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y. and Barkai, N. (2002) Revealing modular organization in the yeast transcriptional network. Nat. Genet., 31, 370-377; Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D. and Friedman, N. (2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet., 34, 166-176]. We used genome-wide genotype and gene expression data of a genetic reference population that consists of mice of 32 recombinant inbred strains to identify the transcription modules and the genetic loci regulating them. Twenty-nine transcription modules defined by genetic variations were identified. Statistically significant associations between the transcription modules and 18 classical physiological and behavioral traits were found. Genome-wide interval mapping showed that major QTLs regulating the transcription modules are often co-localized with the QTLs regulating the associated classical traits. The association and the possible co-regulation of the classical trait and transcription module indicate that the transcription module may be involved in the gene pathways connecting the QTL and the classical trait. Our results show that a transcription module may associate with multiple seemingly unrelated classical traits and a classical trait may associate with different modules. Literature mining results provided strong independent evidence for the relations among genes of the transcription modules, genes in the regions of the QTLs regulating the transcription modules, and the keywords representing the classical traits.

  20. RNA-seq analysis identifies an intricate regulatory network controlling cluster root development in white lupin

    PubMed Central

    2014-01-01

    Background: Highly adapted plant species are able to alter their root architecture to improve nutrient uptake and thrive in environments with limited nutrient supply. Cluster roots (CRs) are specialised structures of dense lateral roots formed by several plant species for the effective mining of nutrient-rich soil patches through a combination of increased surface area and exudation of carboxylates. White lupin is becoming a model species allowing for the discovery of gene networks involved in CR development. A greater understanding of the underlying molecular mechanisms driving these developmental processes is important for the generation of smarter plants for a world with diminishing resources to improve food security. Results: RNA-seq analyses for three developmental stages of the CR formed under phosphorus-limited conditions and two of non-cluster roots have been performed for white lupin. In total, 133,045,174 high-quality paired-end reads were used for a de novo assembly of the root transcriptome and merged with LAGI01 (Lupinus albus gene index) to generate an improved LAGI02 with 65,097 functionally annotated contigs. This was followed by comparative gene expression analysis. We show marked differences in the transcriptional response across the various cluster root stages to adjust to phosphate limitation by increasing uptake capacity and adjusting metabolic pathways. Several transcription factors such as PLT, SCR, PHB, PHV or AUX/IAA with a known role in the control of meristem activity and developmental processes show increased expression in the tip of the CR. Genes involved in hormonal responses (PIN, LAX, YUC) and cell cycle control (CYCA/B, CDK) are also differentially expressed. In addition, we identify primary transcripts of miRNAs with established function in the root meristem. Conclusions: Our gene expression analysis shows an intricate network of transcription factors and plant hormones controlling CR initiation and formation. In addition, functional differences between the different CR developmental stages in the acclimation to phosphorus starvation have been identified. PMID:24666749

  1. Text Mining to Support Gene Ontology Curation and Vice Versa.

    PubMed

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225 % in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA, which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start explicitly recording all the data they curate and ideally also some of the data they do not curate.

  2. Transcription Factor Amr1 Induces Melanin Biosynthesis and Suppresses Virulence in Alternaria brassicicola

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cho, Yangrae; Srivastava, Akhil; Ohm, Robin A.

    2012-05-01

    Alternaria brassicicola is a successful saprophyte and necrotrophic plant pathogen. Several A. brassicicola genes have been characterized as affecting pathogenesis of Brassica species. To study regulatory mechanisms of pathogenesis, we mined in silico 421 genes encoding putative transcription factors in a machine-annotated, draft genome sequence of A. brassicicola. In this study, targeted gene disruption mutants for 117 of the transcription factor genes were produced and screened. Three of these genes were associated with pathogenesis. Disruption mutants of one gene (AbPacC) were nonpathogenic, and mutants of another gene (AbVf8) caused lesions less than half the diameter of wild-type lesions. Unexpectedly, mutants of the third gene, Amr1, caused lesions with a two-fold larger diameter than the wild type and complementation mutants. Amr1 is a homolog of Cmr1, a transcription factor that regulates melanin biosynthesis in several fungi. We created Δamr1 gene deletion mutants and characterized their phenotypes. The Δamr1 mutants used pectin as a carbon source more efficiently than the wild type, were melanin-deficient, and were more sensitive to UV light and glucanase digestion. The AMR1 protein was localized in the nuclei of hyphae and in highly melanized conidia during the late stage of plant pathogenesis. RNA-seq analysis revealed that three genes in the melanin biosynthesis pathway, along with the deleted Amr1 gene, were expressed at low levels in the mutants. In contrast, many hydrolytic enzyme-coding genes were expressed at higher levels in the mutants than in the wild type during pathogenesis. The results of this study suggest that a gene important for survival in nature negatively affects virulence, probably through less efficient use of plant cell-wall materials. We speculate that the functions of the Amr1 gene are important to the success of A. brassicicola as a competitive saprophyte and plant parasite.

  3. Orphan Crops Browser: a bridge between model and orphan crops.

    PubMed

    Kamei, Claire Lessa Alvim; Severing, Edouard I; Dechesne, Annemarie; Furrer, Heleen; Dolstra, Oene; Trindade, Luisa M

    2016-01-01

    Many important crops have received little attention from the scientific community, either because they are not considered economically important or due to their large and complex genomes. De novo transcriptome assembly, using next-generation sequencing data, is an attractive option for the study of these orphan crops. In spite of the large amount of sequencing data that can be generated, there is currently a lack of tools that can effectively help molecular breeders and biologists to mine this type of information. Our goal was to develop a tool that enables molecular breeders, without extensive bioinformatics knowledge, to efficiently study de novo transcriptome data from any orphan crop (http://www.bioinformatics.nl/denovobrowser/db/species/index). The Orphan Crops Browser has been designed to facilitate the following tasks: (1) search and identification of candidate transcripts based on phylogenetic relationships between orthologous sequence data from a set of related species and (2) design of specific and degenerate primers for expression studies in the orphan crop of interest. To demonstrate the usability and reliability of the browser, it was used to identify the putative orthologues of 17 known lignin biosynthetic genes from maize and sugarcane in the orphan crop Miscanthus sinensis. Expression studies in miscanthus stem internode tissue differing in maturation were subsequently carried out to follow the expression of these genes during lignification. Our results showed a negative correlation between lignin content and gene expression. The present data are in agreement with recent findings in maize and other crops, as further discussed in this paper.

  4. Microarray analysis of the rat lacrimal gland following the loss of parasympathetic control of secretion

    PubMed Central

    Nguyen, Doan H.; Toshida, Hiroshi; Schurr, Jill; Beuerman, Roger W.

    2010-01-01

    Previous studies showed that loss of muscarinic parasympathetic input to the lacrimal gland (LG) leads to a dramatic reduction in tear secretion and profound changes to LG structure. In this study, we used DNA microarrays to examine the regulation of genes involved in the secretory function and organization of the LG. Long-Evans rats anesthetized with a mixture of ketamine/xylazine (80:10 mg/kg) underwent unilateral sectioning of the greater superficial petrosal nerve, the input to the pterygopalatine ganglion. After 7 days, tear secretion was measured, the animals were killed, and structural changes in the LG were examined by light microscopy. Total RNA from control and experimental LGs (n = 5) was used for DNA microarray analysis employing the U34A GeneChip. Three statistical algorithms (detection, change call, and signal log ratio) were used to determine differential gene expression using the Microarray Suite (5.0) and Data Mining Tools (3.0). Tear secretion was significantly reduced and corneal ulcers developed in all experimental eyes. Light microscopy showed breakdown of the acinar structure of the LG. DNA microarray analysis showed downregulation of genes associated with the endoplasmic reticulum and Golgi, including genes involved in protein folding and processing. Conversely, transcripts for cytoskeleton and extracellular matrix components, inflammation, and apoptosis were upregulated. The number of significantly upregulated genes (116) was substantially greater than the number of downregulated genes (49). Removal of the main secretory input to the rat LG resulted in clinical symptoms associated with severe dry eye. Components of the secretory pathway were negatively affected, and the increase in cell proliferation and inflammation may lead to loss of organization in the parasympathectomized lacrimal gland. PMID:15084711
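    Of the three algorithms named above, the signal log ratio expresses change as a base-2 logarithm, so +1 means a two-fold increase and -1 a two-fold decrease. The sketch below is a simplified stand-in, assuming one positive summary signal per gene and condition: it is not the Microarray Suite 5.0 algorithm, which works from probe-level data, and the gene names, signal values, and the |log2 ratio| >= 1 cutoff are hypothetical.

```python
import math


def signal_log_ratio(experimental, control):
    """Simplified log2(experimental / control) for one gene's summary signals."""
    if experimental <= 0 or control <= 0:
        raise ValueError("signals must be positive")
    return math.log2(experimental / control)


def classify(signals, threshold=1.0):
    """Split genes into up- and down-regulated lists by |log2 ratio| >= threshold.

    signals maps gene name -> (experimental signal, control signal).
    """
    up, down = [], []
    for gene, (exp_sig, ctl_sig) in signals.items():
        slr = signal_log_ratio(exp_sig, ctl_sig)
        if slr >= threshold:
            up.append(gene)
        elif slr <= -threshold:
            down.append(gene)
    return up, down


# Hypothetical signals (experimental, control) for three genes
data = {"geneA": (400.0, 100.0), "geneB": (50.0, 200.0), "geneC": (120.0, 100.0)}
print(classify(data))  # (['geneA'], ['geneB'])
```

    Here geneA has a log2 ratio of 2 (four-fold up), geneB of -2 (four-fold down), and geneC of about 0.26, which falls inside the assumed cutoff.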

  5. Microarray-Based Gene Expression Analysis for Veterinary Pathologists: A Review.

    PubMed

    Raddatz, Barbara B; Spitzbarth, Ingo; Matheis, Katja A; Kalkuhl, Arno; Deschl, Ulrich; Baumgärtner, Wolfgang; Ulrich, Reiner

    2017-09-01

    High-throughput, genome-wide transcriptome analysis is now commonly used in all fields of life science research and is on the cusp of medical and veterinary diagnostic application. Transcriptomic methods such as microarrays and next-generation sequencing generate enormous amounts of data. The pathogenetic expertise acquired from an understanding of general pathology provides veterinary pathologists with a profound background, which is essential in translating transcriptomic data into meaningful biological knowledge, thereby leading to a better understanding of underlying disease mechanisms. The scientific literature concerning high-throughput data-mining techniques usually addresses mathematicians or computer scientists as the target audience. In contrast, the present review provides the reader with a clear and systematic basis from a veterinary pathologist's perspective. Therefore, the aims are (1) to introduce the reader to the necessary methodological background; (2) to introduce the sequential steps commonly performed in a microarray analysis, including quality control, annotation, normalization, selection of differentially expressed genes, clustering, gene ontology and pathway analysis, analysis of manually selected genes, and biomarker discovery; and (3) to provide references to publicly available and user-friendly software suites. In summary, the data analysis methods presented within this review will enable veterinary pathologists to analyze high-throughput transcriptome data obtained from their own experiments, supplemental data that accompany scientific publications, or public repositories in order to obtain a more in-depth insight into underlying disease mechanisms.

  6. Characterization of the intronic portion of cadherin superfamily members, common cancer orchestrators

    PubMed Central

    Oliveira, Patrícia; Sanges, Remo; Huntsman, David; Stupka, Elia; Oliveira, Carla

    2012-01-01

    Cadherins are cell–cell adhesion proteins essential for the maintenance of tissue architecture and integrity, and their impairment is often associated with human cancer. Knowledge regarding regulatory mechanisms associated with cadherin misexpression in cancer is scarce. Specific features of the intronic structure and intronic-based regulatory mechanisms in the cadherin superfamily have not been identified. This study aims to systematically characterize the intronic portion of cadherin superfamily members and to identify intronic regions constituting putative targets/triggers of regulation, using a bioinformatic approach and biological data mining. Our study demonstrates that the cadherin superfamily genes harbour specific characteristics in comparison to all non-cadherin genes, from both the genomic and transcriptional standpoints. Cadherin superfamily genes display a higher average total intron number and significantly longer introns than other genes, across the entire vertebrate lineage. Moreover, in the human genome, we observed an uncommonly high frequency of MIR (mammalian-wide interspersed repeat) and MaLR (mammalian apparent LTR retrotransposon) regulatory-associated repetitive elements at 5′-located introns, concomitantly with increased de novo intronic transcription. Using this approach, we identified cadherin intronic-specific sites that may constitute novel targets/triggers of cadherin superfamily expression regulation. These findings pinpoint the need to identify mechanisms affecting particularly MIR and MaLR elements located in introns 2 and 3 of human cadherin genes, possibly important in the expression modulation of this superfamily in homeostasis and cancer. PMID:22317972

  7. Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis

    PubMed Central

    Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

    2012-01-01

    A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences that respond to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985

  8. A novel regio‑specific cyclosporin hydroxylase gene revealed through the genome mining of Pseudonocardia autotrophica.

    PubMed

    Ban, Jun-Gyu; Woo, Min-Woo; Lee, Bo-Ram; Lee, Mi-Jin; Choi, Si-Sun; Kim, Eung-Soo

    2014-05-01

    The regio-specific hydroxylation at the 4th N-methyl leucine of the immunosuppressive agent cyclosporin A (CsA) was previously proposed to be mediated by a unique cytochrome P450 hydroxylase (CYP), CYP-sb21 from the rare actinomycete Sebekia benihana. Interestingly, a different rare actinomycete species, Pseudonocardia autotrophica, was found to possess a different regio-selectivity, the preferential hydroxylation at the 9th N-methyl leucine of CsA. Through an in silico analysis of the whole genome of P. autotrophica, we describe here the classification of a total of 31 CYPs in P. autotrophica. Three putative CsA CYP genes, showing the highest sequence homologies with CYP-sb21, were successfully inactivated using PCR-targeted gene disruption. Only one knock-out mutant, ΔCYP-pa1, failed to convert CsA to its hydroxylated forms. The hydroxylation activity of CsA by CYP-pa1 was confirmed by CYP-pa1 gene complementation as well as heterologous expression in the CsA non-hydroxylating Streptomyces coelicolor. Moreover, the cyclosporine regio-selectivity of CYP-pa1 expressed in the ΔCYP-sb21 S. benihana mutant strain was also confirmed to be unchanged through cross-complementation. These results show that preferential regio-specific hydroxylation at the 9th N-methyl leucine of CsA is carried out by a specific P450 hydroxylase gene in P. autotrophica, CYP-pa1, setting the stage for the biotechnological application of CsA regioselective hydroxylation.

  9. Integrating In Silico Resources to Map a Signaling Network

    PubMed Central

    Liu, Hanqing; Beck, Tim N.; Golemis, Erica A.; Serebriiskii, Ilya G.

    2013-01-01

    The abundance of publicly available life science databases offers a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of the biological system. In this chapter, we describe different in silico tools that can quickly and conveniently retrieve data from existing data repositories and discuss how the available tools are best utilized for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGrid and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol for building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generation of composite interaction networks enables investigators to extract significantly more information about a given biological system than utilization of a single database or sole reliance on primary literature. PMID:24233784

  10. Relationship between genetic polymorphisms in the DRD5 gene and paranoid schizophrenia in northern Han Chinese.

    PubMed

    Zhao, Y; Ding, M; Pang, H; Xu, X M; Wang, B J

    2014-03-12

    Dopamine (DA) has been implicated in the pathophysiology of several psychiatric disorders, including schizophrenia. Thus, genes related to the dopaminergic (DAergic) system are good candidate genes for schizophrenia. One of the receptors of the DA system is dopamine receptor 5 (DRD5). Single nucleotide polymorphisms (SNPs) in the regulatory regions of the DRD5 gene may affect gene expression, influence DA biosynthesis and underlie various neuropsychiatric disorders related to DA dysfunction. The present study explored the association of SNPs within the DRD5 gene with paranoid schizophrenia in Han Chinese. A total of 176 patients with schizophrenia and 206 healthy controls were genotyped for four DRD5 SNPs (rs77434921, rs2076907, rs6283, and rs1800762). Significant group differences were observed in the allele and genotype frequencies of rs77434921 and rs1800762 and in the frequencies of GC haplotypes corresponding to rs77434921-rs1800762. Our findings suggest that common genetic variations of DRD5 are likely to contribute to genetic susceptibility to paranoid schizophrenia in Han Chinese. Further studies in larger samples are needed to replicate this association.

  11. An Outbreak of Lymphocutaneous Sporotrichosis among Mine-Workers in South Africa.

    PubMed

    Govender, Nelesh P; Maphanga, Tsidiso G; Zulu, Thokozile G; Patel, Jaymati; Walaza, Sibongile; Jacobs, Charlene; Ebonwu, Joy I; Ntuli, Sindile; Naicker, Serisha D; Thomas, Juno

    2015-09-01

    The largest outbreak of sporotrichosis occurred between 1938 and 1947 in the gold mines of Witwatersrand in South Africa. Here, we describe an outbreak of lymphocutaneous sporotrichosis that was investigated in a South African gold mine in 2011. Employees working at a reopened section of the mine were recruited for a descriptive cross-sectional study. Informed consent was sought for interview, clinical examination and medical record review. Specimens were collected from participants with active or partially-healed lymphocutaneous lesions. Environmental samples were collected from underground mine levels. Sporothrix isolates were identified by sequencing of the internal transcribed spacer region of the ribosomal gene and the nuclear calmodulin gene. Of 87 male miners, 81 (93%) were interviewed and examined, of whom 29 (36%) had skin lesions; specimens were collected from 17 (59%). Sporotrichosis was laboratory-confirmed among 10 patients and seven had clinically-compatible lesions. Of 42 miners with known HIV status, 11 (26%) were HIV-infected. No cases of disseminated disease were detected. Participants with ≤ 3 years' mining experience had four-fold greater odds of developing sporotrichosis than those who had been employed for >3 years (adjusted OR 4.0, 95% CI 1.2-13.1). Isolates from 8 patients were identified as Sporothrix schenckii sensu stricto by calmodulin gene sequencing, while environmental isolates were identified as Sporothrix mexicana. S. schenckii sensu stricto was identified as the causative pathogen. Although genetically distinct species were isolated from clinical and environmental sources, it is likely that the source was contaminated soil and untreated wood underground. No cases occurred following recommendations to close sections of the mine, treat timber and encourage consistent use of personal protective equipment. 
Sporotrichosis is a potentially re-emerging disease where traditional, rather than heavily mechanised, mining techniques are used. Surveillance should be instituted at sentinel locations.

  12. Mouse Tumor Biology (MTB): a database of mouse models for human cancer.

    PubMed

    Bult, Carol J; Krupke, Debra M; Begley, Dale A; Richardson, Joel E; Neuhauser, Steven B; Sundberg, John P; Eppig, Janan T

    2015-01-01

    The Mouse Tumor Biology (MTB; http://tumor.informatics.jax.org) database is a unique online compendium of mouse models for human cancer. MTB provides online access to expertly curated information on diverse mouse models for human cancer and interfaces for searching and visualizing data associated with these models. The information in MTB is designed to facilitate the selection of strains for cancer research and is a platform for mining data on tumor development and patterns of metastases. MTB curators acquire data through manual curation of peer-reviewed scientific literature and from direct submissions by researchers. Data in MTB are also obtained from other bioinformatics resources including PathBase, the Gene Expression Omnibus and ArrayExpress. Recent enhancements to MTB improve the association between mouse models and human genes commonly mutated in a variety of cancers as identified in large-scale cancer genomics studies, provide new interfaces for exploring regions of the mouse genome associated with cancer phenotypes and incorporate data and information related to Patient-Derived Xenograft models of human cancers. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Cloning and expression of gamma carbonic anhydrase from Serratia sp. ISTD04 for sequestration of carbon dioxide and formation of calcite.

    PubMed

    Srivastava, Shaili; Bharti, Randhir Kumar; Verma, Praveen Kumar; Thakur, Indu Shekhar

    2015-01-01

    Bacterial strains were isolated from marble mine rock and enriched in chemostat culture with different concentrations of sodium bicarbonate. The enriched consortium contained six bacterial isolates. One bacterial isolate showed carbonic anhydrase (CA) activity, catalyzing the reversible hydration of carbon dioxide to bicarbonate. The bacterium was identified as Serratia sp. by 16S rRNA sequence analysis. The carbonic anhydrase gene from Serratia sp. was found to be homologous with gamma carbonic anhydrase. The carbonic anhydrase gene was cloned into pET21b(+) and expressed in recombinant Escherichia coli BL21 (DE3) with a C-terminal His-tag. The recombinant protein was purified efficiently by one-step nickel affinity chromatography. The carbonic anhydrase showed the expected size of approximately 29 kDa on SDS-PAGE. The recombinant carbonic anhydrase was used for biomineralization-based conversion of atmospheric CO2 into valuable calcite minerals. Calcification was confirmed by XRD, FTIR, EDX and SEM analysis. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Bacteria and Genes Involved in Arsenic Speciation in Sediment Impacted by Long-Term Gold Mining

    PubMed Central

    Costa, Patrícia S.; Scholte, Larissa L. S.; Reis, Mariana P.; Chaves, Anderson V.; Oliveira, Pollyanna L.; Itabayana, Luiza B.; Suhadolnik, Maria Luiza S.; Barbosa, Francisco A. R.; Chartone-Souza, Edmar; Nascimento, Andréa M. A.

    2014-01-01

    The bacterial community and genes involved in geobiocycling of arsenic (As) from sediment impacted by long-term gold mining were characterized through culture-based analysis of As-transforming bacteria and metagenomic studies of the arsC, arrA, and aioA genes. Sediment was collected from the Mina stream, historically impacted by gold mining and located in one of the world’s largest mining regions, known as the “Iron Quadrangle”. A total of 123 As-resistant bacteria, recovered from the enrichment cultures, were phenotypically and genotypically characterized for As-transformation. Phylogenetic analyses of the 16S rRNA gene revealed a diverse As-resistant bacterial community. Bacterial isolates were affiliated with Proteobacteria, Firmicutes, and Actinobacteria and were represented by 20 genera. Most isolates were AsV-reducing (72%), whereas AsIII-oxidizing isolates accounted for 20%. Bacteria harboring the arsC gene predominated (85%), followed by aioA (20%) and arrA (7%). Additionally, we identified two novel As-transforming genera, Thermomonas and Pannonibacter. Metagenomic analysis of arsC, aioA, and arrA sequences confirmed the presence of these genes, with arrA sequences being more closely related to uncultured organisms. Evolutionary analyses revealed high genetic similarity between some arsC and aioA sequences obtained from isolates and clone libraries, suggesting that those isolates may represent environmentally important bacteria acting in As speciation. In addition, our findings show that the diversity of arrA genes is wider than previously described, since none of the arrA OTUs were affiliated with known reference strains. Therefore, the molecular diversity of arrA genes is far from fully explored and deserves further attention. PMID:24755825

  15. Generation, Annotation, and Analysis of a Large-Scale Expressed Sequence Tag Library from Arabidopsis pumila to Explore Salt-Responsive Genes.

    PubMed

    Huang, Xianzhong; Yang, Lifei; Jin, Yuhuan; Lin, Jun; Liu, Fang

    2017-01-01

    Arabidopsis pumila is an ephemeral plant and a close relative of the model plant Arabidopsis thaliana, but it possesses higher photosynthetic efficiency, a higher propagation rate, and higher salinity tolerance than A. thaliana, thus providing a candidate plant system for gene mining for environmental adaptation and salt tolerance. However, A. pumila is an under-explored resource for understanding the genetic mechanisms underlying abiotic stress adaptation. To improve our understanding of the molecular and genetic mechanisms of salt stress adaptation, more than 19,900 clones randomly selected from a cDNA library constructed previously from leaf tissue exposed to high-salinity shock were sequenced. A total of 16,014 high-quality expressed sequence tags (ESTs) were generated, which have been deposited in the dbEST GenBank under accession numbers JZ932319 to JZ948332. Clustering and assembly of these ESTs resulted in the identification of 8,835 unique sequences, consisting of 2,469 contigs and 6,366 singletons. The blastx results revealed 8,011 unigenes with significant similarity to known genes, while only 425 unigenes remained uncharacterized. Functional classification demonstrated an abundance of unigenes involved in binding, catalytic, structural or transporter activities, and in pathways of energy, carbohydrate, amino acid, or lipid metabolism. At least seven main classes of genes were related to salt tolerance among the 8,835 unigenes. Many previously reported salt tolerance genes were also represented in this library, for example VP1, H+-ATPase, NHX1, SOS2, SOS3, NAC, MYB, ERF, LEA, and P5CS1. In addition, 251 transcription factors were identified from the library, classified into 42 families. Lastly, changes in expression of the 12 most abundant unigenes, 12 transcription factor genes, and 19 stress-related genes in the first 24 h of exposure to high-salinity stress conditions were monitored by qRT-PCR. 
The large-scale EST library obtained in this study provides first-hand information on gene sequences expressed in young leaves of A. pumila exposed to salt shock. The rapid discovery of known or unknown genes related to salinity stress response in A. pumila will facilitate the understanding of complex adaptive mechanisms in ephemerals.

  16. A Rb1 promoter variant with reduced activity contributes to osteosarcoma susceptibility in irradiated mice

    PubMed Central

    2014-01-01

    Background Syndromic forms of osteosarcoma (OS) account for less than 10% of all recorded cases of this malignancy. An individual OS predisposition is also possible through the inheritance of low-penetrance alleles of tumor susceptibility genes, usually without evidence of a syndromic condition. Genetic variants involved in such a non-syndromic form of tumor predisposition are difficult to identify, given the low incidence of osteosarcoma and the genetic heterogeneity of patients. We recently mapped a major OS susceptibility QTL to mouse chromosome 14 by comparing alpha-radiation-induced osteosarcoma in mouse strains that differ in their tumor susceptibility. Methods Tumor-specific allelic losses in murine osteosarcoma were mapped along chromosome 14 using microsatellite markers and SNP allelotyping. The candidate gene search in the mapped interval was refined using PosMed data mining and mRNA expression analysis in normal osteoblasts. A strain-specific promoter variant in Rb1 was tested for its influence on mRNA expression using a reporter assay. Results A common Rb1 allele derived from the BALB/cHeNhg strain was identified as the major determinant of radiation-induced OS risk at this locus. Increased OS risk is linked with a hexanucleotide deletion in the promoter region, which is predicted to change WT1 and SP1 transcription factor-binding sites. Both in-vitro reporter and in-vivo expression assays confirmed an approximately 1.5-fold reduction in gene expression caused by this promoter variant. Concordantly, the 50% reduction in Rb1 expression in mice bearing a conditional hemizygous Rb1 deletion causes a significant rise in OS incidence following alpha-irradiation. Conclusion This is the first experimental demonstration of a functional and genetic link between reduced Rb1 expression from a common promoter variant and increased tumor risk after radiation exposure. 
We propose that reduced Rb1 expression caused by common variants in regulatory regions can modify the risk of malignant transformation of bone cells after radiation exposure. PMID:25092376

  17. Insight into Genotype-Phenotype Associations through eQTL Mapping in Multiple Cell Types in Health and Immune-Mediated Disease

    PubMed Central

    Peters, James E.; Lyons, Paul A.; Lee, James C.; Richard, Arianne C.; Fortune, Mary D.; Newcombe, Paul J.; Richardson, Sylvia; Smith, Kenneth G. C.

    2016-01-01

    Genome-wide association studies (GWAS) have transformed our understanding of the genetics of complex traits such as autoimmune diseases, but how risk variants contribute to pathogenesis remains largely unknown. Identifying genetic variants that affect gene expression (expression quantitative trait loci, or eQTLs) is crucial to addressing this. eQTLs vary between tissues and following in vitro cellular activation, but have not been examined in the context of human inflammatory diseases. We performed eQTL mapping in five primary immune cell types from patients with active inflammatory bowel disease (n = 91), anti-neutrophil cytoplasmic antibody-associated vasculitis (n = 46) and healthy controls (n = 43), revealing eQTLs present only in the context of active inflammatory disease. Moreover, we show that following treatment a proportion of these eQTLs disappear. Through joint analysis of expression data from multiple cell types, we reveal that previous estimates of eQTL immune cell-type specificity are likely to have been exaggerated. Finally, by analysing gene expression data from multiple cell types, we find eQTLs not previously identified by database mining at 34 inflammatory bowel disease-associated loci. In summary, this parallel eQTL analysis in multiple leucocyte subsets from patients with active disease provides new insights into the genetic basis of immune-mediated diseases. PMID:27015630

  18. Next-generation transcriptome sequencing of the premenopausal breast epithelium using specimens from a normal human breast tissue bank.

    PubMed

    Pardo, Ivanesa; Lillemoe, Heather A; Blosser, Rachel J; Choi, MiRan; Sauder, Candice A M; Doxey, Diane K; Mathieson, Theresa; Hancock, Bradley A; Baptiste, Dadrie; Atale, Rutuja; Hickenbotham, Matthew; Zhu, Jin; Glasscock, Jarret; Storniolo, Anna Maria V; Zheng, Faye; Doerge, R W; Liu, Yunlong; Badve, Sunil; Radovich, Milan; Clare, Susan E

    2014-03-17

    Our efforts to prevent and treat breast cancer are significantly impeded by a lack of knowledge of the biology and developmental genetics of the normal mammary gland. In order to provide the specimens that will facilitate such an understanding, The Susan G. Komen for the Cure Tissue Bank at the IU Simon Cancer Center (KTB) was established. The KTB is, to our knowledge, the only biorepository in the world prospectively established to collect normal, healthy breast tissue from volunteer donors. As a first initiative toward a molecular understanding of the biology and developmental genetics of the normal mammary gland, the effect of the menstrual cycle and hormonal contraceptives on gene expression in the normal breast epithelium was examined. Using normal breast tissue from 20 premenopausal donors to KTB, the changes in the mRNA of the normal breast epithelium as a function of phase of the menstrual cycle and hormonal contraception were assayed using next-generation whole transcriptome sequencing (RNA-Seq). In total, 255 genes representing 1.4% of all genes were deemed to have statistically significant differential expression between the two phases of the menstrual cycle. The overwhelming majority (221; 87%) of the genes have higher expression during the luteal phase. These data provide important insights into the processes occurring during each phase of the menstrual cycle. There was only a single gene significantly differentially expressed when comparing the epithelium of women using hormonal contraception to those in the luteal phase. We have taken advantage of a unique research resource, the KTB, to complete the first-ever next-generation transcriptome sequencing of the epithelial compartment of 20 normal human breast specimens. This work has produced a comprehensive catalog of the differences in the expression of protein-coding genes as a function of the phase of the menstrual cycle. 
These data constitute the beginning of a reference data set of the normal mammary gland, which can be consulted for comparison with data developed from malignant specimens, or to mine the effects of the hormonal flux that occurs during the menstrual cycle.

  19. Next-generation transcriptome sequencing of the premenopausal breast epithelium using specimens from a normal human breast tissue bank

    PubMed Central

    2014-01-01

    Introduction Our efforts to prevent and treat breast cancer are significantly impeded by a lack of knowledge of the biology and developmental genetics of the normal mammary gland. In order to provide the specimens that will facilitate such an understanding, The Susan G. Komen for the Cure Tissue Bank at the IU Simon Cancer Center (KTB) was established. The KTB is, to our knowledge, the only biorepository in the world prospectively established to collect normal, healthy breast tissue from volunteer donors. As a first initiative toward a molecular understanding of the biology and developmental genetics of the normal mammary gland, the effect of the menstrual cycle and hormonal contraceptives on gene expression in the normal breast epithelium was examined. Methods Using normal breast tissue from 20 premenopausal donors to KTB, the changes in the mRNA of the normal breast epithelium as a function of phase of the menstrual cycle and hormonal contraception were assayed using next-generation whole transcriptome sequencing (RNA-Seq). Results In total, 255 genes representing 1.4% of all genes were deemed to have statistically significant differential expression between the two phases of the menstrual cycle. The overwhelming majority (221; 87%) of the genes have higher expression during the luteal phase. These data provide important insights into the processes occurring during each phase of the menstrual cycle. There was only a single gene significantly differentially expressed when comparing the epithelium of women using hormonal contraception to those in the luteal phase. Conclusions We have taken advantage of a unique research resource, the KTB, to complete the first-ever next-generation transcriptome sequencing of the epithelial compartment of 20 normal human breast specimens. This work has produced a comprehensive catalog of the differences in the expression of protein-coding genes as a function of the phase of the menstrual cycle. 
These data constitute the beginning of a reference data set of the normal mammary gland, which can be consulted for comparison with data developed from malignant specimens, or to mine the effects of the hormonal flux that occurs during the menstrual cycle. PMID:24636070

  20. Phenome-genome association studies of pancreatic cancer: new targets for therapy and diagnosis.

    PubMed

    Narayanan, Ramaswamy

    2015-01-01

    Pancreatic cancer has a very high mortality rate and requires novel molecular targets for diagnosis and therapy. Genetic association studies of databases offer an attractive starting point for gene discovery. The National Center for Biotechnology Information (NCBI) Phenome Genome Integrator (PheGenI) tool was enriched for pancreatic cancer-associated traits. The genes associated with the trait were characterized using diverse bioinformatics tools for Genome-Wide Association (GWA), transcriptome and proteome profiles, and protein classes for motif and domain. Two hundred twenty-six genes were identified that had a genetic association with pancreatic cancer in the human genome. These included 25 uncharacterized open reading frames (ORFs). Bioinformatics analysis of these ORFs identified putative druggable proteins and biomarkers, including enzymes, transporters and G-protein-coupled receptor signaling proteins. Secreted proteins, including a neuroendocrine factor and a chemokine, were identified. Five of these ORFs encompassed non-coding RNAs. ORF protein expression was detected in numerous body fluids, such as ascites, bile, pancreatic juice, milk, plasma, serum and saliva. Transcriptome and proteome analyses showed a correlation of mRNA and protein expression for nine ORFs. Analysis of the Catalogue of Somatic Mutations in Cancer (COSMIC) database revealed a strong correlation between copy number variations and mRNA over-expression for four ORFs. Mining of the International Cancer Genome Consortium (ICGC) database identified somatic mutations in the tumors of a significant number of pancreatic cancer patients for most of these ORFs. The pancreatic cancer-associated ORFs were also found to be genetically associated with other neoplasms, including leukemia, malignant melanoma, neuroblastoma and prostate carcinomas, as well as other unrelated diseases and disorders, such as Alzheimer's disease, Crohn's disease, coronary diseases, attention deficit disorder and addiction. 
Based on Genome-Wide Association Studies (GWAS), copy number variations, somatic mutational status and correlation of gene expression in pancreatic tumors at the mRNA and protein level, expression specificity in normal tissues and detection in body fluids, six ORFs emerged as putative leads for pancreatic cancer. These six targets provide a basis for accelerated drug discovery and diagnostic marker development for pancreatic cancer. Copyright© 2015, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.

  1. Microbial diversity at the moderate acidic stage in three different sulfidic mine tailings dumps generating acid mine drainage.

    PubMed

    Korehi, Hananeh; Blöthe, Marco; Schippers, Axel

    2014-11-01

    In freshly deposited sulfidic mine tailings the pH is alkaline or circumneutral. Because of pyrite or pyrrhotite oxidation, the pH drops over time to values <3, at which acidophilic iron- and sulfur-oxidizing prokaryotes prevail and accelerate the oxidation processes, as well described for several mine waste sites. The microbial communities at the moderately acidic stage in mine tailings have been only scarcely studied. Here we investigated the microbial diversity via 16S rRNA gene sequence analysis in eight samples (pH range 3.2-6.5) from three different sulfidic mine tailings dumps in Botswana, Germany and Sweden. In total, 701 partial 16S rRNA gene sequences revealed divergent microbial communities between the three sites and at different tailings depths. Proteobacteria and Firmicutes were overall the most abundant phyla in the clone libraries. Acidobacteria, Actinobacteria, Bacteroidetes, and Nitrospira occurred less frequently. The microbial communities found were completely different from microbial communities in tailings at

  2. The regulatory network analysis of long noncoding RNAs in human colorectal cancer.

    PubMed

    Zhang, Yuwei; Tao, Yang; Li, Yang; Zhao, Jinshun; Zhang, Lina; Zhang, Xiaohong; Dong, Changzheng; Xie, Yangyang; Dai, Xiaoyu; Zhang, Xinjun; Liao, Qi

    2018-05-01

    Colorectal cancer (CRC) is one of the most prevalent and lethal diseases worldwide. Long noncoding RNAs (lncRNAs) are widely accepted to function as key regulatory factors in human cancer, but the potential regulatory mechanisms of CRC-associated lncRNAs are largely obscure. Here, we integrated several expression profiles to obtain 55 differentially expressed (DE) lncRNAs. We first detected lncRNA interactions with transcription factors, microRNAs, mRNAs, and RNA-binding proteins to construct a regulatory network, and then performed functional enrichment analyses on it using bioinformatics approaches. We found that the upregulated genes in the regulatory network are enriched in cell cycle and DNA damage response, while the downregulated genes are enriched in cell differentiation, cellular response, and cell signaling. We then employed module-based methods to mine several intriguing modules from the overall network, which helps to classify the functions of genes more specifically. Next, we confirmed the validity of our network by comparison with a randomized network using a computational method. Finally, we attempted to annotate lncRNA functions based on the regulatory network, which indicated its potential application. Our study of the lncRNA regulatory network provides significant clues to unveil the potential regulatory mechanisms of lncRNAs in CRC and lays a foundation for further experimental investigation.

  3. Proteolysin, a Novel Highly Thermostable and Cosolvent-Compatible Protease from the Thermophilic Bacterium Coprothermobacter proteolyticus

    PubMed Central

    Toplak, Ana; Wu, Bian; Fusetti, Fabrizia; Quaedflieg, Peter J. L. M.

    2013-01-01

    Through genome mining, we identified a gene encoding a putative serine protease of the thermitase subgroup of subtilases (EC 3.4.21.66) in the thermophilic bacterium Coprothermobacter proteolyticus. The gene was functionally expressed in Escherichia coli, and the enzyme, which we called proteolysin, was purified to near homogeneity from crude cell lysate by a single heat treatment step. Proteolysin has a broad pH tolerance and is active at temperatures of up to 80°C. In addition, the enzyme shows good activity and stability in the presence of organic solvents, detergents, and dithiothreitol, and it remains active in 6 M guanidinium hydrochloride. Based on its stability and activity profile, proteolysin can be an excellent candidate for applications where resistance to harsh process conditions is required. PMID:23851086

  4. The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature.

    PubMed

    Özgür, Arzucan; Hur, Junguk; He, Yongqun

    2016-01-01

    The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. This paper reports our extension of INO to include combinatory patterns of two or more literature mining keywords co-existing in one sentence to represent specific INO interaction classes. Such keyword combinations and related INO interaction type information could be automatically obtained via SPARQL queries, exported as Excel files, and used in an INO-supported SciMiner, an in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. The INO ontology currently has 575 terms, including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations: 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated by running the Stanford Parser to obtain dependency relation types. Of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified using the direct dependency relations. 
The LLL dataset contained 34 gene regulation interaction types, each associated with multiple keywords. A hierarchical display of these 34 interaction types and their ancestor terms in INO resulted in the identification of specific gene-gene interaction patterns from the LLL dataset. Multi-keyword interaction types were also frequently observed in the vaccine dataset. By modeling and representing multiple textual keywords for interaction types, the extended INO enabled the identification of complex biological gene-gene interactions represented with multiple keywords.

  5. Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database

    PubMed Central

    Johnson, Robin J.; Lay, Jean M.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J.

    2013-01-01

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency. PMID:23613709

  6. Metabolic pathways recruited in the production of a recombinant enveloped virus: mining targets for process and cell engineering.

    PubMed

    Rodrigues, A F; Formas-Oliveira, A S; Bandeira, V S; Alves, P M; Hu, W S; Coroadinha, A S

    2013-11-01

    Biopharmaceuticals derived from enveloped viruses comprise an expanding market of vaccines, oncolytic vectors and gene therapy products. Thus, increased attention is given to the development of robust high-titer cell hosts for their manufacture. However, knowledge of the physiological constraints modulating virus production is still scarce, and the use of integrated strategies to improve host productivity and the upstream bioprocess remains under-explored territory. In this work, we conducted a functional genomics study, including transcriptional profiling and central carbon metabolism analysis, following the metabolic changes in the 'parental-to-producer' transition of two human cell lines producing recombinant retrovirus. Results were gathered into three comprehensive metabolic maps, providing a broad and integrated overview of gene expression changes for both cell lines. Eight pathways were identified as recruited in the virus production state: amino acid catabolism, carbohydrate catabolism and integration of the energy metabolism, nucleotide metabolism, glutathione metabolism, pentose phosphate pathway, polyamine biosynthesis and lipid metabolism. Their ability to modulate viral titers was experimentally challenged, leading to up to 6-fold improvements in the specific productivity of recombinant retrovirus. Within the pathways recruited in the virus production state, we searched for metabolic engineering gene targets in the low-producing phenotypes. As an alternative to the traditional 'high vs. low producer' clonal comparison, our mining strategy compared high and low producers from different genetic backgrounds (i.e. cell origins). Several genes were identified as limiting in the low-production phenotype, including two enzymes from cholesterol biosynthesis, two enzymes from glutathione biosynthesis and the regulatory machinery of polyamine biosynthesis. 
This is thus a frontier work, bridging fundamental and technological research and contributing to a broader understanding of enveloped virus production dynamics in mammalian cell hosts. © 2013 Published by Elsevier Inc.

  7. Systematic Association of Genes to Phenotypes by Genome and Literature Mining

    PubMed Central

    Jensen, Lars J; Perez-Iratxeta, Carolina; Kaczanowski, Szymon; Hooper, Sean D; Andrade, Miguel A

    2005-01-01

    One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes we retrieve 323 clusters containing a total of 2,700 significant gene–phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases. PMID:15799710

  8. Heterologous expression of cytotoxic sesquiterpenoids from the medicinal mushroom Lignosus rhinocerotis in yeast.

    PubMed

    Yap, Hui-Yeng Yeannie; Muria-Gonzalez, Mariano Jordi; Kong, Boon-Hong; Stubbs, Keith A; Tan, Chon-Seng; Ng, Szu-Ting; Tan, Nget-Hong; Solomon, Peter S; Fung, Shin-Yee; Chooi, Yit-Heng

    2017-06-12

    Genome mining facilitated by heterologous systems is an emerging approach to access the chemical diversity encoded in basidiomycete genomes. In this study, three sesquiterpene synthase genes, GME3634, GME3638, and GME9210, which were highly expressed in the sclerotium of the medicinal mushroom Lignosus rhinocerotis, were cloned and heterologously expressed in a yeast system. Metabolite profile analysis of the yeast culture extracts by GC-MS showed the production of several sesquiterpene alcohols (C15H26O), including cadinols and germacrene D-4-ol as major products. Other detected sesquiterpenes include selina-6-en-4-ol, β-elemene, β-cubebene, and cedrene. Two purified major compounds, namely (+)-torreyol and α-cadinol, synthesised by GME3638 and GME3634 respectively, are stereoisomers, and their chemical structures were confirmed by ¹H and ¹³C NMR. Phylogenetic analysis revealed that GME3638 and GME3634 are a pair of orthologues and are grouped together with terpene synthases that synthesise cadinenes and related sesquiterpenes. (+)-Torreyol and α-cadinol were tested against a panel of human cancer cell lines, and (+)-torreyol was found to exhibit selective potent cytotoxicity in breast adenocarcinoma cells (MCF7) with an IC50 value of 3.5 ± 0.58 μg/ml, while α-cadinol is less active (IC50 = 18.0 ± 3.27 μg/ml). This demonstrates that yeast-based genome mining, guided by transcriptomics, is a promising approach for uncovering bioactive compounds from medicinal mushrooms.

  9. ClusterMine360: a database of microbial PKS/NRPS biosynthesis

    PubMed Central

    Conway, Kyle R.; Boddy, Christopher N.

    2013-01-01

    ClusterMine360 (http://www.clustermine360.ca/) is a database of microbial polyketide and non-ribosomal peptide gene clusters. It takes advantage of crowd-sourcing by allowing members of the community to make contributions while automation is used to help achieve high data consistency and quality. The database currently has >200 gene clusters from >185 compound families. It also features a unique sequence repository containing >10 000 polyketide synthase/non-ribosomal peptide synthetase domains. The sequences are filterable and downloadable as individual or multiple sequence FASTA files. We are confident that this database will be a useful resource for members of the polyketide synthases/non-ribosomal peptide synthetases research community, enabling them to keep up with the growing number of sequenced gene clusters and rapidly mine these clusters for functional information. PMID:23104377

  10. TCGA4U: A Web-Based Genomic Analysis Platform To Explore And Mine TCGA Genomic Data For Translational Research.

    PubMed

    Huang, Zhenzhen; Duan, Huilong; Li, Haomin

    2015-01-01

    Large-scale human cancer genomics projects, such as TCGA, have generated large volumes of genomic data for further study. Exploring and mining these data to obtain meaningful analysis results can help researchers find potential genomic alterations involved in the development and metastasis of tumors. We developed a web-based gene analysis platform, named TCGA4U, which uses statistical methods and models to help translational investigators explore, mine and visualize human cancer genomic characteristics from the TCGA datasets. Furthermore, through Gene Ontology (GO) annotation and clinical data integration, the genomic data were transformed into biological process, molecular function and cellular component annotations and survival curves to help researchers identify potential driver genes. Clinical researchers without expertise in data analysis will benefit from such a user-friendly genomic analysis platform.

  11. Genome-wide transcriptomic analysis of response to low temperature reveals candidate genes determining divergent cold-sensitivity of maize inbred lines.

    PubMed

    Sobkowiak, Alicja; Jończyk, Maciej; Jarochowska, Emilia; Biecek, Przemysław; Trzcinska-Danielewicz, Joanna; Leipner, Jörg; Fronk, Jan; Sowiński, Paweł

    2014-06-01

    Maize, despite being thermophilic due to its tropical origin, demonstrates high intraspecific diversity in cold tolerance. To search for molecular mechanisms of this diversity, the transcriptomic response to cold was studied in two inbred lines of contrasting cold tolerance. Microarray analysis was followed by extensive statistical analysis of the data, literature data mining, and gene ontology-based classification. The lines used had been bred earlier specifically for determination of QTLs for cold performance of photosynthesis. This allowed direct comparison of the present transcriptomic data with the earlier QTL mapping results. Cold-treated (14 h at 8/6 °C) maize seedlings of cold-tolerant ETH-DH7 and cold-sensitive ETH-DL3 lines at the V3 stage showed a strong, consistent response of the third-leaf transcriptome: several thousand probes showed similar, statistically significant changes in both lines, while only tens responded differently in the two lines. The most striking difference between the responses of the two lines to cold was the induction of expression of ca. twenty genes encoding membrane/cell wall proteins exclusively in the cold-tolerant ETH-DH7 line. The common response comprised mainly repression of numerous genes related to photosynthesis and induction of genes related to basic biological activity: transcription, regulation of gene expression, protein phosphorylation, cell wall organization. Among the genes showing differential response, several were close to the QTL regions identified in earlier studies with the same inbred lines and associated with biometrical, physiological or biochemical parameters. These transcripts, including two apparently non-protein-coding ones, are particularly attractive candidates for future studies on mechanisms determining divergent cold tolerance of inbred maize lines.

  12. A genomic view of the NOD-like receptor family in teleost fish: Identification of a novel NLR subfamily in zebrafish

    USGS Publications Warehouse

    Laing, K.J.; Purcell, M.K.; Winton, J.R.; Hansen, J.D.

    2008-01-01

    Background. A large multigene family of NOD-like receptor (NLR) molecules has been described in mammals and implicated in immunity and apoptosis. Little information, however, exists concerning this gene family in non-mammalian taxa. This study therefore provides an in-depth investigation of this gene family in lower vertebrates, including extensive phylogenetic comparison of zebrafish NLRs with orthologs in tetrapods and analysis of their tissue-specific expression. Results. Three distinct NLR subfamilies were identified by mining genome databases of various non-mammalian vertebrates; the first subfamily (NLR-A) resembles mammalian NODs, the second (NLR-B) resembles mammalian NALPs, while the third (NLR-C) appears to be unique to teleost fish. In zebrafish, NLR-A and NLR-B subfamilies contain five and six genes respectively. The third subfamily is large, containing several hundred NLR-C genes, many of which are predicted to encode a C-terminal B30.2 domain. This subfamily most likely evolved from a NOD3-like molecule. Gene predictions for zebrafish NLRs were verified using sequence derived from ESTs or direct sequencing of cDNA. Reverse-transcriptase (RT)-PCR analysis confirmed expression of representative genes from each subfamily in selected tissues. Conclusion. Our findings confirm the presence of multiple NLR gene orthologs, which form a large multigene family in teleostei. Although the functional significance of the three major NLR subfamilies is unclear, we speculate that the conservation and abundance of NLR molecules in all teleost genomes reflects an essential role in cellular control, apoptosis or immunity throughout bony fish. © 2008 Laing et al; licensee BioMed Central Ltd.

  13. Improving microbial fitness in the mammalian gut by in vivo temporal functional metagenomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yaung, Stephanie J.; Deng, Luxue; Li, Ning

    Elucidating functions of commensal microbial genes in the mammalian gut is challenging because many commensals are recalcitrant to laboratory cultivation and genetic manipulation. We present Temporal FUnctional Metagenomics sequencing (TFUMseq), a platform to functionally mine bacterial genomes for genes that contribute to fitness of commensal bacteria in vivo. Our approach uses metagenomic DNA to construct large-scale heterologous expression libraries that are tracked over time in vivo by deep sequencing and computational methods. To demonstrate our approach, we built a TFUMseq plasmid library using the gut commensal Bacteroides thetaiotaomicron (Bt) and introduced Escherichia coli carrying this library into germfree mice. Population dynamics of library clones revealed Bt genes conferring significant fitness advantages in E. coli over time, including carbohydrate utilization genes, with a Bt galactokinase central to early colonization, and subsequent dominance by a Bt glycoside hydrolase enabling sucrose metabolism coupled with co-evolution of the plasmid library and E. coli genome driving increased galactose utilization. Here, our findings highlight the utility of functional metagenomics for engineering commensal bacteria with improved properties, including expanded colonization capabilities in vivo.

  14. Improving microbial fitness in the mammalian gut by in vivo temporal functional metagenomics

    DOE PAGES

    Yaung, Stephanie J.; Deng, Luxue; Li, Ning; ...

    2015-03-11

    Elucidating functions of commensal microbial genes in the mammalian gut is challenging because many commensals are recalcitrant to laboratory cultivation and genetic manipulation. We present Temporal FUnctional Metagenomics sequencing (TFUMseq), a platform to functionally mine bacterial genomes for genes that contribute to fitness of commensal bacteria in vivo. Our approach uses metagenomic DNA to construct large-scale heterologous expression libraries that are tracked over time in vivo by deep sequencing and computational methods. To demonstrate our approach, we built a TFUMseq plasmid library using the gut commensal Bacteroides thetaiotaomicron (Bt) and introduced Escherichia coli carrying this library into germfree mice. Population dynamics of library clones revealed Bt genes conferring significant fitness advantages in E. coli over time, including carbohydrate utilization genes, with a Bt galactokinase central to early colonization, and subsequent dominance by a Bt glycoside hydrolase enabling sucrose metabolism coupled with co-evolution of the plasmid library and E. coli genome driving increased galactose utilization. Here, our findings highlight the utility of functional metagenomics for engineering commensal bacteria with improved properties, including expanded colonization capabilities in vivo.

  15. PPDB - A tool for investigation of plants physiology based on gene ontology.

    PubMed

    Sharma, Ajay Shiv; Gupta, Hari Om; Prasad, Rajendra

    2014-09-02

    Representing the path from functional genomics and its ontologies to functional understanding and physiological models in a computationally tractable fashion is one of the ongoing challenges faced by computational biology. To tackle this challenge, we herein feature the application of contemporary database management to the development of PPDB, a searching and browsing tool for the Plants Physiology Database that is based upon mining the large amount of gene ontology data currently available. The working principles and search options associated with the PPDB are publicly available and freely accessible online (http://www.iitr.ernet.in/ajayshiv/) through a user-friendly environment built with Drupal-6.24. Because genes are expressed in temporally and spatially characteristic patterns, and their functionally distinct products often reside in specific cellular compartments and may be part of one or more multi-component complexes, this work is intended to be relevant for investigating the functional relationships of gene products at a system level and thus to help us approach the full physiology.

  16. PPDB: A Tool for Investigation of Plants Physiology Based on Gene Ontology.

    PubMed

    Sharma, Ajay Shiv; Gupta, Hari Om; Prasad, Rajendra

    2015-09-01

    Representing the path from functional genomics and its ontologies to functional understanding and physiological models in a computationally tractable fashion is one of the ongoing challenges faced by computational biology. To tackle this challenge, we herein feature the application of contemporary database management to the development of PPDB, a searching and browsing tool for the Plants Physiology Database that is based upon mining the large amount of gene ontology data currently available. The working principles and search options associated with the PPDB are publicly available and freely accessible online (http://www.iitr.ac.in/ajayshiv/) through a user-friendly environment built with Drupal-6.24. Because genes are expressed in temporally and spatially characteristic patterns, and their functionally distinct products often reside in specific cellular compartments and may be part of one or more multicomponent complexes, this work is intended to be relevant for investigating the functional relationships of gene products at a system level and thus to help us approach the full physiology.

  17. Exploring the Yeast Acetylome Using Functional Genomics

    PubMed Central

    Duffy, Supipi Kaluarachchi; Friesen, Helena; Baryshnikova, Anastasia; Lambert, Jean-Philippe; Chong, Yolanda T.; Figeys, Daniel; Andrews, Brenda

    2014-01-01

    SUMMARY Lysine acetylation is a dynamic posttranslational modification with a well-defined role in regulating histones. The impact of acetylation on other cellular functions remains relatively uncharacterized. We explored the budding yeast acetylome with a functional genomics approach, assessing the effects of gene overexpression in the absence of lysine deacetylases (KDACs). We generated a network of 463 synthetic dosage lethal (SDL) interactions involving class I and II KDACs, revealing many cellular pathways regulated by different KDACs. A biochemical survey of genes interacting with the KDAC RPD3 identified 72 proteins acetylated in vivo. In-depth analysis of one of these proteins, Swi4, revealed a role for acetylation in G1-specific gene expression. Acetylation of Swi4 regulates interaction with its partner Swi6, both components of the SBF transcription factor. This study expands our view of the yeast acetylome, demonstrates the utility of functional genomic screens for exploring enzymatic pathways, and provides functional information that can be mined for future studies. PMID:22579291

  18. Disease Comorbidity Network Guides the Detection of Molecular Evidence for the Link Between Colorectal Cancer and Obesity.

    PubMed

    Chen, Yang; Li, Li; Xu, Rong

    2015-01-01

    Epidemiological studies have suggested that obesity increases the risk of colorectal cancer (CRC). The genetic connection between CRC and obesity is multifactorial and inconclusive. In this study, we hypothesize that studying the comorbid diseases shared between CRC and obesity can offer unique insights into the common genetic basis of these two diseases. We constructed a comorbidity network by mining health data for millions of patients. We developed a novel approach and extracted the diseases that play critical roles in connecting obesity and CRC in the comorbidity network. Our approach was able to prioritize metabolic syndrome and diabetes, which are known to be associated with obesity and CRC through insulin resistance pathways. Interestingly, we found that osteoporosis was highly associated with the connection between obesity and CRC. Through gene expression meta-analysis, we identified novel genes shared among CRC, obesity and osteoporosis. Evidence from the literature supports the idea that these genes may help explain the genetic overlap between obesity and CRC.

  19. Systems Biology of Metabolic Regulation by Estrogen Receptor Signaling in Breast Cancer.

    PubMed

    Zhao, Yiru Chen; Madak Erdogan, Zeynep

    2016-03-17

    With the advent of -omics approaches, our understanding of chronic diseases such as cancer and metabolic syndrome has improved. However, effective mining of the information in the large-scale datasets obtained from gene expression microarrays, deep sequencing experiments or metabolic profiling is essential to uncover and then effectively target the critical regulators of diseased cell phenotypes. Estrogen Receptor α (ERα) is one of the master transcription factors regulating the gene programs that are important for estrogen-responsive breast cancers. To understand the role of ERα signaling in breast cancer metabolism, we utilized transcriptomic, cistromic and metabolomic data from MCF-7 cells treated with estradiol. In this report we describe the generation of samples for RNA-Seq, ChIP-Seq and metabolomics experiments and the integrative computational analysis of the obtained data. This approach is useful for delineating novel molecular mechanisms and gene regulatory circuits that are regulated by a particular transcription factor and that impact the metabolism of normal or diseased cells.

  20. Semantic web for integrated network analysis in biomedicine.

    PubMed

    Chen, Huajun; Ding, Li; Wu, Zhaohui; Yu, Tong; Dhanapalan, Lavanya; Chen, Jake Y

    2009-03-01

    The Semantic Web technology enables integration of heterogeneous data on the World Wide Web by making the semantics of data explicit through formal ontologies. In this article, we survey the feasibility and state of the art of utilizing the Semantic Web technology to represent, integrate and analyze the knowledge in various biomedical networks. We introduce a new conceptual framework, semantic graph mining, to enable researchers to integrate graph mining with ontology reasoning in network data analysis. Through four case studies, we demonstrate how semantic graph mining can be applied to the analysis of disease-causal genes, Gene Ontology category cross-talks, drug efficacy analysis and herb-drug interactions analysis.

  1. Ammonia-Oligotrophic and Diazotrophic Heavy Metal-Resistant Serratia liquefaciens Strains from Pioneer Plants and Mine Tailings.

    PubMed

    Zelaya-Molina, Lily X; Hernández-Soto, Luis M; Guerra-Camacho, Jairo E; Monterrubio-López, Ricardo; Patiño-Siciliano, Alfredo; Villa-Tanaca, Lourdes; Hernández-Rodríguez, César

    2016-08-01

    Mine tailings are man-made environments characterized by low levels of organic carbon and assimilable nitrogen, as well as moderate concentrations of heavy metals. For the introduction of nitrogen into these environments, a key role is played by ammonia-oligotrophic/diazotrophic heavy metal-resistant guilds. In mine tailings from Zacatecas, Mexico, Serratia liquefaciens was the dominant heterotrophic culturable species isolated in N-free media from bulk mine tailings as well as the rhizosphere, roots, and aerial parts of pioneer plants. S. liquefaciens strains proved to be a meta-population with high intraspecific genetic diversity and a potential to respond to these extreme conditions. The phenotypic and genotypic features of these strains reveal the potential adaptation of S. liquefaciens to oligotrophic and nitrogen-limited mine tailings with high concentrations of heavy metals. These features include ammonia-oligotrophic growth, nitrogen fixation, siderophore and indoleacetic acid production, phosphate solubilization, biofilm formation, moderate tolerance to heavy metals under conditions of diverse nitrogen availability, and the presence of zntA, amtB, and nifH genes. The acetylene reduction assay suggests low nitrogen-fixing activity. The nifH gene was harbored in a plasmid of ∼60 kb and probably was acquired by a horizontal gene transfer event from Klebsiella variicola.

  2. Immunological network signatures of cancer progression and survival

    PubMed Central

    2011-01-01

    Background The immune contribution to cancer progression is complex and difficult to characterize. For example in tumors, immune gene expression is detected from the combination of normal, tumor and immune cells in the tumor microenvironment. Profiling the immune component of tumors may facilitate the characterization of the poorly understood roles immunity plays in cancer progression. However, the current approaches to analyze the immune component of a tumor rely on incomplete identification of immune factors. Methods To facilitate a more comprehensive approach, we created a ranked immunological relevance score for all human genes, developed using a novel strategy that combines text mining and information theory. We used this score to assign an immunological grade to gene expression profiles, and thereby quantify the immunological component of tumors. This immunological relevance score was benchmarked against existing manually curated immune resources as well as high-throughput studies. To further characterize immunological relevance for genes, the relevance score was charted against both the human interactome and cancer information, forming an expanded interactome landscape of tumor immunity. We applied this approach to expression profiles in melanomas, thus identifying and grading their immunological components, followed by identification of their associated protein interactions. Results The power of this strategy was demonstrated by the observation of early activation of the adaptive immune response and the diversity of the immune component during melanoma progression. Furthermore, the genome-wide immunological relevance score classified melanoma patient groups, whose immunological grade correlated with clinical features, such as immune phenotypes and survival. 
Conclusions The assignment of a ranked immunological relevance score to all human genes extends the content of existing immune gene resources and enriches our understanding of immune involvement in complex biological networks. The application of this approach to tumor immunity represents an automated systems strategy that quantifies the immunological component in complex disease. In so doing, it stratifies patients according to their immune profiles, which may lead to effective computational prognostic and clinical guides. PMID:21453479

  3. Large-scale bioinformatic analysis of the regulation of the disease resistance NBS gene family by microRNAs in Poaceae.

    PubMed

    Habachi-Houimli, Yosra; Khalfallah, Yosra; Makni, Hanem; Makni, Mohamed; Bouktila, Dhia

    2016-01-01

    In the present study, we have screened 71, 713, 525, 119 and 241 mature miRNA variants from Hordeum vulgare, Oryza sativa, Brachypodium distachyon, Triticum aestivum, and Sorghum bicolor, respectively, and classified them with respect to their conservation status and expression levels. These Poaceae non-redundant miRNA species (1,669) were distributed over a total of 625 MIR families, among which only 54 were conserved across two or more plant species, confirming the relatively recent evolutionary differentiation of miRNAs in grasses. In addition, we have used 257 H. vulgare, 286 T. aestivum, 119 B. distachyon, 269 O. sativa, and 139 S. bicolor NBS domains, which were either mined directly from the annotated proteomes or predicted from whole genome sequence assemblies. The hybridization potential between miRNAs and their putative NBS gene targets was analyzed, revealing that at least 454 NBS genes from all five Poaceae were potentially regulated by 265 distinct miRNA species, most of them expressed in leaves and predominantly co-expressed in additional tissues. Based on gene ontology, we could assign these probable miRNA target genes to 16 functional groups, among which three confer resistance to bacteria (Rpm1, Xa1 and Rps2) and 13 confer resistance to fungi (Rpp8,13, Rp3, Tsn1, Lr10, Rps1-k-1, Pm3, Rpg5, and MLA1,6,10,12,13). The results of the present analysis provide a large-scale platform for a better understanding of biological control strategies of disease resistance genes in Poaceae, and will serve as an important starting point for improving crop disease resistance by means of transgenic lines with artificial miRNAs. Copyright © 2016 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  4. Molecular Cloning and Characterization of G Alpha Proteins from the Western Tarnished Plant Bug, Lygus hesperus

    PubMed Central

    Hull, J. Joe; Wang, Meixian

    2014-01-01

    The Gα subunits of heterotrimeric G proteins play critical roles in the activation of diverse signal transduction cascades. However, the role of these genes in chemosensation remains to be fully elucidated. To initiate a comprehensive survey of signal transduction genes, we used homology-based cloning methods and transcriptome data mining to identify Gα subunits in the western tarnished plant bug (Lygus hesperus Knight). Among the nine sequences identified were single variants of the Gαi, Gαo, Gαs, and Gα12 subfamilies and five alternative splice variants of the Gαq subfamily. Sequence alignment and phylogenetic analyses of the putative L. hesperus Gα subunits support initial classifications and are consistent with established evolutionary relationships. End-point PCR-based profiling of the transcripts indicated head-specific expression for LhGαq4, and largely ubiquitous expression, albeit at varying levels, for the other LhGα transcripts. All subfamilies were amplified from L. hesperus chemosensory tissues, suggesting potential roles in olfaction and/or gustation. Immunohistochemical staining of cultured insect cells transiently expressing recombinant His-tagged LhGαi, LhGαs, and LhGαq1 revealed plasma membrane targeting, suggesting the respective sequences encode functional G protein subunits. PMID:26463065

  5. Mechanism of development of ionocytes rich in vacuolar-type H+-ATPase in the skin of zebrafish larvae

    PubMed Central

    Esaki, Masahiro; Hoshijima, Kazuyuki; Nakamura, Nobuhiro; Munakata, Keijiro; Tanaka, Mikiko; Ookata, Kayoko; Asakawa, Kazuhide; Kawakami, Koichi; Wang, Weiyi; Weinberg, Eric S.; Hirose, Shigehisa

    2009-01-01

    Mitochondrion-rich cells (MRCs), or ionocytes, play a central role in aquatic species, maintaining body fluid ionic homeostasis by actively taking up or excreting ions. Since their first description in 1932 in eel gills, extensive morphological and physiological analyses have yielded important insights into ionocyte structure and function, but understanding the developmental pathway specifying these cells remains an ongoing challenge. We previously succeeded in identifying a key transcription factor, Foxi3a, in zebrafish larvae by database mining. In the present study, we analyzed a zebrafish mutant, quadro (quo), deficient in foxi1 gene expression and found that foxi1 is essential for development of an MRC subpopulation rich in vacuolar-type H+-ATPase (vH-MRC). foxi1 acts upstream of Delta-Notch signaling that determines sporadic distribution of vH-MRC and regulates foxi3a expression. Through gain- and loss-of-function assays and cell transplantation experiments, we further clarified that (1) the expression level of foxi3a is maintained by a positive feedback loop between foxi3a and its downstream gene gcm2 and (2) Foxi3a functions cell-autonomously in the specification of vH-MRC. These observations provide a better understanding of the differentiation and distribution of the vH-MRC subtype. PMID:19268451

  6. OGEE v2: an update of the online gene essentiality database with special focus on differentially essential genes in human cancer cell lines.

    PubMed

    Chen, Wei-Hua; Lu, Guanting; Chen, Xiao; Zhao, Xing-Ming; Bork, Peer

    2017-01-04

    OGEE is an Online GEne Essentiality database. To enhance our understanding of the essentiality of genes, in OGEE we collected experimentally tested essential and non-essential genes, as well as associated gene properties known to contribute to gene essentiality. We focus on large-scale experiments, and complement our data with text-mining results. We organized tested genes into data sets according to their sources, and tagged those with variable essentiality statuses across data sets as conditionally essential genes, intending to highlight the complex interplay between gene functions and environments/experimental perturbations. Developments since the last public release include increased numbers of species and gene essentiality data sets, inclusion of non-coding essential sequences and genes with intermediate essentiality statuses. In addition, we included 16 essentiality data sets from cancer cell lines, corresponding to 9 human cancers; with OGEE, users can easily explore the shared and differentially essential genes within and between cancer types. These genes, especially those derived from cell lines that are similar to tumor samples, could reveal the oncogenic drivers, paralogous gene expression pattern and chromosomal structure of the corresponding cancer types, and can be further screened to identify targets for cancer therapy and/or new drug development. OGEE is freely available at http://ogee.medgenius.info. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
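    The tagging rule described in this abstract, where a gene whose essentiality call differs across data sets is labelled conditionally essential, can be illustrated with a short sketch. This is a hypothetical illustration, not OGEE's own code: it assumes one binary essential/non-essential call per gene per data set, ignores the intermediate essentiality statuses that OGEE v2 also records, and uses made-up gene and data set identifiers.

```python
def classify_essentiality(calls: dict[str, dict[str, bool]]) -> dict[str, str]:
    """Label genes from per-data-set essentiality calls.

    calls maps gene -> {dataset_id: is_essential}; identifiers are hypothetical.
    """
    labels = {}
    for gene, by_dataset in calls.items():
        if not by_dataset:
            continue  # gene was never tested: leave it unlabelled
        statuses = set(by_dataset.values())
        if statuses == {True}:
            labels[gene] = "essential"
        elif statuses == {False}:
            labels[gene] = "non-essential"
        else:
            # Calls disagree across data sets (e.g. different cell lines or conditions)
            labels[gene] = "conditionally essential"
    return labels


calls = {
    "GENE_1": {"screen_a": True, "screen_b": True},
    "GENE_2": {"screen_a": False, "screen_b": False},
    "GENE_3": {"screen_a": True, "screen_b": False},
}
print(classify_essentiality(calls))
```

    Treating any disagreement as "conditional" is the simplest reading of the abstract; a real resource would likely also weigh how many data sets support each call.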

  7. ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome and literature data mining.

    PubMed

    Lee, Myunggyo; Lee, Kyubum; Yu, Namhee; Jang, Insu; Choi, Ikjung; Kim, Pora; Jang, Ye Eun; Kim, Byounggun; Kim, Sunkyu; Lee, Byungwook; Kang, Jaewoo; Lee, Sanghyuk

    2017-01-04

    Fusion genes are an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curation. In this update, the database coverage was enhanced considerably by adding two new modules: The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules: ChimerKB, ChimerPub and ChimerSeq. ChimerKB is a knowledgebase of 1066 manually curated fusion genes compiled from public resources of fusion genes with experimental evidence. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. The ChimerSeq module is designed to archive fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data from the TCGA project covering 4569 patients in 23 cancer types using two reliable programs, FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphic representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database

    PubMed Central

    Mashiach, R.; Cohen, S.; Kedem, A.; Baron, A.; Zajicek, M.; Feldman, I.; Seidman, D.; Soriano, D.

    2018-01-01

    Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis. PMID:29750165

  9. How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database.

    PubMed

    Bouaziz, J; Mashiach, R; Cohen, S; Kedem, A; Baron, A; Zajicek, M; Feldman, I; Seidman, D; Soriano, D

    2018-01-01

    Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis.

  10. Endeavour update: a web resource for gene prioritization in multiple species

    PubMed Central

    Tranchevent, Léon-Charles; Barriot, Roland; Yu, Shi; Van Vooren, Steven; Van Loo, Peter; Coessens, Bert; De Moor, Bart; Aerts, Stein; Moreau, Yves

    2008-01-01

    Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging these rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, in addition to our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and now includes numerous databases: ontologies and annotations, protein–protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the new version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis. PMID:18508807
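    Step (iii), merging several per-source rankings into one global ranking, can be illustrated with a minimal sketch. This abstract does not describe the statistic Endeavour uses to fuse its rankings, so the sketch below substitutes plain mean-rank aggregation; the function name, gene names and per-source rankings are all hypothetical.

```python
from statistics import mean


def merge_rankings(rankings: list[list[str]]) -> list[str]:
    """Merge per-source rankings (best candidate first) into one global ranking.

    Mean-rank aggregation (an illustrative stand-in, not Endeavour's method):
    each candidate's score is its average rank across sources (1 = best).
    Ties are broken alphabetically so the output is deterministic.
    Assumes every ranking contains the same set of candidates.
    """
    positions: dict[str, list[int]] = {}
    for ranking in rankings:
        for rank, gene in enumerate(ranking, start=1):
            positions.setdefault(gene, []).append(rank)
    return sorted(positions, key=lambda g: (mean(positions[g]), g))


if __name__ == "__main__":
    # Hypothetical candidates ranked by three hypothetical data sources.
    by_ontology = ["GENE_A", "GENE_B", "GENE_C"]
    by_interactions = ["GENE_B", "GENE_A", "GENE_C"]
    by_expression = ["GENE_A", "GENE_C", "GENE_B"]
    print(merge_rankings([by_ontology, by_interactions, by_expression]))
```

    Mean rank is the simplest possible fusion rule; it treats every data source as equally informative, which is exactly the assumption a more principled fusion statistic would relax.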

  11. SorghumFDB: sorghum functional genomics database with multidimensional network analysis.

    PubMed

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrates knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, composed of co-expression data, protein-protein interactions and miRNA-target pairs. We combined the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants. Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. © The Author(s) 2016. Published by Oxford University Press.

  12. 78 FR 49292 - Northshore Mining Company, a Subsidiary of Cliffs Natural Resources, Including On-Site Leased...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-13

    ... Mining Company, a Subsidiary of Cliffs Natural Resources, Including On-Site Leased Workers From Vanhouse... Cliffs Natural Resources, Including On- Site Leased Workers From Vanhouse, Express Employment and Our... Natural Resources, including on-site leased workers from VanHouse and Express Employment, Silver Bay...

  13. COPS: Detecting Co-Occurrence and Spatial Arrangement of Transcription Factor Binding Motifs in Genome-Wide Datasets

    PubMed Central

    Lohmann, Ingrid

    2012-01-01

    In multicellular organisms, spatiotemporal activity of cis-regulatory DNA elements depends on their occupancy by different transcription factors (TFs). In recent years, genome-wide ChIP-on-Chip, ChIP-Seq and DamID assays have been extensively used to unravel the combinatorial interaction of TFs with cis-regulatory modules (CRMs) in the genome. Even though genome-wide binding profiles are increasingly becoming available for different TFs, single TF binding profiles are in most cases not sufficient for dissecting complex regulatory networks. Thus, potent computational tools detecting statistically significant and biologically relevant TF-motif co-occurrences in genome-wide datasets are essential for analyzing context-dependent transcriptional regulation. We have developed COPS (Co-Occurrence Pattern Search), a new bioinformatics tool based on a combination of association rules and Markov chain models, which detects co-occurring TF binding sites (BSs) on genomic regions of interest. COPS scans DNA sequences for frequent motif patterns using a Frequent-Pattern tree based data mining approach, which allows efficient performance of the software with respect to both data structure and implementation speed, in particular when mining large datasets. Since transcriptional gene regulation very often relies on the formation of regulatory protein complexes mediated by closely adjoining TF binding sites on CRMs, COPS additionally detects preferred short distances between co-occurring TF motifs. The performance of our software with respect to biological significance was evaluated using three published datasets containing genomic regions that are independently bound by several TFs involved in a defined biological process. In sum, COPS is a fast, efficient and user-friendly tool for mining statistically and biologically significant TFBS co-occurrences and therefore allows the identification of TFs that combinatorially regulate gene expression. PMID:23272209
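    The core idea of finding TF motifs that co-occur at short spacing can be illustrated with a naive counting sketch. It does not reproduce COPS, which uses a Frequent-Pattern tree and Markov chain background models; it simply counts, for each region, which pairs of motif types have hits within an assumed maximum distance, and keeps pairs supported by a minimum number of regions. The function name, the 20 bp window, the support threshold and the motif hits are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(regions, max_distance=20, min_support=2):
    """Count motif-type pairs that occur close together across regions.

    regions: one list per genomic region of (motif_name, position) hits.
    A pair of distinct motif types is counted at most once per region if
    any two of their hits lie within max_distance bp of each other.
    Returns {(motif_a, motif_b): n_regions} for pairs with n_regions >= min_support.
    """
    support = Counter()
    for hits in regions:
        close_pairs = set()
        for (m1, p1), (m2, p2) in combinations(hits, 2):
            if m1 != m2 and abs(p1 - p2) <= max_distance:
                close_pairs.add(tuple(sorted((m1, m2))))
        support.update(close_pairs)  # each pair adds 1 per region at most
    return {pair: n for pair, n in support.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical motif hits in three bound regions.
    regions = [
        [("TF1", 10), ("TF2", 25), ("TF3", 200)],
        [("TF1", 100), ("TF2", 110), ("TF3", 115)],
        [("TF2", 5), ("TF1", 300)],
    ]
    print(cooccurring_pairs(regions))
```

    This brute-force pairwise scan is quadratic in the number of hits per region; a frequent-pattern tree, as used by COPS, is one way to avoid enumerating every combination when mining large datasets.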

  14. Frequent subgraph mining of personalized signaling pathway networks groups patients with frequently dysregulated disease pathways and predicts prognosis.

    PubMed

    Durmaz, Arda; Henderson, Tim A D; Brubaker, Douglas; Bebek, Gurkan

    2017-01-01

    Large-scale genomics studies have generated comprehensive molecular characterization of numerous cancer types. Subtypes for many tumor types have been established; however, these classifications are based on molecular characteristics of small gene sets with limited power to detect dysregulation at the patient level. We hypothesize that frequent graph mining of pathways to gather pathways functionally relevant to tumors can characterize tumor types and provide opportunities for personalized therapies. In this study we present an integrative omics approach to group patients based on their altered pathway characteristics and show prognostic differences within breast cancer (p < 9.57E-10) and glioblastoma multiforme (p < 0.05) patients. We were able to validate this approach in secondary RNA-Seq datasets with p < 0.05 and p < 0.01, respectively. We also performed pathway enrichment analysis to further investigate the biological relevance of dysregulated pathways. We compared our approach with network-based classifier algorithms and showed that our unsupervised approach generates more robust and biologically relevant clustering, whereas previous approaches failed to report specific functions for similar patient groups or to classify patients into prognostic groups. These results could serve as a means to improve prognosis for future cancer patients, and to provide opportunities for improved treatment options and personalized interventions. The proposed novel graph mining approach is able to integrate PPI networks with gene expression in a biologically sound approach and cluster patients into clinically distinct groups. We have utilized breast cancer and glioblastoma multiforme datasets from microarray and RNA-Seq platforms and identified disease mechanisms differentiating samples. Supplementary methods, figures, tables and code are available at https://github.com/bebeklab/dysprog.

  15. Data Mining Approaches for Genomic Biomarker Development: Applications Using Drug Screening Data from the Cancer Genome Project and the Cancer Cell Line Encyclopedia.

    PubMed

    Covell, David G

    2015-01-01

    Developing reliable biomarkers of tumor cell drug sensitivity and resistance can guide hypothesis-driven basic science research and influence pre-therapy clinical decisions. A popular strategy for developing biomarkers uses characterizations of human tumor samples against a range of cancer drug responses that correlate with genomic change, developed largely from the efforts of the Cancer Cell Line Encyclopedia (CCLE) and Sanger Cancer Genome Project (CGP). The purpose of this study is to provide an independent analysis of these data that aims to vet existing and add novel perspectives to biomarker discoveries and applications. Existing and alternative data mining and statistical methods will be used to a) evaluate drug responses of compounds with similar mechanism of action (MOA), b) examine measures of gene expression (GE), copy number (CN) and mutation status (MUT) biomarkers, combined with gene set enrichment analysis (GSEA), for hypothesizing biological processes important for drug response, c) conduct global comparisons of GE, CN and MUT as biomarkers across all drugs screened in the CGP dataset, and d) assess the positive predictive power of CGP-derived GE biomarkers as predictors of drug response in CCLE tumor cells. The perspectives derived from individual and global examinations of GEs, MUTs and CNs confirm existing roles and reveal unique and shared roles for these biomarkers in tumor cell drug sensitivity and resistance. Applications of CGP-derived genomic biomarkers to predict the drug response of CCLE tumor cells find a highly significant ROC, with a positive predictive power of 0.78. The results of this study expand the available data mining and analysis methods for genomic biomarker development and provide additional support for using biomarkers to guide hypothesis-driven basic science research and pre-therapy clinical decisions.
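    The two summary measures reported for the CGP-to-CCLE predictions, an ROC curve and a positive predictive value, can be computed as in the sketch below. This is not the study's code: the biomarker scores, sensitivity labels and the 0.5 decision threshold are hypothetical, and the ROC is summarized by its area (AUC) using the rank-based (Mann-Whitney) formulation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: probability a random positive outscores a random negative.

    labels are 1 (sensitive) or 0 (resistant); ties between a positive and a
    negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def positive_predictive_value(scores, labels, threshold):
    """Fraction of samples predicted sensitive (score >= threshold) that truly are sensitive."""
    called = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(called) / len(called)


if __name__ == "__main__":
    # Hypothetical biomarker scores for seven cell lines and their observed response.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 1, 0]
    print(roc_auc(scores, labels), positive_predictive_value(scores, labels, 0.5))
```

    AUC is threshold-free, whereas PPV depends on where the cut-off is placed, which is why the two figures are usually reported together.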

  16. Leaf-mining by Phyllonorycter blancardella reprograms the host-leaf transcriptome to modulate phytohormones associated with nutrient mobilization and plant defense.

    PubMed

    Zhang, Hui; Dugé de Bernonville, Thomas; Body, Mélanie; Glevarec, Gaëlle; Reichelt, Michael; Unsicker, Sybille; Bruneau, Maryline; Renou, Jean-Pierre; Huguet, Elisabeth; Dubreuil, Géraldine; Giron, David

    2016-01-01

    Phytohormones have long been hypothesized to play a key role in the interactions between plant-manipulating organisms and their host plants, such as insect-plant interactions that lead to gall or 'green-islands' induction. However, mechanistic understanding of how phytohormones operate in these plant reconfigurations is lacking due to limited information on the molecular and biochemical phytohormonal modulation following attack by plant-manipulating insects. In an attempt to fill this gap, the present study provides an extensive characterization of how the leaf-miner Phyllonorycter blancardella modulates the major phytohormones and the transcriptional activity of plant cells in leaves of Malus domestica. We show here that cytokinins strongly accumulate in mined tissues despite a weak expression of plant cytokinin-related genes. Leaf-mining is also associated with enhanced biosynthesis of jasmonic acid precursors but not the active form, a weak alteration of the salicylic acid pathway and a clear inhibition of the abscisic acid pathway. Our study consolidates previous results suggesting that insects may produce and deliver cytokinins to the plant as a strategy to manipulate the physiology of the leaf to create a favorable nutritional environment. We also demonstrate that leaf-mining by P. blancardella leads to a strong reprogramming of the plant phytohormonal balance associated with increased nutrient mobilization, inhibition of leaf senescence and mitigation of plant direct and indirect defense. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Gravity persistent signal 1 reveals a novel cytochrome P450 involved in gravitropic signal transduction

    NASA Astrophysics Data System (ADS)

    Wyatt, Sarah

    Understanding gene expression that occurs during gravitropism is important for studying the processes that link the perception of gravity to the growth response. Arabidopsis plants with a mutation in the GRAVITY PERSISTENT SIGNAL (GPS)1 locus show a "no response" phenotype during gravistimulation experiments. Basipetal auxin transport in the gps1 mutant was unaffected by the mutation, but auxin was not laterally redistributed after gravistimulation. GPS1 encodes CYP705A22, a cytochrome P450 protein (P450) of unknown function. The wild-type CYP705A22 gene was transformed into the gps1 mutant background and successfully rescued the mutant phenotype. Data mining of microarray data collected from gravistimulated root tips of Arabidopsis indicated that although CYP705A22 was not expressed in roots, a family member, CYP705A5, was up-regulated within 3 minutes after gravistimulation. Expression profiling of CYP705A5 using real-time quantitative PCR showed that CYP705A5 was up-regulated nearly fivefold within minutes of gravity stimulation. Reporter gene fusions that link the CYP705A5 gene to the green fluorescent protein showed that CYP705A5 was expressed in the root zones of elongation and maturation. Computer modeling of the catalytic domain of CYP705A22 and CYP705A5 and in silico substrate docking simulations generated a list of 130 compounds that are potential substrates of the P450s. Many of the compounds are phenylpropanoid derivatives. Heterologous expression of CYP705A5 in baculovirus and Type 1 binding studies indicate the substrate of the P450 may be quercetin or myricetin. A mutation affecting CYP705A5 expression resulted in a delayed gravity response in roots. The mutant phenotype could be chemically complemented, and DPBA staining in the CYP705A5 mutant indicated a 1.5-fold accumulation of quercetin in mutant roots as compared to WT. 
These data, taken together, may indicate that we have identified a flavonoid pathway that regulates auxin distribution and thus is involved in gravitropic signal transduction. (Partially supported by NSF: 0618506 to SEW)

  18. GeoChip-Based Analysis of the Functional Gene Diversity and Metabolic Potential of Microbial Communities in Acid Mine Drainage

    PubMed Central

    Xie, Jianping; He, Zhili; Liu, Xinxing; Liu, Xueduan; Van Nostrand, Joy D.; Deng, Ye; Wu, Liyou; Zhou, Jizhong; Qiu, Guanzhou

    2011-01-01

    Acid mine drainage (AMD) is an extreme environment, usually with low pH and high concentrations of metals. Although the phylogenetic diversity of AMD microbial communities has been examined extensively, little is known about their functional gene diversity and metabolic potential. In this study, a comprehensive functional gene array (GeoChip 2.0) was used to analyze the functional diversity, composition, structure, and metabolic potential of AMD microbial communities from three copper mines in China. GeoChip data indicated that these microbial communities were functionally diverse as measured by the number of genes detected, gene overlapping, unique genes, and various diversity indices. Almost all key functional gene categories targeted by GeoChip 2.0 were detected in the AMD microbial communities, including carbon fixation, carbon degradation, methane generation, nitrogen fixation, nitrification, denitrification, ammonification, nitrogen reduction, sulfur metabolism, metal resistance, and organic contaminant degradation, which suggested that the functional gene diversity was higher than was previously thought. Mantel test results indicated that AMD microbial communities are shaped largely by surrounding environmental factors (e.g., S, Mg, and Cu). Functional genes (e.g., narG and norB) and several key functional processes (e.g., methane generation, ammonification, denitrification, sulfite reduction, and organic contaminant degradation) were significantly (P < 0.10) correlated with environmental variables. This study presents an overview of functional gene diversity and the structure of AMD microbial communities and also provides insights into our understanding of metabolic potential in AMD ecosystems. PMID:21097602
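    The Mantel test mentioned above asks whether two dissimilarity matrices over the same sites (for example, functional gene composition and environmental variables such as S, Mg and Cu) are correlated. A minimal permutation-based sketch follows. The abstract does not state the distance measures, correlation type or number of permutations used, so Pearson correlation, 999 permutations and a one-sided test are assumptions here, and the example matrices are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, order):
    """Entries above the diagonal of a square matrix after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: returns (r, p) for positive correlation of two distance matrices."""
    sites = list(range(len(d1)))
    x = upper_triangle(d1, sites)
    r_obs = pearson(x, upper_triangle(d2, sites))
    rng = random.Random(seed)
    order = sites[:]
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the sites of d2 (rows and columns together)
        if pearson(x, upper_triangle(d2, order)) >= r_obs:
            at_least_as_large += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (at_least_as_large + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical sites placed along a gradient; both matrices are |difference| distances.
    gradient = [0, 1, 3, 6, 10, 15]
    env = [[abs(a - b) for b in gradient] for a in gradient]
    genes = [[abs(a - b) * 2 + 1 for b in gradient] for a in gradient]
    print(mantel(env, genes))
```

    Rows and columns are permuted together because the observations are sites, not individual pairwise distances; permuting matrix entries independently would ignore that distances sharing a site are not independent.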

  19. The influence of geomorphology on the role of women at artisanal and small-scale mine sites

    USGS Publications Warehouse

    Malpeli, Katherine C.; Chirico, Peter G.

    2013-01-01

    The geologic and geomorphic expressions of a mineral deposit determine its location, size, and accessibility, characteristics which in turn greatly influence the success of artisans mining the deposit. Despite the importance of this information, which can be gained by studying the surficial physical expression of a deposit, the geologic and geomorphic sciences have been largely overlooked in artisanal mining-related research. This study demonstrates that a correlation exists between the roles of female miners at artisanal diamond and gold mining sites in western and central Africa and the physical expression of the deposits. Typically, women perform ore processing and ancillary roles at mine sites. On occasion, however, women participate in the extraction process itself. Women were found to participate in the extraction of ore only when a deposit had a thin overburden layer, thus rendering the mineralized ore more accessible. When deposits required a significant degree of manual labour to access the ore due to thick overburden layers, women were typically relegated to other roles. The identification of this link encourages the establishment of an alternative research avenue in which the physical and social sciences merge to better inform policymakers, so that the most appropriate artisanal mining assistance programs can be developed and implemented.

  20. DDMGD: the database of text-mined associations between genes methylated in diseases from different species.

    PubMed

    Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B

    2015-01-01

    Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. DDMGD also accepts submissions of new associations. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Several immune escape patterns in non-Hodgkin's lymphomas

    PubMed Central

    Laurent, Camille; Charmpi, Konstantina; Gravelle, Pauline; Tosolini, Marie; Franchet, Camille; Ysebaert, Loïc; Brousset, Pierre; Bidaut, Alexandre; Ycart, Bernard; Fournié, Jean-Jacques

    2015-01-01

    Follicular lymphomas (FL) and diffuse large B cell lymphomas (DLBCL) must evolve some immune escape strategy to develop from lymphoid organs, but their immune evasion pathways remain poorly characterized. We investigated this issue by transcriptome data mining and immunohistochemistry (IHC) of FL and DLBCL lymphoma biopsies. A set of genes involved in cancer immune-evasion pathways (Immune Escape Gene Set, IEGS) was defined and the distribution of the expression levels of these genes was compared in FL, DLBCL and normal B cell transcriptomes downloaded from the GEO database. The whole IEGS was significantly upregulated in all the lymphoma samples but not in B cells or other control tissues, as shown by the overexpression of the PD-1, PD-L1, PD-L2 and LAG3 genes. Tissue microarray immunostainings for PD-1, PD-L1, PD-L2 and LAG3 proteins on additional biopsies from 27 FL and 27 DLBCL patients confirmed the expression of these proteins. The immune infiltrates were more abundant in FL than DLBCL samples, and the microenvironment of FL comprised higher rates of PD-1+ lymphocytes. Further, DLBCL tumor cells comprised a higher proportion of PD-1+, PD-L1+, PD-L2+ and LAG3+ lymphoma cells than the FL tumor cells, confirming that DLBCL mount immune escape strategies distinct from FL. In addition, some cases of DLBCL had tumor cells co-expressing PD-1, PD-L1 and PD-L2. Among the DLBCLs, the activated B cell (ABC) subtype comprised more PD-L1+ and PD-L2+ lymphoma cells than the germinal center (GC) subtype. Thus, we infer that FL and DLBCL evolved several pathways of immune escape. PMID:26405585

  2. In silico gene expression analysis reveals glycolysis and acetate anaplerosis in IDH1 wild-type glioma and lactate and glutamate anaplerosis in IDH1-mutated glioma.

    PubMed

    Khurshed, Mohammed; Molenaar, Remco J; Lenting, Krissie; Leenders, William P; van Noorden, Cornelis J F

    2017-07-25

    Hotspot mutations in isocitrate dehydrogenase 1 (IDH1) initiate low-grade glioma and secondary glioblastoma and induce a neomorphic activity that converts α-ketoglutarate (α-KG) to the oncometabolite D-2-hydroxyglutarate (D-2-HG). This activity causes metabolic rewiring that is not fully understood. We investigated the effects of IDH1 mutations (IDH1MUT) on expression of genes that encode metabolic enzymes by data mining The Cancer Genome Atlas. We analyzed 112 IDH1 wild-type (IDH1WT) versus 399 IDH1MUT low-grade glioma and 157 IDH1WT versus 9 IDH1MUT glioblastoma samples. In both glioma types, IDH1WT was associated with high expression levels of genes encoding enzymes that are involved in glycolysis and acetate anaplerosis, whereas IDH1MUT glioma overexpress genes encoding enzymes that are involved in the oxidative tricarboxylic acid (TCA) cycle. In vitro, we observed that IDH1MUT cancer cells have a higher basal respiration than IDH1WT cancer cells and that inhibition of IDH1MUT shifts the metabolism by decreasing oxygen consumption and increasing glycolysis. Our findings indicate that IDH1WT glioma have a typical Warburg phenotype, whereas in IDH1MUT glioma the TCA cycle, rather than glycolytic lactate production, is the predominant metabolic pathway. Our data further suggest that the TCA cycle in IDH1MUT glioma is driven by lactate and glutamate anaplerosis to facilitate production of α-KG, and ultimately D-2-HG. This metabolic rewiring may be a basis for novel therapies for IDH1MUT and IDH1WT glioma.

  3. CEBS object model for systems biology data, SysBio-OM.

    PubMed

    Xirasagar, Sandhya; Gustafson, Scott; Merrick, B Alex; Tomer, Kenneth B; Stasiewicz, Stanley; Chan, Denny D; Yost, Kenneth J; Yates, John R; Sumner, Susan; Xiao, Nianqing; Waters, Michael D

    2004-09-01

    To promote a systems biology approach to understanding the biological effects of environmental stressors, the Chemical Effects in Biological Systems (CEBS) knowledge base is being developed to house data from multiple complex data streams in a systems-friendly manner that will accommodate extensive querying from users. Unified data representation via a single object model will greatly aid in integrating data storage and management, and facilitate reuse of software to analyze and display data resulting from diverse differential expression or differential profile technologies. Data streams include, but are not limited to, gene expression analysis (transcriptomics), protein expression and protein-protein interaction analysis (proteomics) and changes in low molecular weight metabolite levels (metabolomics). To enable the integration of microarray gene expression, proteomics and metabolomics data in the CEBS system, we designed an object model, Systems Biology Object Model (SysBio-OM). The model is comprehensive and leverages other open source efforts, namely the MicroArray Gene Expression Object Model (MAGE-OM) and the Proteomics Experiment Data Repository (PEDRo) object model. SysBio-OM is designed by extending MAGE-OM to represent protein expression data elements (including those from PEDRo), protein-protein interaction and metabolomics data. SysBio-OM promotes the standardization of data representation and data quality by facilitating the capture of the minimum annotation required for an experiment. Such standardization improves the accuracy of data mining and interpretation. The open source SysBio-OM model, which can be implemented on varied computing platforms, is presented here. A Unified Modeling Language (UML) depiction of the entire SysBio-OM is available at http://cebs.niehs.nih.gov/SysBioOM/. 
The Rational Rose object model package is distributed under an open source license that permits unrestricted academic and commercial use and is available at http://cebs.niehs.nih.gov/cebsdownloads. The database and interface are being built to implement the model and will be available for public use at http://cebs.niehs.nih.gov.

  4. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research.

    PubMed

    Bravo, Àlex; Piñero, Janet; Queralt-Rosinach, Núria; Rautschka, Michael; Furlong, Laura I

    2015-02-21

    Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. 
Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text-mined data with data curated by experts is a suitable approach to both assess data quality and highlight novel and interesting information.

  5. Analysis of the Genes Involved in Thiocyanate Oxidation during Growth in Continuous Culture of the Haloalkaliphilic Sulfur-Oxidizing Bacterium Thioalkalivibrio thiocyanoxidans ARh 2T Using Transcriptomics

    PubMed Central

    Balkema, Cherel; Sorokin, Dimitry Y.

    2017-01-01

    Thiocyanate (N≡C−S⁻) is a moderately toxic, inorganic sulfur compound. It occurs naturally as a by-product of the degradation of glucosinolate-containing plants and is produced industrially in a number of mining processes. Currently, two pathways for the primary degradation of thiocyanate in bacteria are recognized, the carbonyl sulfide pathway and the cyanate pathway, of which only the former has been fully characterized. Use of the cyanate pathway has been shown in only 10 strains of Thioalkalivibrio, a genus of obligately haloalkaliphilic sulfur-oxidizing Gammaproteobacteria found in soda lakes. So far, only the key enzyme in this reaction, thiocyanate dehydrogenase (TcDH), has been purified and studied. To gain a better understanding of the other genes involved in the cyanate pathway, we conducted a transcriptomics experiment comparing gene expression of Thioalkalivibrio thiocyanoxidans ARh 2T during growth on thiosulfate with that during growth on thiocyanate. Triplicate cultures were grown in continuous substrate-limited mode, followed by transcriptome sequencing (RNA-Seq) of the total mRNA. Differential expression analysis showed that a cluster of genes surrounding the gene for TcDH was strongly upregulated during growth with thiocyanate. This cluster includes genes for putative copper uptake systems (copCD, ABC-type transporters), a putative electron acceptor (fccAB), and a two-component regulatory system (histidine kinase and a σ54-responsive Fis family transcriptional regulator). Additionally, we observed the increased expression of RuBisCO and some carboxysome shell genes involved in inorganic carbon fixation, as well as of aprAB, genes involved in sulfite oxidation through the reverse sulfidogenesis pathway. IMPORTANCE Thiocyanate is a moderately toxic and chemically stable sulfur compound that is produced by both natural and industrial processes. 
Despite its significance as a pollutant, knowledge of the microbial degradation of thiocyanate is very limited. Therefore, investigation of thiocyanate oxidation in haloalkaliphiles such as the genus Thioalkalivibrio may lead to improved biotechnological applications in wastewater remediation. PMID:29285524
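    The comparison of triplicate thiosulfate and thiocyanate cultures rests on normalizing read counts per library and comparing conditions. The abstract does not name the differential expression tool used, so the sketch below shows only the simplest ingredient: counts-per-million (CPM) normalization and a log2 fold change with an assumed pseudocount of 1. It performs no significance testing, which dedicated tools such as DESeq2 or edgeR add. The function names, gene labels and counts are hypothetical.

```python
from math import log2


def cpm(counts):
    """Counts-per-million normalization of one library (gene -> read count)."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def mean_cpm(replicates):
    """Average CPM per gene across replicate libraries that share the same genes."""
    normalized = [cpm(r) for r in replicates]
    return {g: sum(n[g] for n in normalized) / len(normalized) for g in normalized[0]}


def log2_fold_changes(reference, treatment, pseudocount=1.0):
    """log2((treatment CPM + pc) / (reference CPM + pc)) per gene; positive means up in treatment."""
    ref, trt = mean_cpm(reference), mean_cpm(treatment)
    return {g: log2(trt[g] + pseudocount) - log2(ref[g] + pseudocount) for g in ref}


if __name__ == "__main__":
    # Hypothetical triplicate counts for a TcDH-like gene and all other reads pooled.
    thiosulfate = [{"tcdh_like": 5, "other": 995}, {"tcdh_like": 4, "other": 996}, {"tcdh_like": 6, "other": 994}]
    thiocyanate = [{"tcdh_like": 200, "other": 800}] * 3
    print(log2_fold_changes(thiosulfate, thiocyanate))
```

    The pseudocount keeps the logarithm defined for genes with zero reads in one condition, at the cost of shrinking fold changes for lowly expressed genes.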

  6. QuadBase2: web server for multiplexed guanine quadruplex mining and visualization

    PubMed Central

    Dhapola, Parashar; Chowdhury, Shantanu

    2016-01-01

    DNA guanine quadruplexes or G4s are non-canonical DNA secondary structures which affect genomic processes like replication, transcription and recombination. G4s are computationally identified by specific nucleotide motifs which are also called putative G4 (PG4) motifs. Despite the general relevance of these structures, there is currently no tool available that allows batch queries and genome-wide analysis of these motifs in a user-friendly interface. QuadBase2 (quadbase.igib.res.in) presents a completely reinvented web server version of the previously published QuadBase database. QuadBase2 enables users to mine PG4 motifs in up to 178 eukaryotes through the EuQuad module. This module interfaces with the Ensembl Compara database to allow users to mine PG4 motifs in the orthologues of genes of interest across eukaryotes. PG4 motifs can be mined across genes and their promoter sequences in 1719 prokaryotes through the ProQuad module. This module includes a feature that allows genome-wide mining of PG4 motifs and their visualization as circular histograms. TetraplexFinder, the module for mining PG4 motifs in user-provided sequences, is now capable of handling up to 20 MB of data. QuadBase2 is a comprehensive PG4 motif mining tool that further expands the configurations and algorithms for mining PG4 motifs in a user-friendly way. PMID:27185890
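    Mining PG4 motifs usually means scanning a sequence for four runs of guanines separated by short loops. The sketch below uses the widely cited pattern of G-runs of at least 3 with loops of 1 to 7 bases, exposed as parameters because QuadBase2's own configurations are not given in this abstract. The function name and test sequences are hypothetical, and only the given strand is scanned; the opposite strand would need a C-run search or a reverse-complement pass.

```python
import re


def find_pg4(sequence, min_run=3, min_loop=1, max_loop=7):
    """Return (start, end, motif) for non-overlapping putative G4 motifs on the given strand.

    Pattern: four runs of at least `min_run` G's separated by loops of
    `min_loop`..`max_loop` bases (the loop may itself contain G's).
    Coordinates are 0-based, end-exclusive.
    """
    run = "G{%d,}" % min_run
    loop = "[ACGTN]{%d,%d}" % (min_loop, max_loop)
    pattern = re.compile("(?:%s%s){3}%s" % (run, loop, run))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # A telomere-like repeat embedded in flanking bases.
    print(find_pg4("ATGGGTTAGGGTTAGGGTTAGGGA"))
```

    Because `finditer` reports non-overlapping matches, overlapping alternative G4 arrangements in long G-rich tracts are collapsed into one hit; tools that need every arrangement have to enumerate them explicitly.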

  7. Interleukin-27 is a novel candidate diagnostic biomarker for bacterial infection in critically ill children

    PubMed Central

    2012-01-01

    Introduction Differentiating between sterile inflammation and bacterial infection in critically ill patients with fever and other signs of the systemic inflammatory response syndrome (SIRS) remains a clinical challenge. The objective of our study was to mine an existing genome-wide expression database for the discovery of candidate diagnostic biomarkers to predict the presence of bacterial infection in critically ill children. Methods Genome-wide expression data were compared between patients with SIRS having negative bacterial cultures (n = 21) and patients with sepsis having positive bacterial cultures (n = 60). Differentially expressed genes were subjected to a leave-one-out cross-validation (LOOCV) procedure to predict SIRS or sepsis classes. Serum concentrations of interleukin-27 (IL-27) and procalcitonin (PCT) were compared between 101 patients with SIRS and 130 patients with sepsis. All data represent the first 24 hours of meeting criteria for either SIRS or sepsis. Results Two hundred twenty-one gene probes were differentially regulated between patients with SIRS and patients with sepsis. The LOOCV procedure correctly predicted 86% of the SIRS and sepsis classes, and Epstein-Barr virus-induced gene 3 (EBI3) had the highest predictive strength. Computer-assisted image analyses of gene-expression mosaics were able to predict infection with a specificity of 90% and a positive predictive value of 94%. Because EBI3 is a subunit of the heterodimeric cytokine IL-27, we tested the ability of serum IL-27 protein concentrations to predict infection. At a cut-point value of ≥5 ng/ml, serum IL-27 protein concentrations predicted infection with a specificity and a positive predictive value of >90%, and the overall performance of IL-27 was generally better than that of PCT. A decision tree combining IL-27 and PCT improved overall predictive capacity compared with that of either biomarker alone. 
Conclusions Genome-wide expression analysis has provided the foundation for the identification of IL-27 as a novel candidate diagnostic biomarker for predicting bacterial infection in critically ill children. Additional studies will be required to test further the diagnostic performance of IL-27. The microarray data reported in this article have been deposited in the Gene Expression Omnibus under accession number GSE4607. PMID:23107287
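
    The evaluation above combines leave-one-out cross-validation (LOOCV) on expression data with cut-point metrics (specificity and positive predictive value) for serum IL-27. The following Python sketch illustrates both on toy data. The nearest-centroid classifier is an assumption made for illustration, since the abstract does not name the classifier used inside the LOOCV loop; the 5 ng/ml threshold is the cut-point reported above, and all data values are hypothetical.

```python
from statistics import mean

def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean profile is closest (squared Euclidean distance)."""
    centroids = {
        label: [mean(col) for col in zip(*[r for r, y in zip(train_X, train_y) if y == label])]
        for label in set(train_y)
    }
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

def loocv_accuracy(X, y):
    """Hold out each sample once, train on the rest, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(X)):
        prediction = nearest_centroid_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        correct += prediction == y[i]
    return correct / len(X)

def specificity_and_ppv(y_true, y_pred, positive="sepsis"):
    """Specificity = TN / (TN + FP); positive predictive value = TP / (TP + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tn / (tn + fp), tp / (tp + fp)

def classify_by_cutpoint(il27_ng_per_ml, cutpoint=5.0):
    """Predict infection (sepsis) when serum IL-27 is at or above the cut-point."""
    return ["sepsis" if c >= cutpoint else "SIRS" for c in il27_ng_per_ml]

# Hypothetical one-gene expression profiles: three SIRS and three sepsis patients.
X = [[0.0], [0.2], [0.1], [5.0], [5.2], [4.9]]
y = ["SIRS"] * 3 + ["sepsis"] * 3
print(loocv_accuracy(X, y))  # 1.0

# Hypothetical serum IL-27 concentrations (ng/ml) with known outcomes.
truth = ["SIRS", "SIRS", "sepsis", "sepsis"]
predicted = classify_by_cutpoint([2.0, 6.1, 7.5, 4.9])
print(specificity_and_ppv(truth, predicted))  # (0.5, 0.5)
```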

  8. Two COWP-like cysteine rich proteins from Eimeria nieschulzi (coccidia, apicomplexa) are expressed during sporulation and involved in the sporocyst wall formation.

    PubMed

    Jonscher, Ernst; Erdbeer, Alexander; Günther, Marie; Kurth, Michael

    2015-07-25

    The family of cysteine-rich proteins of the oocyst wall (COWPs), originally described in Cryptosporidium, can also be found in Toxoplasma gondii (TgOWPs), where they are likewise localised to the oocyst wall. Genome sequence analysis of Eimeria suggests that these proteins may also exist in this genus, which led us to assume that they may also play a role in oocyst wall formation. In this study, COWP-like encoding sequences were identified in Eimeria nieschulzi. The predicted gene sequences were subsequently used in reporter gene assays to observe the timing of expression and the localisation of the reporter protein in vivo. Both investigated proteins, EnOWP2 and EnOWP6, were expressed during sporulation. mCherry driven by the EnOWP2 promoter was found in the cytoplasm, whereas EnOWP2 and EnOWP6, each fused to mCherry, were initially observed in the extracytoplasmic space between the sporoblast and the oocyst wall. This previously unnamed compartment was designated the circumplasm. Later, the mCherry reporter co-localised with the sporocyst wall of the sporulated oocysts. This observation was confirmed by confocal microscopy, excystation experiments and IFA. Transcript analysis revealed the intron-exon structure of these genes and confirmed the expression of EnOWP2 and EnOWP6 during sporogony. Our results suggest a role for both investigated EnOWP proteins in sporocyst wall formation in E. nieschulzi. Data mining and sequence comparisons with T. gondii and other Eimeria species lead us to hypothesise a conserved process within the coccidia. A role in oocyst wall formation was not observed in E. nieschulzi.

  9. Identification, cloning and characterization of R2R3-MYB gene family in canola (Brassica napus L.) identify a novel member modulating ROS accumulation and hypersensitive-like cell death

    PubMed Central

    Chen, Bisi; Niu, Fangfang; Liu, Wu-Zhen; Yang, Bo; Zhang, Jingxiao; Ma, Jieyu; Cheng, Hao; Han, Feng; Jiang, Yuan-Qing

    2016-01-01

    The R2R3-MYB proteins comprise one of the largest families of transcription factors in plants. Although genome-wide analysis of this family has been carried out in some plant species, little is known about R2R3-MYB genes in canola (Brassica napus L.). In this study, we identified 76 R2R3-MYB genes in the canola genome through mining of expressed sequence tags (ESTs). The cDNA sequences of 44 MYB genes were successfully cloned. The transcriptional activities of the BnaMYB proteins encoded by these genes were assayed in yeast. The subcellular localizations of representative R2R3-MYB proteins were investigated through GFP fusion. In addition, transcript abundance analysis under abiotic stress conditions and ABA treatment identified a group of R2R3-MYB genes that responded to one or more treatments. Furthermore, we identified a previously uncharacterized MYB gene, BnaMYB78, which modulates reactive oxygen species (ROS)-dependent cell death in Nicotiana benthamiana by regulating the transcription of a few ROS- and defence-related genes. Taken together, this study provides a solid foundation for understanding the roles and regulatory mechanisms of canola R2R3-MYB genes. PMID:26800702

  10. The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis.

    PubMed

    Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J; Inzé, Dirk; Van de Peer, Yves

    2013-03-01

    Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular, using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present an extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for use in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, thereby stimulating the application of text mining data in future plant biology studies.

  11. Dense module enumeration in biological networks

    NASA Astrophysics Data System (ADS)

    Tsuda, Koji; Georgii, Elisabeth

    2009-12-01

    Analysis of large networks is a central topic in various research fields including biology, sociology, and web mining. Detection of dense modules (a.k.a. clusters) is an important step in analyzing such networks. Though numerous methods have been proposed for this purpose, they often lack mathematical rigor: there is no guarantee that all dense modules are detected. Here, we present a novel reverse-search-based method for enumerating all dense modules. Furthermore, constraints from additional data sources such as gene expression profiles or customer profiles can be integrated, so that we can systematically detect dense modules with interesting profiles. We report successful applications in human protein interaction network analyses.
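
    Why complete enumeration is possible can be illustrated with one property of a common density definition, assumed here: if a module's density is its total internal edge weight divided by its number of vertex pairs, then removing a vertex of minimum internal degree never lowers the density. Every dense module with three or more vertices therefore has a dense sub-module one vertex smaller, so a level-wise search that starts from single vertices and only extends dense modules cannot miss any. The Python sketch below uses explicit deduplication instead of the reverse-search parent rule the abstract refers to, so it is a simplified stand-in rather than the authors' method; the threshold `theta` and the toy graph are hypothetical, and the number of dense modules (and hence running time) can grow exponentially on dense graphs.

```python
from itertools import combinations

def density(module, weight):
    """Total edge weight inside `module` divided by the number of vertex pairs."""
    pairs = list(combinations(sorted(module), 2))
    if not pairs:
        return 1.0  # convention: single vertices serve as dense seeds
    return sum(weight.get(frozenset(p), 0.0) for p in pairs) / len(pairs)

def enumerate_dense_modules(vertices, weight, theta, min_size=2):
    """Level-wise search: extend each dense module by one vertex, keep dense results."""
    level = {frozenset([v]) for v in vertices}
    found = []
    while level:
        next_level = set()
        for module in level:
            for v in vertices:
                if v in module:
                    continue
                candidate = module | {v}
                if candidate not in next_level and density(candidate, weight) >= theta:
                    next_level.add(candidate)
        found.extend(m for m in next_level if len(m) >= min_size)
        level = next_level
    return found

# Hypothetical weighted graph: triangle a-b-c plus a pendant edge c-d.
vertices = ["a", "b", "c", "d"]
weight = {frozenset(e): 1.0 for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
modules = enumerate_dense_modules(vertices, weight, theta=0.6)
print(len(modules))  # 8
```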

  12. Novel insights into the lipidome of glioblastoma cells based on a combined PLSR and DD-HDS computational analysis

    NASA Astrophysics Data System (ADS)

    Lespinats, S.; Meyer-Bäse, Anke; He, Huan; Marshall, Alan G.; Conrad, Charles A.; Emmett, Mark R.

    2009-05-01

    Partial Least Squares Regression (PLSR) and Data-Driven High Dimensional Scaling (DD-HDS) are employed for the prediction and visualization of changes in polar lipid expression induced by different combinations of wild-type (wt) p53 gene therapy and SN38 chemotherapy of U87 MG glioblastoma cells. A detailed analysis of the gangliosides reveals that certain GM3- or GD1-type gangliosides have unique properties not shared by the others. In summary, this preliminary work shows that data mining techniques can determine the modulation of gangliosides by different treatment combinations.

  13. Contemporary Network Proteomics and Its Requirements

    PubMed Central

    Goh, Wilson Wen Bin; Wong, Limsoon; Sng, Judy Chia Ghee

    2013-01-01

    The integration of networks with genomics (network genomics) is a familiar field. Conventional network analysis takes advantage of the larger coverage and relative stability of gene expression measurements. Network proteomics, on the other hand, still has to develop on two critical fronts: (1) expanded data coverage and consistency, and (2) suitable reference network libraries, and data mining from them. Concerning (1), we discuss several contemporary themes that can improve data quality, which in turn will boost the outcome of downstream network analysis. For (2), we focus on network analysis developments, specifically the need for context-specific networks and essential considerations for localized network analysis. PMID:24833333

  14. Finding biomarkers in non-model species: literature mining of transcription factors involved in bovine embryo development

    PubMed Central

    2012-01-01

    Background Since processes in well-known model organisms have specific features that differ from those in Bos taurus, the organism under study, a good way to describe gene regulation in ruminant embryos would be a species-specific consideration of closely related species: cattle, sheep and pig. However, as highlighted by a recent report, gene dictionaries in pig are smaller than in cattle (and the same holds for sheep), which risks reducing the gene resources available for mining. Bioinformatics approaches that integrate available information on gene function in model organisms, while taking their specificity into account, are thus needed. Besides these closely related and biologically relevant species, there is much more knowledge of (i) trophoblast proliferation and differentiation or (ii) embryogenesis in human and mouse, which provides opportunities for reconstructing proliferation and/or differentiation processes in other mammalian embryos, including ruminants. The necessary knowledge can be obtained partly from (i) stem cell or cancer research, which supplies useful information on molecular agents or molecular interactions at work in cell proliferation, and (ii) mouse embryogenesis, which supplies useful information on embryo differentiation. However, the total number of publications on all these topics and species is large, and processing them manually would be tedious and time-consuming. This is why we used text mining for automated text analysis and automated knowledge extraction. To evaluate the quality of this “mining”, we took advantage of studies that reported gene expression profiles during the elongation of bovine embryos and defined a list of transcription factors (TF, n = 64) that we used as a biological “gold standard”. If successful, the “mining” approach would identify all of them, as well as novel ones. 
Methods To gain knowledge of molecular-genetic regulation in a non-model organism, we offer an approach based on literature mining and score-based ranking of data from model organisms. This approach was applied to identify novel transcription factors during bovine blastocyst elongation, a process that is not observed in rodents and primates. As a result, by searching human and mouse corpora, we identified numerous bovine homologs, among them 11 to 14% of transcription factors, including the gold-standard TFs as well as novel TFs potentially important to gene regulation in ruminant embryo development. The scripts of the workflow are written in Perl and available on demand. They can take input from various databases for any kind of biological question, once the data have been prepared according to keywords for the studied topic and species; we can provide a data sample to illustrate the use and functionality of the workflow. Results To do so, we created a workflow that allowed pipeline processing of literature data and biological data, extracted from Web of Science (WoS) or PubMed but also from Gene Expression Omnibus (GEO), Gene Ontology (GO), UniProt, HomoloGene, TcoF-DB and TFe (TF encyclopedia). First, the human and mouse homologs of the bovine proteins were selected, filtered by text corpora and ranked by score functions. The score functions were based on the gene name frequencies in the corpora. Then, transcription factors were identified using TcoF-DB and double-checked using TFe to characterise TF groups and families. Thus, among a search space of 18,670 bovine homologs, 489 were identified as transcription factors. Of these, 243 were absent from the high-throughput data available at the time of the study. They therefore stand so far as putative TFs acting during bovine embryo elongation, but might be retrieved from a recent RNA sequencing dataset (Mamo et al., 2012). 
Of the 246 TFs that appeared to be expressed in bovine elongating tissues, we restricted our interpretation to those occurring within a list of the 50 top-ranked genes. Among the transcription factors identified therein, half belonged to the gold standard (ASCL2, c-FOS, ETS2, GATA3, HAND1) and half did not (ESR1, HES1, ID2, NANOG, PHB2, TP53, STAT3). Conclusions A workflow for searching for transcription factors acting in bovine elongation was developed. The model assumed that proteins sharing the same protein domains in closely related species had the same functions, even if they were differently regulated among species or involved in somewhat different pathways. Under this assumption, we merged information on different mammalian species from different databases (literature and biology) and proposed 489 TFs as potential participants in embryo proliferation and differentiation, with (i) a recall of 95% with regard to a biological gold standard defined in 2011 and (ii) an extension of more than 3 times the gold standard of TFs detected so far in elongating tissues. The working capacity of the workflow was supported by the biologists' manual expert assessment of the results. The workflow can serve as a new kind of bioinformatics tool for working on fused data sources and can thus be useful in studies of a wide range of biological processes. PMID:22931563
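
    Two quantitative steps described above, ranking homologs by how often their names occur in a text corpus and measuring recall against a gold standard of known TFs, can be sketched in Python as follows. The original workflow is written in Perl and its exact score functions are not given in the abstract; the whole-token, per-document counting used here, and the toy corpus and gene lists, are assumptions for illustration only.

```python
import re
from collections import Counter

def gene_frequency_scores(corpus, gene_names):
    """Count, for each gene symbol, the number of documents mentioning it as a whole token."""
    scores = Counter()
    for text in corpus:
        tokens = set(re.findall(r"[A-Za-z0-9-]+", text.upper()))
        for gene in gene_names:
            if gene.upper() in tokens:
                scores[gene] += 1
    return scores

def recall(predicted, gold_standard):
    """Fraction of gold-standard items that appear among the predictions."""
    gold = set(gold_standard)
    return len(gold & set(predicted)) / len(gold)

# Hypothetical abstracts and gene lists, for illustration only.
corpus = [
    "HAND1 and GATA3 drive trophoblast differentiation.",
    "GATA3 binds ETS2 in trophoblast cells.",
    "NANOG maintains pluripotency in stem cells.",
]
genes = ["GATA3", "HAND1", "ETS2", "NANOG", "ESR1"]
scores = gene_frequency_scores(corpus, genes)
print(scores.most_common(1))  # [('GATA3', 2)]
top_ranked = [gene for gene, _ in scores.most_common(3)]
print(recall(top_ranked, ["GATA3", "HAND1", "ETS2", "ASCL2"]))
```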

  15. Identification of novel isoprene synthases through genome mining and expression in Escherichia coli.

    PubMed

    Ilmén, Marja; Oja, Merja; Huuskonen, Anne; Lee, Sangmin; Ruohonen, Laura; Jung, Simon

    2015-09-01

    Isoprene is a naturally produced hydrocarbon emitted into the atmosphere by green plants. It is also a constituent of synthetic rubber and a potential biofuel. Microbial production of isoprene can become a sustainable alternative to the prevailing chemical production of isoprene from petroleum. In this work, sequence homology searches were conducted to find novel isoprene synthases. Candidate sequences were functionally expressed in Escherichia coli and the desired enzymes were identified based on an isoprene production assay. The activity of three enzymes was shown for the first time: expression of the candidate genes from Ipomoea batatas, Mangifera indica, and Elaeocarpus photiniifolius resulted in isoprene formation. The Ipomoea batatas isoprene synthase produced the highest amounts of isoprene in all experiments, exceeding the isoprene levels obtained by the previously known Populus alba and Pueraria montana isoprene synthases that were studied in parallel as controls. Copyright © 2015 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.

  16. Integrated Genomic and Epigenomic Analysis of Breast Cancer Brain Metastasis

    PubMed Central

    Salhia, Bodour; Kiefer, Jeff; Ross, Julianna T. D.; Metapally, Raghu; Martinez, Rae Anne; Johnson, Kyle N.; DiPerna, Danielle M.; Paquette, Kimberly M.; Jung, Sungwon; Nasser, Sara; Wallstrom, Garrick; Tembe, Waibhav; Baker, Angela; Carpten, John; Resau, Jim; Ryken, Timothy; Sibenaller, Zita; Petricoin, Emanuel F.; Liotta, Lance A.; Ramanathan, Ramesh K.; Berens, Michael E.; Tran, Nhan L.

    2014-01-01

    The brain is a common site of metastatic disease in patients with breast cancer, a condition with few therapeutic options and dismal outcomes. The purpose of our study was to identify common and rare events that underlie breast cancer brain metastasis. We performed deep genomic profiling, integrating gene copy number, gene expression and DNA methylation datasets, on a collection of breast cancer brain metastases. We identified frequent large chromosomal gains in 1q, 5p, 8q, 11q, and 20q and frequent broad-level deletions involving 8p, 17p, 21p and Xq. Frequently amplified and overexpressed genes included ATAD2, BRAF, DERL1, DNMTRB and NEK2A. The ATM, CRYAB and HSPB2 genes were commonly deleted and underexpressed. Knowledge mining revealed enrichment in cell cycle and G2/M transition pathways, which contained AURKA, AURKB and FOXM1. Using the PAM50 breast cancer intrinsic classifier, Luminal B, Her2+/ER negative, and basal-like tumors were identified as the most commonly represented breast cancer subtypes in our brain metastasis cohort. While overall methylation levels were increased in breast cancer brain metastasis, basal-like brain metastases were associated with significantly lower levels of methylation. Integrating DNA methylation data with gene expression revealed defects in cell migration and adhesion due to hypermethylation and downregulation of PENK, EDN3, and ITGAM. Hypomethylation and upregulation of KRT8 likely affect adhesion and permeability. Genomic and epigenomic profiling of breast cancer brain metastasis has provided insight into the somatic events underlying this disease, which could form the basis of future therapeutic strategies. PMID:24489661

  17. Comparison of tumor related signaling pathways with known compounds to determine potential agents for lung adenocarcinoma.

    PubMed

    Xu, Song; Liu, Renwang; Da, Yurong

    2018-06-05

    This study compared tumor-related signaling pathways with known compounds to identify potential agents for lung adenocarcinoma (LUAD) treatment. Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathway analyses were performed based on LUAD differentially expressed genes from The Cancer Genome Atlas (TCGA) project and Genotype-Tissue Expression (GTEx) controls. These results were compared with various known compounds using the Connectivity Map dataset. The clinical significance of the hub genes identified by overlapping pathway enrichment analysis was further investigated by mining data from multiple sources. A drug-pathway network for LUAD was constructed, and molecular docking was carried out. Integrating 57 LUAD-related pathways and 35 pathways affected by small molecules revealed five overlapping pathways. Among these five pathways, the p53 signaling pathway was the most significant, with CCNB1, CCNB2, CDK1, CDKN2A, and CHEK1 identified as hub genes. The p53 signaling pathway is implicated as a risk factor for LUAD tumorigenesis and survival. A total of 88 molecules that significantly inhibit the five LUAD-related oncogenic pathways were included in the LUAD drug-pathway network. Daunorubicin, mycophenolic acid, and pyrvinium could potentially target the hub gene CHEK1 directly. Our study highlights the critical pathways that should be targeted in the search for potential LUAD treatments, most importantly the p53 signaling pathway. Some compounds, such as ciclopirox and AG-028671, may have potential roles in LUAD treatment but require further experimental verification. © 2018 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.

  18. RNAi-mediated disruption of neuropeptide genes, nlp-3 and nlp-12, cause multiple behavioral defects in Meloidogyne incognita.

    PubMed

    Dash, Manoranjan; Dutta, Tushar K; Phani, Victor; Papolu, Pradeep K; Shivakumara, Tagginahalli N; Rao, Uma

    2017-08-26

    Owing to the current deficiencies in chemical control options and the unavailability of novel management strategies, root-knot nematode (M. incognita) infections remain widespread, with significant socio-economic impacts. Helminth nervous systems are peptide-rich and appear to offer putative drug targets that could be exploited by anthelmintic chemotherapy. Herein, to characterize novel peptidergic neurotransmitters, in silico mining of M. incognita genomic and transcriptomic datasets revealed the presence of 16 neuropeptide-like protein (nlp) genes with structural hallmarks of neuropeptide preproproteins, of which 13 were PCR-amplified and sequenced. Two key nlp genes (Mi-nlp-3 and Mi-nlp-12) were localized to the basal bulb and tail region of the nematode body by in situ hybridization. Mi-nlp-3 and Mi-nlp-12 were highly expressed (in qRT-PCR assays) in pre-parasitic juveniles and adult females, suggesting that these genes are associated with host recognition, development and reproduction of M. incognita. In vitro knockdown of Mi-nlp-3 and Mi-nlp-12 via RNAi significantly reduced the attraction and penetration of M. incognita in tomato roots in Pluronic gel medium. A pronounced perturbation in the development and reproduction of NLP-silenced worms was also documented in adzuki beans in CYG growth pouches. The deleterious phenotypes resulting from NLP knockdown suggest that transgenic plants engineered to express RNA constructs targeting nlp genes may emerge as an environmentally viable option to manage nematode problems in crop plants. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Phylogenetic analysis of bacterial isolates from man-made high-pH, high-salt environments and identification of gene-cassette-associated open reading frames.

    PubMed

    Ghauri, Muhammad A; Khalid, Ahmad M; Grant, Susan; Grant, William D; Heaphy, Shaun

    2006-06-01

    Environmental samples were collected from high-pH sites in Pakistan, including a uranium heap set up for carbonate leaching, the lime unit of a tannery, and the Khewra salt mine. Another sample was collected from a hot spring on the shore of the soda lake Magadi in Kenya. Microbial cultures were enriched from the Pakistani samples. Phylogenetic analysis of isolates was carried out by sequencing 16S rRNA genes. Genomic DNA was amplified by polymerase chain reaction using integron gene-cassette-specific primers. Different gene-cassette-linked genes were recovered from the cultured strains related to Halomonas magadiensis, Virgibacillus halodenitrificans, and Yania flava and from the uncultured environmental DNA sample. These results indicate the usefulness of this technique as a tool for gene mining.

  20. An open-source framework for large-scale, flexible evaluation of biomedical text mining systems.

    PubMed

    Baumgartner, William A; Cohen, K Bretonnel; Hunter, Lawrence

    2008-01-29

    Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The extensibility of this framework and its ability to uncover system-wide characteristics by analyzing component parts as well as its usefulness for facilitating third-party application integration are demonstrated through examples in the biomedical domain. Our evaluation framework was assembled using the Unstructured Information Management Architecture. It was used to analyze a set of gene mention identification systems involving 225 combinations of system, evaluation corpus, and correctness measure. Interactions between all three were found to affect the relative rankings of the systems. A second experiment evaluated gene normalization system performance using as input 4,097 combinations of gene mention systems and gene mention system-combining strategies. Gene mention system recall is shown to affect gene normalization system performance much more than does gene mention system precision, and high gene normalization performance is shown to be achievable with remarkably low levels of gene mention system precision. The software presented in this paper demonstrates the potential for novel discovery resulting from the structured evaluation of biomedical language processing systems, as well as the usefulness of such an evaluation framework for promoting collaboration between developers of biomedical language processing technologies. The code base is available as part of the BioNLP UIMA Component Repository on SourceForge.net.

  1. An open-source framework for large-scale, flexible evaluation of biomedical text mining systems

    PubMed Central

    Baumgartner, William A; Cohen, K Bretonnel; Hunter, Lawrence

    2008-01-01

    Background Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The extensibility of this framework and its ability to uncover system-wide characteristics by analyzing component parts as well as its usefulness for facilitating third-party application integration are demonstrated through examples in the biomedical domain. Results Our evaluation framework was assembled using the Unstructured Information Management Architecture. It was used to analyze a set of gene mention identification systems involving 225 combinations of system, evaluation corpus, and correctness measure. Interactions between all three were found to affect the relative rankings of the systems. A second experiment evaluated gene normalization system performance using as input 4,097 combinations of gene mention systems and gene mention system-combining strategies. Gene mention system recall is shown to affect gene normalization system performance much more than does gene mention system precision, and high gene normalization performance is shown to be achievable with remarkably low levels of gene mention system precision. Conclusion The software presented in this paper demonstrates the potential for novel discovery resulting from the structured evaluation of biomedical language processing systems, as well as the usefulness of such an evaluation framework for promoting collaboration between developers of biomedical language processing technologies. The code base is available as part of the BioNLP UIMA Component Repository on SourceForge.net. PMID:18230184
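
    The "correctness measure" dimension of those 225 combinations reflects the fact that the same gene mention output can score differently depending on how a predicted mention is matched to an annotated one. A minimal Python sketch, assuming mentions are half-open character spans and using two common measures (exact span match and any-overlap match) that may differ from the ones used in the paper; the spans are hypothetical:

```python
def spans_match(pred, gold, mode):
    """Compare two half-open (start, end) character spans under a correctness measure."""
    if mode == "exact":
        return pred == gold
    if mode == "overlap":
        return pred[0] < gold[1] and gold[0] < pred[1]
    raise ValueError(f"unknown mode: {mode}")

def evaluate_mentions(predicted, gold, mode="exact"):
    """Precision, recall and F1 of predicted gene mention spans against gold spans."""
    tp_pred = sum(any(spans_match(p, g, mode) for g in gold) for p in predicted)
    tp_gold = sum(any(spans_match(p, g, mode) for p in predicted) for g in gold)
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 5), (10, 18)]                 # hypothetical annotated gene mentions
predicted = [(0, 5), (11, 18), (20, 25)]  # hypothetical system output
for mode in ("exact", "overlap"):
    print(mode, evaluate_mentions(predicted, gold, mode))
```

    Under the lenient measure the boundary error on the second mention becomes a hit, so both precision and recall rise; this is the kind of interaction that can reorder systems in a ranking.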

  2. CARSVM: a class association rule-based classification framework and its application to gene expression data.

    PubMed

    Kianmehr, Keivan; Alhajj, Reda

    2008-09-01

    In this study, we aim to build a classification framework, namely the CARSVM model, which integrates association rule mining and support vector machines (SVM). The goal is to benefit from the advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that improves the interpretability of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework, instead of the original training set, a set of rule-based feature vectors, generated from the discriminative ability of class association rules over the training samples, is presented to the learning component of the SVM algorithm. We show that rule-based feature vectors are a high-quality source of discriminative knowledge that can substantially improve the prediction power of SVM and associative classification techniques. They also offer users more convenience in terms of understandability and interpretability. We used four datasets from the UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as a real-world application of the classification model, we present an extension of CARSVM combined with feature selection for gene expression data. We then describe how this combination provides biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model. 
From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated into the learning process of the SVM algorithm. In terms of applicability, the results of the gene expression analysis suggest that the CARSVM system can be used in a variety of real-world applications with some adjustments.
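
    The central step of this framework, replacing each training sample with a vector that records which class association rules it satisfies, can be sketched in Python as follows. The rules, the discretization of expression values into high/low items, and the optional confidence weighting are hypothetical choices for illustration; the rule-mining step itself is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassAssociationRule:
    antecedent: frozenset  # items that must all be present, e.g. {"G1=high"}
    label: str             # class predicted by the rule
    confidence: float      # rule confidence estimated on the training data

def to_items(expression, cut=0.0):
    """Discretize a {gene: value} profile into 'gene=high' / 'gene=low' items (hypothetical scheme)."""
    return {f"{gene}={'high' if value > cut else 'low'}" for gene, value in expression.items()}

def rule_feature_vector(items, rules, weighted=False):
    """One feature per rule: 1 (or the rule confidence) if the antecedent holds, else 0."""
    return [
        (rule.confidence if weighted else 1.0) if rule.antecedent <= items else 0.0
        for rule in rules
    ]

rules = [
    ClassAssociationRule(frozenset({"G1=high"}), "tumor", 0.9),
    ClassAssociationRule(frozenset({"G1=high", "G2=low"}), "tumor", 0.95),
    ClassAssociationRule(frozenset({"G3=high"}), "normal", 0.8),
]
sample = to_items({"G1": 2.3, "G2": 0.4, "G3": 1.7})
print(rule_feature_vector(sample, rules))                 # [1.0, 0.0, 1.0]
print(rule_feature_vector(sample, rules, weighted=True))  # [0.9, 0.0, 0.8]
```

    Stacking one such vector per sample yields a matrix that any SVM implementation can train on (for example, scikit-learn's `SVC`); that library choice is ours, not the paper's.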

  3. A new genome of Acidithiobacillus thiooxidans provides insights into adaptation to a bioleaching environment.

    PubMed

    Travisany, Dante; Cortés, María Paz; Latorre, Mauricio; Di Genova, Alex; Budinich, Marko; Bobadilla-Fazzini, Roberto A; Parada, Pilar; González, Mauricio; Maass, Alejandro

    2014-11-01

    Acidithiobacillus thiooxidans is a sulfur-oxidizing acidophilic bacterium found in many sulfur-rich environments. It is particularly interesting because of its role in the bioleaching of sulphide minerals. In this work, we report the genome sequence of At. thiooxidans Licanantay, the first strain from a copper mine to be sequenced, which is currently used in industrial bioleaching processes. Through comparative genomic analysis with two other At. thiooxidans strains not associated with metal mining (ATCC 19377 and A01), we determined that these strains share a large core genome of 2109 coding sequences and a high average nucleotide identity of over 98%. Nevertheless, the presence of 841 strain-specific genes (absent in the other At. thiooxidans strains) suggests a particular adaptation of Licanantay to its specific biomining environment. Among this group, we highlight genes encoding proteins involved in heavy metal tolerance, mineral cell attachment and cysteine biosynthesis. Several of these genes were located near mobile genetic element genes (e.g. transposases and integrases) in genomic regions of over 10 kbp absent from the other strains, suggesting the presence of genomic islands in the Licanantay genome, probably acquired by horizontal gene transfer in mining environments. Copyright © 2014 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
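
    Once the genes of each strain have been grouped into ortholog families (a step not shown here), the core genome and the strain-specific genes described above reduce to set operations. A minimal Python sketch; the family identifiers are hypothetical, not real data:

```python
def partition_genome(strain_families):
    """Split ortholog families into a core set and per-strain specific sets."""
    core = set.intersection(*strain_families.values())
    specific = {}
    for strain, families in strain_families.items():
        # Families found in any other strain are not specific to this one.
        others = set().union(*(f for s, f in strain_families.items() if s != strain))
        specific[strain] = families - others
    return core, specific

# Hypothetical ortholog-family IDs per strain (not real data).
strains = {
    "Licanantay": {"OG1", "OG2", "OG3", "OG4", "OG9"},
    "ATCC 19377": {"OG1", "OG2", "OG3", "OG5"},
    "A01": {"OG1", "OG2", "OG3", "OG5", "OG6"},
}
core, specific = partition_genome(strains)
print(sorted(core))                    # ['OG1', 'OG2', 'OG3']
print(sorted(specific["Licanantay"]))  # ['OG4', 'OG9']
```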

  4. De novo transcriptome sequencing and discovery of genes related to copper tolerance in Paeonia ostii.

    PubMed

    Wang, Yanjie; Dong, Chunlan; Xue, Zeyun; Jin, Qijiang; Xu, Yingchun

    2016-01-15

    Paeonia ostii, an important ornamental and medicinal plant, grows normally on copper (Cu) mines with widespread Cu contamination of soils and can lower the Cu content of Cu-contaminated soils. However, very little molecular information on Cu resistance in P. ostii is available. In this study, high-throughput de novo transcriptome sequencing was carried out for P. ostii with and without Cu treatment using the Illumina HiSeq 2000 platform. A total of 77,704 All-unigenes were obtained, with a mean length of 710 bp. Of these unigenes, 47,461 were annotated against public databases based on sequence similarity. Comparative transcript profiling revealed 4324 differentially expressed genes (DEGs), with 2207 up-regulated and 2117 down-regulated unigenes in the Cu-treated library compared with the control. Based on these DEGs, Gene Ontology (GO) enrichment analysis indicated Cu stress-relevant terms, such as 'membrane' and 'antioxidant activity'. Meanwhile, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis uncovered some important pathways, including 'biosynthesis of secondary metabolites' and 'metabolic pathways'. In addition, the expression patterns of 12 selected DEGs measured by quantitative real-time polymerase chain reaction (qRT-PCR) were consistent with their transcript abundance changes in the transcriptomic analyses, suggesting that all 12 genes are involved in Cu tolerance in P. ostii. This is the first report to identify genes related to Cu stress responses in P. ostii; it offers valuable information on the molecular mechanisms of Cu resistance and provides a basis for further genomics research on this and related ornamental species for phytoremediation. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Genome-wide identification and characterization of NB-ARC resistant genes in wheat (Triticum aestivum L.) and their expression during leaf rust infection.

    PubMed

    Chandra, Saket; Kazmi, Andaleeb Z; Ahmed, Zainab; Roychowdhury, Gargi; Kumari, Veena; Kumar, Manish; Mukhopadhyay, Kunal

    2017-07-01

    NB-ARC domain-containing resistance genes from the wheat genome were identified, characterized and localized on chromosome arms, and they displayed a differential yet positive response during incompatible and compatible leaf rust interactions. Wheat (Triticum aestivum L.) is an important cereal crop; however, its production is severely affected by numerous diseases, including rusts. An efficient, cost-effective and ecologically viable approach to controlling pathogens is host resistance. Wheat carries a large number of resistance loci, but only a few have been identified and cloned. This study presents a comprehensive analysis of the NB-ARC-containing genes in the complete wheat genome. NB-ARC encoding genes were mined from the Ensembl Plants database, and 604 NB-ARC-containing sequences were predicted using a hidden Markov model (HMM) approach. Genome-wide analysis of orthologous clusters in the NB-ARC-containing sequences of wheat and other members of the Poaceae family revealed the highest homology with Oryza sativa indica and Brachypodium distachyon. Identifying the overlap between orthologous clusters helped elucidate the function and evolution of resistance proteins. The NB-ARC domain-containing sequences were evenly distributed among the three wheat sub-genomes. Wheat chromosome arms 4AL and 7BL had the most NB-ARC domain-containing contigs. Spatio-temporal expression profiling demonstrated the positive role of these genes in resistant and susceptible wheat plants during incompatible and compatible interactions with the leaf rust pathogen Puccinia triticina. Two NB-ARC domain-containing sequences were modelled in silico, cloned and sequenced to analyze their fine structures. The data obtained in this study will support the isolation, characterization and application of NB-ARC resistance genes in marker-assisted selection-based breeding programs for improving rust resistance in wheat.

  6. An Outbreak of Lymphocutaneous Sporotrichosis among Mine-Workers in South Africa

    PubMed Central

    Govender, Nelesh P.; Maphanga, Tsidiso G.; Zulu, Thokozile G.; Patel, Jaymati; Walaza, Sibongile; Jacobs, Charlene; Ebonwu, Joy I.; Ntuli, Sindile; Naicker, Serisha D.; Thomas, Juno

    2015-01-01

    Background The largest outbreak of sporotrichosis occurred between 1938 and 1947 in the gold mines of Witwatersrand in South Africa. Here, we describe an outbreak of lymphocutaneous sporotrichosis that was investigated in a South African gold mine in 2011. Methodology Employees working at a reopened section of the mine were recruited for a descriptive cross-sectional study. Informed consent was sought for interview, clinical examination and medical record review. Specimens were collected from participants with active or partially healed lymphocutaneous lesions. Environmental samples were collected from underground mine levels. Sporothrix isolates were identified by sequencing of the internal transcribed spacer region of the ribosomal gene and the nuclear calmodulin gene. Principal Findings Of 87 male miners, 81 (93%) were interviewed and examined, of whom 29 (36%) had skin lesions; specimens were collected from 17 (59%). Sporotrichosis was laboratory-confirmed in 10 patients, and seven had clinically compatible lesions. Of 42 miners with known HIV status, 11 (26%) were HIV-infected. No cases of disseminated disease were detected. Participants with ≤3 years’ mining experience had four times the odds of developing sporotrichosis compared with those who had been employed for >3 years (adjusted OR 4.0, 95% CI 1.2–13.1). Isolates from 8 patients were identified as Sporothrix schenckii sensu stricto by calmodulin gene sequencing, while environmental isolates were identified as Sporothrix mexicana. Conclusions/Significance S. schenckii sensu stricto was identified as the causative pathogen. Although genetically distinct species were isolated from clinical and environmental sources, the likely source was contaminated soil and untreated wood underground. No cases occurred after recommendations were implemented to close sections of the mine, treat timber and encourage consistent use of personal protective equipment. 
Sporotrichosis is a potentially re-emerging disease where traditional, rather than heavily mechanised, mining techniques are used. Surveillance should be instituted at sentinel locations. PMID:26407300

  7. Computational functional genomics-based approaches in analgesic drug discovery and repurposing.

    PubMed

    Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn

    2018-06-01

    Persistent pain is a major healthcare problem that affects a fifth of adults worldwide, and treatment options remain limited. The search for new analgesics increasingly draws on the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between genome and phenotype. Its use in drug discovery and repurposing for analgesic indications has so far relied on knowledge discovery in gene function and drug target-related databases, next-generation sequencing, and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing and highlight the potential of computational functional genomics in this field, including a demonstration of the workflow using a novel R library 'dbtORA'.

  8. Construction, database integration, and application of an Oenothera EST library.

    PubMed

    Mrácek, Jaroslav; Greiner, Stephan; Cho, Won Kyong; Rauwolf, Uwe; Braun, Martha; Umate, Pavan; Altstätter, Johannes; Stoppel, Rhea; Mlcochová, Lada; Silber, Martina V; Volz, Stefanie M; White, Sarah; Selmeier, Renate; Rudd, Stephen; Herrmann, Reinhold G; Meurer, Jörg

    2006-09-01

    Coevolution of cellular genetic compartments is a fundamental aspect of eukaryotic genome evolution that becomes apparent in serious developmental disturbances after interspecific organelle exchanges. The genus Oenothera is a unique resource, and at present the only available one, for studying the role of the compartmentalized plant genome in the diversification of populations and in speciation processes. An integrated approach involving cDNA cloning, EST sequencing, and bioinformatic data mining was chosen, using Oenothera elata with nuclear genome AA and plastome type I. The Gene Ontology system grouped 1621 unique gene products into 17 different functional categories. Arrays generated from a selected fraction of the ESTs revealed significantly different expression profiles among closely related Oenothera species that can generate fertile and incompatible plastid/nuclear hybrids (hybrid bleaching). Furthermore, the EST library provides a valuable source of PCR-based polymorphic molecular markers that are instrumental for genotyping and molecular mapping approaches.

  9. A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources.

    PubMed

    Rebholz-Schuhmann, Dietrich; Grabmüller, Christoph; Kavaliauskas, Silvestras; Croset, Samuel; Woollard, Peter; Backofen, Rolf; Filsell, Wendy; Clark, Dominic

    2014-07-01

    In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information on type 2 diabetes mellitus (T2DM) in adults. This case study demonstrates the benefits of semantic interoperability after integrating the scientific literature with biomedical data resources, such as the UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way by applying public terminological resources for diseases and proteins, together with other text-mining approaches. Finally, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits of the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as any Virtual Knowledge Broker. Copyright © 2013. Published by Elsevier Ltd.

  10. Transcriptome mining: Multigene panel to test delousing drug response in the sea louse Caligus rogercresseyi.

    PubMed

    Valenzuela-Muñoz, V; Gallardo-Escárate, C

    2016-02-01

    Controlling infestations of copepodid ectoparasites in the salmon industry is increasingly problematic given the rising incidence of drug resistance or loss of sensitivity. Despite the importance of this issue, the molecular mechanisms and genes implicated in resistance or susceptibility are only poorly understood. The objective of the present study was to identify candidate genes associated with delousing drug response in the sea louse Caligus rogercresseyi and to evaluate their expression levels. From RNA-seq data obtained for adult male and female sea lice, 62.48 M reads were assembled into 70,349 high-quality contigs. BLASTX analysis against UniProtKB/Swiss-Prot and the ESTs available for crustaceans in the NCBI database identified 870 transcripts matching genes previously associated with delousing drug response. Furthermore, 14 candidate genes were validated through RT-qPCR and evaluated in deltamethrin and azamethiphos bioassays. The results showed upregulation of genes involved in ion transport in sea lice treated with deltamethrin, whereas lice treated with azamethiphos showed upregulation of genes such as cytochrome P450, carboxylesterase, and acetylcholine receptors. The present study provides a multigene panel to test delousing drug response to pyrethroids and organophosphates in a highly prevalent pathogen of the Chilean salmon industry. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Using molecular functional networks to manifest connections between obesity and obesity-related diseases

    PubMed Central

    Yang, Jialiang; Qiu, Jing; Wang, Kejing; Zhu, Lijuan; Fan, Jingjing; Zheng, Deyin; Meng, Xiaodi; Yang, Jiasheng; Peng, Lihong; Fu, Yu; Zhang, Dahan; Peng, Shouneng; Huang, Haiyun; Zhang, Yi

    2017-01-01

    Obesity is a primary risk factor for many diseases, such as certain cancers. In this study, we developed three algorithms, a random-walk based method (OBNet), a shortest-path based method (OBsp) and a direct-overlap method (OBoverlap), to reveal obesity-disease connections at protein-interaction subnetworks corresponding to thousands of biological functions and pathways. Through literature mining, we also curated a list of obesity-associated diseases, which we used to compare the methods. OBNet outperformed the other two methods. OBNet can predict whether a disease is obesity-related based on its associated genes. Meanwhile, OBNet identifies extensive connections between obesity genes and genes associated with a few diseases at various functional modules and pathways. Using breast cancer and type 2 diabetes as two examples, OBNet identifies meaningful genes that may play key roles in connecting obesity and the two diseases. For example, TGFB1 and VEGFA are inferred to be the top two key genes mediating the obesity-breast cancer connection in modules associated with brain development. Finally, the top modules identified by OBNet in breast cancer significantly overlap with modules identified in a TCGA breast cancer gene expression study, demonstrating the power of OBNet in identifying biological processes involved in the disease. PMID:29156709

  12. Randomization Based Privacy Preserving Categorical Data Analysis

    ERIC Educational Resources Information Center

    Guo, Ling

    2010-01-01

    The success of data mining relies on the availability of high quality data. To ensure quality data mining, effective information sharing between organizations becomes a vital requirement in today's society. Since data mining often involves sensitive information of individuals, the public has expressed a deep concern about their privacy.…

  13. A review on data mining and continuous optimization applications in computational biology and medicine.

    PubMed

    Weber, Gerhard-Wilhelm; Ozöğür-Akyüz, Süreyya; Kropat, Erik

    2009-06-01

    An emerging research area in computational biology and biotechnology is devoted to the mathematical modeling and prediction of gene-expression patterns; a deep understanding of its foundations now requires mathematics. This article surveys data mining and machine learning methods for the analysis of complex systems in computational biology. It extends recent advances in modeling and prediction mathematically by rigorously introducing the environment and aspects of errors and uncertainty into the genetic context, within the framework of matrix and interval arithmetic. Given data from DNA microarray experiments and environmental measurements, we extract nonlinear ordinary differential equations containing parameters that are to be determined. This is done by generalized Chebyshev approximation and generalized semi-infinite optimization. Then, time-discretized dynamical systems are studied. A combinatorial algorithm that constructs and follows sequences of polyhedra detects the region of parametric stability. In addition, we analyze the topological landscape of gene-environment networks in terms of structural stability. As a second strategy, we review recent model selection and kernel learning methods for binary classification, which can be used to classify microarray data for cancerous cells or to discriminate other kinds of disease. This review is practically motivated and theoretically elaborated; it aims to contribute to better health care, progress in medicine, better education, and healthier living conditions.

  14. Off-road truck-related accidents in U.S. mines

    PubMed Central

    Dindarloo, Saeid R.; Pollard, Jonisha P.; Siami-Irdemoosa, Elnaz

    2016-01-01

    Introduction Off-road trucks are one of the major sources of equipment-related accidents in the U.S. mining industries. A systematic analysis of all off-road truck-related accidents, injuries, and illnesses reported and published by the Mine Safety and Health Administration (MSHA) is expected to provide practical insights for identifying accident patterns and trends in the available raw database, so that appropriate safety management measures can be administered and implemented based on these patterns and trends. Methods A hybrid clustering-classification methodology using K-means clustering and gene expression programming (GEP) is proposed for the analysis of severe and non-severe off-road truck-related injuries at U.S. mines. Using the GEP sub-model, a small subset of the 36 recorded attributes was found to be correlated with the severity level. Results Given the set of specified attributes, the clustering sub-model grouped the accident records into 5 distinct clusters. For instance, the first cluster contained accidents related to minerals processing mills and coal preparation plants (91%). More than two-thirds of the victims in this cluster had less than 5 years of job experience. This cluster was associated with the highest percentage of severe injuries (22 severe accidents, 3.4%). Almost 50% of all accidents in this cluster occurred at stone operations. Similarly, the other four clusters were characterized to highlight important patterns that can be used to determine areas of focus for safety initiatives. Conclusions The identified clusters of accidents may play a vital role in the prevention of severe injuries in mining. Further research into the cluster attributes and identified patterns will be necessary to determine how these factors can be mitigated to reduce the risk of severe injuries. 
Practical application Analyzing injury data with data mining techniques provides some insight into the attributes that allow injury severity to be predicted with high accuracy. PMID:27620937

  15. Off-road truck-related accidents in U.S. mines.

    PubMed

    Dindarloo, Saeid R; Pollard, Jonisha P; Siami-Irdemoosa, Elnaz

    2016-09-01

    Off-road trucks are one of the major sources of equipment-related accidents in the U.S. mining industries. A systematic analysis of all off-road truck-related accidents, injuries, and illnesses reported and published by the Mine Safety and Health Administration (MSHA) is expected to provide practical insights for identifying accident patterns and trends in the available raw database, so that appropriate safety management measures can be administered and implemented based on these patterns and trends. A hybrid clustering-classification methodology using K-means clustering and gene expression programming (GEP) is proposed for the analysis of severe and non-severe off-road truck-related injuries at U.S. mines. Using the GEP sub-model, a small subset of the 36 recorded attributes was found to be correlated with the severity level. Given the set of specified attributes, the clustering sub-model grouped the accident records into 5 distinct clusters. For instance, the first cluster contained accidents related to minerals processing mills and coal preparation plants (91%). More than two-thirds of the victims in this cluster had less than 5 years of job experience. This cluster was associated with the highest percentage of severe injuries (22 severe accidents, 3.4%). Almost 50% of all accidents in this cluster occurred at stone operations. Similarly, the other four clusters were characterized to highlight important patterns that can be used to determine areas of focus for safety initiatives. The identified clusters of accidents may play a vital role in the prevention of severe injuries in mining. Further research into the cluster attributes and identified patterns will be necessary to determine how these factors can be mitigated to reduce the risk of severe injuries. Analyzing injury data with data mining techniques provides some insight into the attributes that allow injury severity to be predicted with high accuracy. 
Copyright © 2016 Elsevier Ltd and National Safety Council. All rights reserved.

  16. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are widely used in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we consider searching for biological knowledge, retrieving references using text mining methods, and reconstructing databases; for example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Second, we analyze applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  17. Characterization of the myometrial transcriptome in women with an arrest of dilatation during labor

    PubMed Central

    Chaemsaithong, Piya; Madan, Ichchha; Romero, Roberto; Than, Nandor G; Tarca, Adi L; Draghici, Sorin; Bhatti, Gaurav; Mazor, Moshe; Kim, Chong Jai; Hassan, Sonia S; Chaiworapongsa, Tinnakorn

    2014-01-01

    Objective The molecular basis of failure to progress in labor is poorly understood. This study was undertaken to characterize the myometrial transcriptome of patients with an arrest of dilatation (AODIL). Study design Human myometrium was prospectively collected from women in the following groups: 1) spontaneous term labor (TL; n=29); and 2) arrest of dilatation (AODIL; n=14). Gene expression was characterized using Illumina® HumanHT-12 microarrays. A moderated Student's t-test and false discovery rate adjustment were used for analysis. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) of selected genes was performed in an independent sample set. Pathway analysis was performed on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database using Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). The MetaCore knowledge base was also mined for pathway analysis. Results 1) 42 differentially expressed genes were identified in women with an AODIL; 2) gene ontology analysis indicated enrichment of biological processes, including regulation of angiogenesis, response to hypoxia, inflammatory response, and chemokine-mediated signaling pathway. Enriched molecular functions included transcription repressor activity, heat shock protein (Hsp) 90 binding, and nitric oxide synthase (NOS) activity; 3) MetaCore analysis identified immune response chemokine (C-C motif) ligand 2 (CCL2) signaling, muscle contraction: regulation of eNOS activity in endothelial cells, and triiodothyronine and thyroxine signaling as significantly over-represented (FDR<0.05); 4) qRT-PCR confirmed overexpression of nitric oxide synthase 3 (NOS3), hypoxia-inducible factor 1-alpha (HIF1A), chemokine (C-C motif) ligand 2 (CCL2), angiopoietin-like 4 (ANGPTL4), ADAM metallopeptidase with thrombospondin type 1 motif 9 (ADAMTS9), G protein-coupled receptor 4 (GPR4), metallothionein 1A (MT1A), MT2A, and selectin E (SELE) in an AODIL. 
Conclusion The myometrium of women with arrest of dilatation has a stereotypic transcriptome profile. This disorder was associated with a pattern of gene expression involved in muscle contraction, an inflammatory response, and hypoxia. This is the first comprehensive and unbiased examination of the molecular basis of an AODIL. PMID:23893668
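    The abstract reports a moderated Student's t-test followed by false discovery rate adjustment, with terms called significant at FDR<0.05, but it does not name the adjustment procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, the adjustment of per-gene p-values could look like this; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not data from the study, and the moderated t-test itself is not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used here for illustration; the abstract only says
    "false discovery rate adjustment".
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # so each adjusted value is the minimum of p * m / rank over higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustration only, not study data).
p = [0.001, 0.008, 0.039, 0.041, 0.27]
q = benjamini_hochberg(p)
significant = [i for i, qi in enumerate(q) if qi < 0.05]  # FDR < 0.05
```

    The backward pass enforces monotonicity: the third p-value (0.039) would give 0.039 × 5 / 3 = 0.065 on its own, but it inherits the smaller adjusted value of the next rank (0.041 × 5 / 4 = 0.05125), so in this example only the first two genes pass the 0.05 cutoff.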

  18. Text-mining and information-retrieval services for molecular biology

    PubMed Central

    Krallinger, Martin; Valencia, Alfonso

    2005-01-01

    Text mining in molecular biology, defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents, has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications that can improve access to knowledge for biologists and database annotators has recently been developed. PMID:15998455

  19. Phylogeny and Expression Analyses Reveal Important Roles for Plant PKS III Family during the Conquest of Land by Plants and Angiosperm Diversification

    PubMed Central

    Xie, Lulu; Liu, Pingli; Zhu, Zhixin; Zhang, Shifan; Zhang, Shujiang; Li, Fei; Zhang, Hui; Li, Guoliang; Wei, Yunxiao; Sun, Rifei

    2016-01-01

    Polyketide synthases (PKSs) use the products of primary metabolism to synthesize a wide array of secondary metabolites in both prokaryotic and eukaryotic organisms. PKSs can be grouped into three distinct classes, types I, II, and III, based on enzyme structure, substrate specificity, and catalytic mechanism. The type III PKS enzymes function as homodimers and are the only class of PKS that does not require an acyl carrier protein. Plant type III PKS enzymes, also known as chalcone synthase (CHS)-like enzymes, are of particular interest due to their functional diversity. In this study, we mined type III PKS gene sequences from the genomes of six aquatic algae and 25 land plants (1 bryophyte, 1 lycophyte, 2 basal angiosperms, 16 core eudicots, and 5 monocots). PKS III sequences were relatively conserved in all embryophytes but were absent from algae. We also examined gene expression patterns by analyzing available transcriptome data, and identified potential cis-regulatory elements in upstream sequences. Phylogenetic trees of dicot angiosperms showed that plant type III PKS proteins fall into three clades. Clade A contains genes encoding CHS/STS-type enzymes with diverse transcriptional expression patterns and enzymatic functions, while clade B is further divided into subclades b1 and b2, which consist of anther-specific CHS-like enzymes. Differentiating regions, such as amino acids 196-207 between clades A and B, and predicted positively selected sites within α-helices in later-diverging branches of clade A, account for the major diversification in substrate choice and catalytic reaction. The integrity and location of conserved cis-elements containing MYB and bHLH binding sites can affect transcription levels. Potential binding sites for transcription factors such as WRKY, SPL, or AP2/EREBP may contribute to tissue- or taxon-specific differences in gene expression. 
Our data show that gene duplications and functional diversification of plant type III PKS enzymes played a critical role in the ancient conquest of land by early plants and in angiosperm diversification. PMID:27625671

  20. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

    PubMed Central

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus, commonly known as hops, is a member of the family Cannabaceae. Many ongoing projects are leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. However, the genetically characterized domains in these databases are limited because reliable molecular markers are unavailable. A large amount of EST sequence data is available for hops, and in the present study simple sequence repeat (SSR) markers extracted from these EST data are used as molecular markers for genetic characterization. In total, 25,495 EST sequences were examined and assembled to obtain full-length sequences. Mononucleotide SSR motifs were the most frequent (60.44% in contigs and 62.16% in singletons), whereas hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%) were the least frequent. The most frequent trinucleotide motifs coded for glutamic acid (GAA), while AT/TA was the most frequent dinucleotide repeat. Flanking primer pairs were designed in silico for the SSR-containing sequences. SSR-containing sequences were functionally categorized using Gene Ontology terms for biological process, cellular component and molecular function. PMID:22368382
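    The abstract describes mining simple sequence repeats from EST sequences and tallying them by motif length (mononucleotide through hexanucleotide). It does not name the SSR-detection tool or its thresholds, so the following is only a minimal sketch of perfect-SSR detection with regular-expression backreferences; the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are hypothetical choices for illustration, not the study's settings.

```python
import re

# Hypothetical minimum repeat counts per motif length (1 = mono ... 6 = hexa);
# these are illustrative assumptions, not the thresholds used in the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # since those runs are reported under the shorter motif length.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical EST fragment: a (GAA)6 trinucleotide and an (AT)7 dinucleotide SSR.
print(find_ssrs("GGC" + "GAA" * 6 + "TCG" + "AT" * 7 + "C"))
```

    Counting hits per motif length across contigs and singletons would yield a frequency table like the one summarized above. Compound and imperfect repeats, ambiguous bases, and the flanking-primer design step are outside this sketch.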
