Sample records for query gene expression

  1. Query-based biclustering of gene expression data using Probabilistic Relational Models.

    PubMed

    Zhao, Hui; Cloots, Lore; Van den Bulcke, Tim; Wu, Yan; De Smet, Riet; Storms, Valerie; Meysman, Pieter; Engelen, Kristof; Marchal, Kathleen

    2011-02-15

    With the availability of large scale expression compendia it is now possible to view own findings in the light of what is already available and retrieve genes with an expression profile similar to a set of genes of interest (i.e., a query or seed set) for a subset of conditions. To that end, a query-based strategy is needed that maximally exploits the coexpression behaviour of the seed genes to guide the biclustering, but that at the same time is robust against the presence of noisy genes in the seed set as seed genes are often assumed, but not guaranteed to be coexpressed in the queried compendium. Therefore, we developed ProBic, a query-based biclustering strategy based on Probabilistic Relational Models (PRMs) that exploits the use of prior distributions to extract the information contained within the seed set. We applied ProBic on a large scale Escherichia coli compendium to extend partially described regulons with potentially novel members. We compared ProBic's performance with previously published query-based biclustering algorithms, namely ISA and QDB, from the perspective of bicluster expression quality, robustness of the outcome against noisy seed sets and biological relevance.This comparison learns that ProBic is able to retrieve biologically relevant, high quality biclusters that retain their seed genes and that it is particularly strong in handling noisy seeds. ProBic is a query-based biclustering algorithm developed in a flexible framework, designed to detect biologically relevant, high quality biclusters that retain relevant seed genes even in the presence of noise or when dealing with low quality seed sets.

  2. GEM-TREND: a web tool for gene expression data mining toward relevant network discovery

    PubMed Central

    Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

    2009-01-01

    Background DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. Results GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression pattern between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. Conclusion GEM-TREND was developed to retrieve gene expression data by comparing query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at . PMID:19728865

  3. GEM-TREND: a web tool for gene expression data mining toward relevant network discovery.

    PubMed

    Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

    2009-09-03

    DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression pattern between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. GEM-TREND was developed to retrieve gene expression data by comparing query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network.

  4. Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering.

    PubMed

    Deveci, Mehmet; Küçüktunç, Onur; Eren, Kemal; Bozdağ, Doruk; Kaya, Kamer; Çatalyürek, Ümit V

    2016-01-01

    Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulations is the query-based search since gene co-expressions may indicate a shared role in a biological process. Although there exist promising query-driven search methods adapting clustering, they fail to capture many genes that function in the same biological pathway because microarray datasets are fraught with spurious samples or samples of diverse origin, or the pathways might be regulated under only a subset of samples. On the other hand, a class of clustering algorithms known as biclustering algorithms which simultaneously cluster both the items and their features are useful while analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes only interact under specific circumstances, biclustering may recover the relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature using biclustering for querying co-regulated genes. Then we present a novel biclustering approach and evaluate its performance by a thorough experimental analysis.

  5. Renal Gene Expression Database (RGED): a relational database of gene expression profiles in kidney disease

    PubMed Central

    Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo

    2014-01-01

    We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore certain gene profiles, the relationships between genes of interests and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without the requirement of advanced computational skills. Availability and implementation: Website is implemented in PHP, R, MySQL and Nginx and freely available from http://rged.wall-eva.net. Database URL: http://rged.wall-eva.net PMID:25252782

  6. Renal Gene Expression Database (RGED): a relational database of gene expression profiles in kidney disease.

    PubMed

    Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo

    2014-01-01

    We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore certain gene profiles, the relationships between genes of interests and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without the requirement of advanced computational skills. Website is implemented in PHP, R, MySQL and Nginx and freely available from http://rged.wall-eva.net. http://rged.wall-eva.net. © The Author(s) 2014. Published by Oxford University Press.

  7. Identification of candidate genes in Populus cell wall biosynthesis using text-mining, co-expression network and comparative genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaohan; Ye, Chuyu; Bisaria, Anjali

    2011-01-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of ethanol from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had the experimental evidences supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database and additionalmore » genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high likelihood candidates for functional genomics in relation to cell wall biosynthesis.« less

  8. CellLineNavigator: a workbench for cancer cell line analysis

    PubMed Central

    Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas

    2013-01-01

    The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and search ability, a simple data and an intuitive querying interface were implemented. It allows the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487

  9. LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures.

    PubMed

    Duan, Qiaonan; Flynn, Corey; Niepel, Mario; Hafner, Marc; Muhlich, Jeremy L; Fernandez, Nicolas F; Rouillard, Andrew D; Tan, Christopher M; Chen, Edward Y; Golub, Todd R; Sorger, Peter K; Subramanian, Aravind; Ma'ayan, Avi

    2014-07-01

    For the Library of Integrated Network-based Cellular Signatures (LINCS) project many gene expression signatures using the L1000 technology have been produced. The L1000 technology is a cost-effective method to profile gene expression in large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating many of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. MARQ: an online tool to mine GEO for experiments with similar or opposite gene expression signatures.

    PubMed

    Vazquez, Miguel; Nogales-Cadenas, Ruben; Arroyo, Javier; Botías, Pedro; García, Raul; Carazo, Jose M; Tirado, Francisco; Pascual-Montano, Alberto; Carmona-Saez, Pedro

    2010-07-01

    The enormous amount of data available in public gene expression repositories such as Gene Expression Omnibus (GEO) offers an inestimable resource to explore gene expression programs across several organisms and conditions. This information can be used to discover experiments that induce similar or opposite gene expression patterns to a given query, which in turn may lead to the discovery of new relationships among diseases, drugs or pathways, as well as the generation of new hypotheses. In this work, we present MARQ, a web-based application that allows researchers to compare a query set of genes, e.g. a set of over- and under-expressed genes, against a signature database built from GEO datasets for different organisms and platforms. MARQ offers an easy-to-use and integrated environment to mine GEO, in order to identify conditions that induce similar or opposite gene expression patterns to a given experimental condition. MARQ also includes additional functionalities for the exploration of the results, including a meta-analysis pipeline to find genes that are differentially expressed across different experiments. The application is freely available at http://marq.dacya.ucm.es.

  11. A biomarker-based screen of a gene expression compendium reveals regulation of Nrf2 by CAR and STAT5b

    EPA Science Inventory

    Computational approaches were developed to identify factors that regulate Nrf2 in a large gene expression compendium of microarray profiles including >2000 comparisons which queried the effects of chemicals, genes, diets, and infectious agents on gene expression in the mouse l...

  12. Expression Atlas: gene and protein expression across multiple studies and organisms

    PubMed Central

    Tang, Y Amy; Bazant, Wojciech; Burke, Melissa; Fuentes, Alfonso Muñoz-Pomer; George, Nancy; Koskinen, Satu; Mohammed, Suhaib; Geniza, Matthew; Preece, Justin; Jarnuczak, Andrew F; Huber, Wolfgang; Stegle, Oliver; Brazma, Alvis; Petryszak, Robert

    2018-01-01

    Abstract Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions. PMID:29165655

  13. FlyAtlas: database of gene expression in the tissues of Drosophila melanogaster

    PubMed Central

    Robinson, Scott W.; Herzyk, Pawel; Dow, Julian A. T.; Leader, David P.

    2013-01-01

    The FlyAtlas resource contains data on the expression of the genes of Drosophila melanogaster in different tissues (currently 25—17 adult and 8 larval) obtained by hybridization of messenger RNA to Affymetrix Drosophila Genome 2 microarrays. The microarray probe sets cover 13 250 Drosophila genes, detecting 12 533 in an unambiguous manner. The data underlying the original web application (http://flyatlas.org) have been restructured into a relational database and a Java servlet written to provide a new web interface, FlyAtlas 2 (http://flyatlas.gla.ac.uk/), which allows several additional queries. Users can retrieve data for individual genes or for groups of genes belonging to the same or related ontological categories. Assistance in selecting valid search terms is provided by an Ajax ‘autosuggest’ facility that polls the database as the user types. Searches can also focus on particular tissues, and data can be retrieved for the most highly expressed genes, for genes of a particular category with above-average expression or for genes with the greatest difference in expression between the larval and adult stages. A novel facility allows the database to be queried with a specific gene to find other genes with a similar pattern of expression across the different tissues. PMID:23203866

  14. FlyAtlas: database of gene expression in the tissues of Drosophila melanogaster.

    PubMed

    Robinson, Scott W; Herzyk, Pawel; Dow, Julian A T; Leader, David P

    2013-01-01

    The FlyAtlas resource contains data on the expression of the genes of Drosophila melanogaster in different tissues (currently 25-17 adult and 8 larval) obtained by hybridization of messenger RNA to Affymetrix Drosophila Genome 2 microarrays. The microarray probe sets cover 13,250 Drosophila genes, detecting 12,533 in an unambiguous manner. The data underlying the original web application (http://flyatlas.org) have been restructured into a relational database and a Java servlet written to provide a new web interface, FlyAtlas 2 (http://flyatlas.gla.ac.uk/), which allows several additional queries. Users can retrieve data for individual genes or for groups of genes belonging to the same or related ontological categories. Assistance in selecting valid search terms is provided by an Ajax 'autosuggest' facility that polls the database as the user types. Searches can also focus on particular tissues, and data can be retrieved for the most highly expressed genes, for genes of a particular category with above-average expression or for genes with the greatest difference in expression between the larval and adult stages. A novel facility allows the database to be queried with a specific gene to find other genes with a similar pattern of expression across the different tissues.

  15. Application of connectivity mapping in predictive toxicology based on gene-expression similarity.

    PubMed

    Smalley, Joshua L; Gant, Timothy W; Zhang, Shu-Dong

    2010-02-09

    Connectivity mapping is the process of establishing connections between different biological states using gene-expression profiles or signatures. There are a number of applications but in toxicology the most pertinent is for understanding mechanisms of toxicity. In its essence the process involves comparing a query gene signature generated as a result of exposure of a biological system to a chemical to those in a database that have been previously derived. In the ideal situation the query gene-expression signature is characteristic of the event and will be matched to similar events in the database. Key criteria are therefore the means of choosing the signature to be matched and the means by which the match is made. In this article we explore these concepts with examples applicable to toxicology. (c) 2009 Elsevier Ireland Ltd. All rights reserved.

  16. Gene Expression Omnibus (GEO): Microarray data storage, submission, retrieval, and analysis

    PubMed Central

    Barrett, Tanya

    2006-01-01

    The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely distributes high-throughput molecular abundance data, predominantly gene expression data generated by DNA microarray technology. The database has a flexible design that can handle diverse styles of both unprocessed and processed data in a MIAME- (Minimum Information About a Microarray Experiment) supportive infrastructure that promotes fully annotated submissions. GEO currently stores about a billion individual gene expression measurements, derived from over 100 organisms, submitted by over 1,500 laboratories, addressing a wide range of biological phenomena. To maximize the utility of these data, several user-friendly Web-based interfaces and applications have been implemented that enable effective exploration, query, and visualization of these data, at the level of individual genes or entire studies. This chapter describes how the data are stored, submission procedures, and mechanisms for data retrieval and query. GEO is publicly accessible at http://www.ncbi.nlm.nih.gov/projects/geo/. PMID:16939800

  17. Arabidopsis Gene Family Profiler (aGFP)--user-oriented transcriptomic database with easy-to-use graphic interface.

    PubMed

    Dupl'áková, Nikoleta; Renák, David; Hovanec, Patrik; Honysová, Barbora; Twell, David; Honys, David

    2007-07-23

    Microarray technologies now belong to the standard functional genomics toolbox and have undergone massive development leading to increased genome coverage, accuracy and reliability. The number of experiments exploiting microarray technology has markedly increased in recent years. In parallel with the rapid accumulation of transcriptomic data, on-line analysis tools are being introduced to simplify their use. Global statistical data analysis methods contribute to the development of overall concepts about gene expression patterns and to query and compose working hypotheses. More recently, these applications are being supplemented with more specialized products offering visualization and specific data mining tools. We present a curated gene family-oriented gene expression database, Arabidopsis Gene Family Profiler (aGFP; http://agfp.ueb.cas.cz), which gives the user access to a large collection of normalised Affymetrix ATH1 microarray datasets. The database currently contains NASC Array and AtGenExpress transcriptomic datasets for various tissues at different developmental stages of wild type plants gathered from nearly 350 gene chips. The Arabidopsis GFP database has been designed as an easy-to-use tool for users needing an easily accessible resource for expression data of single genes, pre-defined gene families or custom gene sets, with the further possibility of keyword search. Arabidopsis Gene Family Profiler presents a user-friendly web interface using both graphic and text output. Data are stored at the MySQL server and individual queries are created in PHP script. The most distinguishable features of Arabidopsis Gene Family Profiler database are: 1) the presentation of normalized datasets (Affymetrix MAS algorithm and calculation of model-based gene-expression values based on the Perfect Match-only model); 2) the choice between two different normalization algorithms (Affymetrix MAS4 or MAS5 algorithms); 3) an intuitive interface; 4) an interactive "virtual plant" visualizing the spatial and developmental expression profiles of both gene families and individual genes. Arabidopsis GFP gives users the possibility to analyze current Arabidopsis developmental transcriptomic data starting with simple global queries that can be expanded and further refined to visualize comparative and highly selective gene expression profiles.

  18. ExpTreeDB: web-based query and visualization of manually annotated gene expression profiling experiments of human and mouse from GEO.

    PubMed

    Ni, Ming; Ye, Fuqiang; Zhu, Juanjuan; Li, Zongwei; Yang, Shuai; Yang, Bite; Han, Lu; Wu, Yongge; Chen, Ying; Li, Fei; Wang, Shengqi; Bo, Xiaochen

    2014-12-01

    Numerous public microarray datasets are valuable resources for the scientific communities. Several online tools have made great steps to use these data by querying related datasets with users' own gene signatures or expression profiles. However, dataset annotation and result exhibition still need to be improved. ExpTreeDB is a database that allows for queries on human and mouse microarray experiments from Gene Expression Omnibus with gene signatures or profiles. Compared with similar applications, ExpTreeDB pays more attention to dataset annotations and result visualization. We introduced a multiple-level annotation system to depict and organize original experiments. For example, a tamoxifen-treated cell line experiment is hierarchically annotated as 'agent→drug→estrogen receptor antagonist→tamoxifen'. Consequently, retrieved results are exhibited by an interactive tree-structured graphics, which provide an overview for related experiments and might enlighten users on key items of interest. The database is freely available at http://biotech.bmi.ac.cn/ExpTreeDB. Web site is implemented in Perl, PHP, R, MySQL and Apache. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  19. DGEM--a microarray gene expression database for primary human disease tissues.

    PubMed

    Xia, Yuni; Campen, Andrew; Rigsby, Dan; Guo, Ying; Feng, Xingdong; Su, Eric W; Palakal, Mathew; Li, Shuyu

    2007-01-01

    Gene expression patterns can reflect gene regulations in human tissues under normal or pathologic conditions. Gene expression profiling data from studies of primary human disease samples are particularly valuable since these studies often span many years in order to collect patient clinical information and achieve a large sample size. Disease-to-Gene Expression Mapper (DGEM) provides a beneficial community resource to access and analyze these data; it currently includes Affymetrix oligonucleotide array datasets for more than 40 human diseases and 1400 samples. The data are normalized to the same scale and stored in a relational database. A statistical-analysis pipeline was implemented to identify genes abnormally expressed in disease tissues or genes whose expressions are associated with clinical parameters such as cancer patient survival. Data-mining results can be queried through a web-based interface at http://dgem.dhcp.iupui.edu/. The query tool enables dynamic generation of graphs and tables that are further linked to major gene and pathway resources that connect the data to relevant biology, including Entrez Gene and Kyoto Encyclopedia of Genes and Genomes (KEGG). In summary, DGEM provides scientists and physicians a valuable tool to study disease mechanisms, to discover potential disease biomarkers for diagnosis and prognosis, and to identify novel gene targets for drug discovery. The source code is freely available for non-profit use, on request to the authors.

  20. Access and use of the GUDMAP database of genitourinary development.

    PubMed

    Davies, Jamie A; Little, Melissa H; Aronow, Bruce; Armstrong, Jane; Brennan, Jane; Lloyd-MacGilp, Sue; Armit, Chris; Harding, Simon; Piu, Xinjun; Roochun, Yogmatee; Haggarty, Bernard; Houghton, Derek; Davidson, Duncan; Baldock, Richard

    2012-01-01

    The Genitourinary Development Molecular Atlas Project (GUDMAP) aims to document gene expression across time and space in the developing urogenital system of the mouse, and to provide access to a variety of relevant practical and educational resources. Data come from microarray gene expression profiling (from laser-dissected and FACS-sorted samples) and in situ hybridization at both low (whole-mount) and high (section) resolutions. Data are annotated to a published, high-resolution anatomical ontology and can be accessed using a variety of search interfaces. Here, we explain how to run typical queries on the database, by gene or anatomical location, how to view data, how to perform complex queries, and how to submit data.

  1. Multidimensional quantitative analysis of mRNA expression within intact vertebrate embryos.

    PubMed

    Trivedi, Vikas; Choi, Harry M T; Fraser, Scott E; Pierce, Niles A

    2018-01-08

    For decades, in situ hybridization methods have been essential tools for studies of vertebrate development and disease, as they enable qualitative analyses of mRNA expression in an anatomical context. Quantitative mRNA analyses typically sacrifice the anatomy, relying on embryo microdissection, dissociation, cell sorting and/or homogenization. Here, we eliminate the trade-off between quantitation and anatomical context, using quantitative in situ hybridization chain reaction (qHCR) to perform accurate and precise relative quantitation of mRNA expression with subcellular resolution within whole-mount vertebrate embryos. Gene expression can be queried in two directions: read-out from anatomical space to expression space reveals co-expression relationships in selected regions of the specimen; conversely, read-in from multidimensional expression space to anatomical space reveals those anatomical locations in which selected gene co-expression relationships occur. As we demonstrate by examining gene circuits underlying somitogenesis, quantitative read-out and read-in analyses provide the strengths of flow cytometry expression analyses, but by preserving subcellular anatomical context, they enable bi-directional queries that open a new era for in situ hybridization. © 2018. Published by The Company of Biologists Ltd.

  2. Identifying spatially similar gene expression patterns in early stage fruit fly embryo images: binary feature versus invariant moment digital representations

    PubMed Central

    Gurunathan, Rajalakshmi; Van Emden, Bernard; Panchanathan, Sethuraman; Kumar, Sudhir

    2004-01-01

    Background Modern developmental biology relies heavily on the analysis of embryonic gene expression patterns. Investigators manually inspect hundreds or thousands of expression patterns to identify those that are spatially similar and to ultimately infer potential gene interactions. However, the rapid accumulation of gene expression pattern data over the last two decades, facilitated by high-throughput techniques, has produced a need for the development of efficient approaches for direct comparison of images, rather than their textual descriptions, to identify spatially similar expression patterns. Results The effectiveness of the Binary Feature Vector (BFV) and Invariant Moment Vector (IMV) based digital representations of the gene expression patterns in finding biologically meaningful patterns was compared for a small (226 images) and a large (1819 images) dataset. For each dataset, an ordered list of images, with respect to a query image, was generated to identify overlapping and similar gene expression patterns, in a manner comparable to what a developmental biologist might do. The results showed that the BFV representation consistently outperforms the IMV representation in finding biologically meaningful matches when spatial overlap of the gene expression pattern and the genes involved are considered. Furthermore, we explored the value of conducting image-content based searches in a dataset where individual expression components (or domains) of multi-domain expression patterns were also included separately. We found that this technique improves performance of both IMV and BFV based searches. Conclusions We conclude that the BFV representation consistently produces a more extensive and better list of biologically useful patterns than the IMV representation. The high quality of results obtained scales well as the search database becomes larger, which encourages efforts to build automated image query and retrieval systems for spatial gene expression patterns. PMID:15603586

  3. TOM: a web-based integrated approach for identification of candidate disease genes.

    PubMed

    Rossi, Simona; Masotti, Daniele; Nardini, Christine; Bonora, Elena; Romeo, Giovanni; Macii, Enrico; Benini, Luca; Volinia, Stefano

    2006-07-01

    The massive production of biological data by means of highly parallel devices like microarrays for gene expression has paved the way to new possible approaches in molecular genetics. Among them the possibility of inferring biological answers by querying large amounts of expression data. Based on this principle, we present here TOM, a web-based resource for the efficient extraction of candidate genes for hereditary diseases. The service requires the previous knowledge of at least another gene responsible for the disease and the linkage area, or else of two disease associated genetic intervals. The algorithm uses the information stored in public resources, including mapping, expression and functional databases. Given the queries, TOM will select and list one or more candidate genes. This approach allows the geneticist to bypass the costly and time consuming tracing of genetic markers through entire families and might improve the chance of identifying disease genes, particularly for rare diseases. We present here the tool and the results obtained on known benchmark and on hereditary predisposition to familial thyroid cancer. Our algorithm is available at http://www-micrel.deis.unibo.it/~tom/.

  4. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus

    PubMed Central

    Zhu, Yuelin; Davis, Sean; Stephens, Robert; Meltzer, Paul S.; Chen, Yidong

    2008-01-01

    The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data in GEO can be challenging. We have developed GEOmetadb in an attempt to make querying the GEO metadata both easier and more powerful. All GEO metadata records as well as the relationships between them are parsed and stored in a local MySQL database. A powerful, flexible web search interface with several convenient utilities provides query capabilities not available via NCBI tools. In addition, a Bioconductor package, GEOmetadb that utilizes a SQLite export of the entire GEOmetadb database is also available, rendering the entire GEO database accessible with full power of SQL-based queries from within R. Availability: The web interface and SQLite databases available at http://gbnci.abcc.ncifcrf.gov/geo/. The Bioconductor package is available via the Bioconductor project. The corresponding MATLAB implementation is also available at the same website. Contact: yidong@mail.nih.gov PMID:18842599

  5. GeneMesh: a web-based microarray analysis tool for relating differentially expressed genes to MeSH terms.

    PubMed

    Jani, Saurin D; Argraves, Gary L; Barth, Jeremy L; Argraves, W Scott

    2010-04-01

    An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases etc.) or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease etc.). Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies, intermolecular interactions, overlays of genes onto KEGG pathway diagrams and heatmaps of expression intensity values. GeneMesh is freely available online at http://proteogenomics.musc.edu/genemesh/.

  6. EMAP and EMAGE: a framework for understanding spatially organized data.

    PubMed

    Baldock, Richard A; Bard, Jonathan B L; Burger, Albert; Burton, Nicolas; Christiansen, Jeff; Feng, Guanjie; Hill, Bill; Houghton, Derek; Kaufman, Matthew; Rao, Jianguo; Sharpe, James; Ross, Allyson; Stevenson, Peter; Venkataraman, Shanmugasundaram; Waterhouse, Andrew; Yang, Yiya; Davidson, Duncan R

    2003-01-01

    The Edinburgh MouseAtlas Project (EMAP) is a time-series of mouse-embryo volumetric models. The models provide a context-free spatial framework onto which structural interpretations and experimental data can be mapped. This enables collation, comparison, and query of complex spatial patterns with respect to each other and with respect to known or hypothesized structure. The atlas also includes a time-dependent anatomical ontology and mapping between the ontology and the spatial models in the form of delineated anatomical regions or tissues. The models provide a natural, graphical context for browsing and visualizing complex data. The Edinburgh Mouse Atlas Gene-Expression Database (EMAGE) is one of the first applications of the EMAP framework and provides a spatially mapped gene-expression database with associated tools for data mapping, submission, and query. In this article, we describe the underlying principles of the Atlas and the gene-expression database, and provide a practical introduction to the use of the EMAP and EMAGE tools, including use of new techniques for whole body gene-expression data capture and mapping.

  7. The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface.

    PubMed

    Chen, Josephine; Zhao, Po; Massaro, Donald; Clerch, Linda B; Almon, Richard R; DuBois, Debra C; Jusko, William J; Hoffman, Eric P

    2004-01-01

    Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/ splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes, as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provides access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp).

  8. The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface

    PubMed Central

    Chen, Josephine; Zhao, Po; Massaro, Donald; Clerch, Linda B.; Almon, Richard R.; DuBois, Debra C.; Jusko, William J.; Hoffman, Eric P.

    2004-01-01

    Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/ splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes, as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provides access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp). PMID:14681485

  9. An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence

    PubMed Central

    Sahoo, Satya S.; Bodenreider, Olivier; Rutter, Joni L.; Skinner, Karen J.; Sheth, Amit P.

    2008-01-01

    Objectives This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. Methods We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Results Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Conclusion Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. Resource page http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/ PMID:18395495

  10. An ontology-driven semantic mashup of gene and biological pathway information: application to the domain of nicotine dependence.

    PubMed

    Sahoo, Satya S; Bodenreider, Olivier; Rutter, Joni L; Skinner, Karen J; Sheth, Amit P

    2008-10-01

    This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. RESOURCE PAGE: http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/

  11. SEGEL: A Web Server for Visualization of Smoking Effects on Human Lung Gene Expression.

    PubMed

    Xu, Yan; Hu, Brian; Alnajm, Sammy S; Lu, Yin; Huang, Yangxin; Allen-Gipson, Diane; Cheng, Feng

    2015-01-01

    Cigarette smoking is a major cause of death worldwide resulting in over six million deaths per year. Cigarette smoke contains complex mixtures of chemicals that are harmful to nearly all organs of the human body, especially the lungs. Cigarette smoking is considered the major risk factor for many lung diseases, particularly chronic obstructive pulmonary diseases (COPD) and lung cancer. However, the underlying molecular mechanisms of smoking-induced lung injury associated with these lung diseases still remain largely unknown. Expression microarray techniques have been widely applied to detect the effects of smoking on gene expression in different human cells in the lungs. These projects have provided a lot of useful information for researchers to understand the potential molecular mechanism(s) of smoke-induced pathogenesis. However, a user-friendly web server that would allow scientists to fast query these data sets and compare the smoking effects on gene expression across different cells had not yet been established. For that reason, we have integrated eight public expression microarray data sets from trachea epithelial cells, large airway epithelial cells, small airway epithelial cells, and alveolar macrophage into an online web server called SEGEL (Smoking Effects on Gene Expression of Lung). Users can query gene expression patterns across these cells from smokers and nonsmokers by gene symbols, and find the effects of smoking on the gene expression of lungs from this web server. Sex difference in response to smoking is also shown. The relationship between the gene expression and cigarette smoking consumption were calculated and are shown in the server. The current version of SEGEL web server contains 42,400 annotated gene probe sets represented on the Affymetrix Human Genome U133 Plus 2.0 platform. SEGEL will be an invaluable resource for researchers interested in the effects of smoking on gene expression in the lungs. The server also provides useful information for drug development against smoking-related diseases. The SEGEL web server is available online at http://www.chengfeng.info/smoking_database.html.

  12. chromoWIZ: a web tool to query and visualize chromosome-anchored genes from cereal and model genomes.

    PubMed

    Nussbaumer, Thomas; Kugler, Karl G; Schweiger, Wolfgang; Bader, Kai C; Gundlach, Heidrun; Spannagl, Manuel; Poursarebani, Naser; Pfeifer, Matthias; Mayer, Klaus F X

    2014-12-10

    Over the last years reference genome sequences of several economically and scientifically important cereals and model plants became available. Despite the agricultural significance of these crops only a small number of tools exist that allow users to inspect and visualize the genomic position of genes of interest in an interactive manner. We present chromoWIZ, a web tool that allows visualizing the genomic positions of relevant genes and comparing these data between different plant genomes. Genes can be queried using gene identifiers, functional annotations, or sequence homology in four grass species (Triticum aestivum, Hordeum vulgare, Brachypodium distachyon, Oryza sativa). The distribution of the anchored genes is visualized along the chromosomes by using heat maps. Custom gene expression measurements, differential expression information, and gene-to-group mappings can be uploaded and can be used for further filtering. This tool is mainly designed for breeders and plant researchers, who are interested in the location and the distribution of candidate genes as well as in the syntenic relationships between different grass species. chromoWIZ is freely available and online accessible at http://mips.helmholtz-muenchen.de/plant/chromoWIZ/index.jsp.

  13. TDR Targets: a chemogenomics resource for neglected diseases.

    PubMed

    Magariños, María P; Carmona, Santiago J; Crowther, Gregory J; Ralph, Stuart A; Roos, David S; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C; Agüero, Fernán

    2012-01-01

    The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context.

  14. TDR Targets: a chemogenomics resource for neglected diseases

    PubMed Central

    Magariños, María P.; Carmona, Santiago J.; Crowther, Gregory J.; Ralph, Stuart A.; Roos, David S.; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C.; Agüero, Fernán

    2012-01-01

    The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context. PMID:22116064

  15. EPConDB: a web resource for gene expression related to pancreatic development, beta-cell function and diabetes.

    PubMed

    Mazzarelli, Joan M; Brestelli, John; Gorski, Regina K; Liu, Junmin; Manduchi, Elisabetta; Pinney, Deborah F; Schug, Jonathan; White, Peter; Kaestner, Klaus H; Stoeckert, Christian J

    2007-01-01

    EPConDB (http://www.cbil.upenn.edu/EPConDB) is a public web site that supports research in diabetes, pancreatic development and beta-cell function by providing information about genes expressed in cells of the pancreas. EPConDB displays expression profiles for individual genes and information about transcripts, promoter elements and transcription factor binding sites. Gene expression results are obtained from studies examining tissue expression, pancreatic development and growth, differentiation of insulin-producing cells, islet or beta-cell injury, and genetic models of impaired beta-cell function. The expression datasets are derived using different microarray platforms, including the BCBC PancChips and Affymetrix gene expression arrays. Other datasets include semi-quantitative RT-PCR and MPSS expression studies. For selected microarray studies, lists of differentially expressed genes, derived from PaGE analysis, are displayed on the site. EPConDB provides database queries and tools to examine the relationship between a gene, its transcriptional regulation, protein function and expression in pancreatic tissues.

  16. Genotet: An Interactive Web-based Visual Exploration Framework to Support Validation of Gene Regulatory Networks.

    PubMed

    Yu, Bowen; Doraiswamy, Harish; Chen, Xi; Miraldi, Emily; Arrieta-Ortiz, Mario Luis; Hafemeister, Christoph; Madar, Aviv; Bonneau, Richard; Silva, Cláudio T

    2014-12-01

    Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and one of the most important components of TRNs are transcription factors (TFs), proteins that specifically bind to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies as well as advances in computational biology have led to multiple large regulatory network models (directed networks) each with a large corpus of supporting data and gene-annotation. There are multiple possible biological motivations for exploring large regulatory network models, including: validating TF-target gene relationships, figuring out co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies for the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacteria (a small genome with high data-completeness).

  17. EXP-PAC: providing comparative analysis and storage of next generation gene expression data.

    PubMed

    Church, Philip C; Goscinski, Andrzej; Lefèvre, Christophe

    2012-07-01

    Microarrays and more recently RNA sequencing has led to an increase in available gene expression data. How to manage and store this data is becoming a key issue. In response we have developed EXP-PAC, a web based software package for storage, management and analysis of gene expression and sequence data. Unique to this package is SQL based querying of gene expression data sets, distributed normalization of raw gene expression data and analysis of gene expression data across experiments and species. This package has been populated with lactation data in the international milk genomic consortium web portal (http://milkgenomics.org/). Source code is also available which can be hosted on a Windows, Linux or Mac APACHE server connected to a private or public network (http://mamsap.it.deakin.edu.au/~pcc/Release/EXP_PAC.html). Copyright © 2012 Elsevier Inc. All rights reserved.

  18. GEMINI: a computationally-efficient search engine for large gene expression datasets.

    PubMed

    DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick

    2016-02-24

    Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number of meta-data tag. However, in this search paradigm the form of the query - a text-based string - is mismatched with the form of the target - a genomic profile. To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an [Formula: see text] expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10(5) samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information.

  19. Babesia bovis expresses Bbo-6cys-E, a member of a novel gene family that is homologous to the 6-cys family of Plasmodium

    USDA-ARS?s Scientific Manuscript database

    A novel Babesia bovis gene family encoding proteins with similarities to the Plasmodium 6cys protein family was identified by TBLASTN searches of the Babesia bovis genome using the sequence of the P. falciparum PFS230 protein as query, and was termed Bbo-6cys gene family. The Bbo-cys6 gene family co...

  20. STARNET 2: a web-based tool for accelerating discovery of gene regulatory networks using microarray co-expression data

    PubMed Central

    Jupiter, Daniel; Chen, Hailin; VanBuren, Vincent

    2009-01-01

    Background Although expression microarrays have become a standard tool used by biologists, analysis of data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms, and labs may involve complicated data processing, and inferring relationships between genes remains difficult. Results STARNET 2 is a new web-based tool that allows post hoc visual analysis of correlations that are derived from expression microarray data. STARNET 2 facilitates user discovery of putative gene regulatory networks in a variety of species (human, rat, mouse, chicken, zebrafish, Drosophila, C. elegans, S. cerevisiae, Arabidopsis and rice) by graphing networks of genes that are closely co-expressed across a large heterogeneous set of preselected microarray experiments. For each of the represented organisms, raw microarray data were retrieved from NCBI's Gene Expression Omnibus for a selected Affymetrix platform. All pairwise Pearson correlation coefficients were computed for expression profiles measured on each platform, respectively. These precompiled results were stored in a MySQL database, and supplemented by additional data retrieved from NCBI. A web-based tool allows user-specified queries of the database, centered at a gene of interest. The result of a query includes graphs of correlation networks, graphs of known interactions involving genes and gene products that are present in the correlation networks, and initial statistical analyses. Two analyses may be performed in parallel to compare networks, which is facilitated by the new HEATSEEKER module. Conclusion STARNET 2 is a useful tool for developing new hypotheses about regulatory relationships between genes and gene products, and has coverage for 10 species. Interpretation of the correlation networks is supported with a database of previously documented interactions, a test for enrichment of Gene Ontology terms, and heat maps of correlation distances that may be used to compare two networks. The list of genes in a STARNET network may be useful in developing a list of candidate genes to use for the inference of causal networks. The tool is freely available at , and does not require user registration. PMID:19828039

  1. Network-based analysis of differentially expressed genes in cerebrospinal fluid (CSF) and blood reveals new candidate genes for multiple sclerosis

    PubMed Central

    Safari-Alighiarloo, Nahid; Taghizadeh, Mohammad; Tabatabaei, Seyyed Mohammad; Namaki, Saeed

    2016-01-01

    Background The involvement of multiple genes and missing heritability, which are dominant in complex diseases such as multiple sclerosis (MS), entail using network biology to better elucidate their molecular basis and genetic factors. We therefore aimed to integrate interactome (protein–protein interaction (PPI)) and transcriptomes data to construct and analyze PPI networks for MS disease. Methods Gene expression profiles in paired cerebrospinal fluid (CSF) and peripheral blood mononuclear cells (PBMCs) samples from MS patients, sampled in relapse or remission and controls, were analyzed. Differentially expressed genes which determined only in CSF (MS vs. control) and PBMCs (relapse vs. remission) separately integrated with PPI data to construct the Query-Query PPI (QQPPI) networks. The networks were further analyzed to investigate more central genes, functional modules and complexes involved in MS progression. Results The networks were analyzed and high centrality genes were identified. Exploration of functional modules and complexes showed that the majority of high centrality genes incorporated in biological pathways driving MS pathogenesis. Proteasome and spliceosome were also noticeable in enriched pathways in PBMCs (relapse vs. remission) which were identified by both modularity and clique analyses. Finally, STK4, RB1, CDKN1A, CDK1, RAC1, EZH2, SDCBP genes in CSF (MS vs. control) and CDC37, MAP3K3, MYC genes in PBMCs (relapse vs. remission) were identified as potential candidate genes for MS, which were the more central genes involved in biological pathways. Discussion This study showed that network-based analysis could explicate the complex interplay between biological processes underlying MS. Furthermore, an experimental validation of candidate genes can lead to identification of potential therapeutic targets. PMID:28028462

  2. A genetic network that suppresses genome rearrangements in Saccharomyces cerevisiae and contains defects in cancers

    PubMed Central

    Putnam, Christopher D.; Srivatsan, Anjana; Nene, Rahul V.; Martinez, Sandra L.; Clotfelter, Sarah P.; Bell, Sara N.; Somach, Steven B.; E.S. de Souza, Jorge; Fonseca, André F.; de Souza, Sandro J.; Kolodner, Richard D.

    2016-01-01

    Gross chromosomal rearrangements (GCRs) play an important role in human diseases, including cancer. The identity of all Genome Instability Suppressing (GIS) genes is not currently known. Here multiple Saccharomyces cerevisiae GCR assays and query mutations were crossed into arrays of mutants to identify progeny with increased GCR rates. One hundred eighty two GIS genes were identified that suppressed GCR formation. Another 438 cooperatively acting GIS genes were identified that were not GIS genes, but suppressed the increased genome instability caused by individual query mutations. Analysis of TCGA data using the human genes predicted to act in GIS pathways revealed that a minimum of 93% of ovarian and 66% of colorectal cancer cases had defects affecting one or more predicted GIS gene. These defects included loss-of-function mutations, copy-number changes associated with reduced expression, and silencing. In contrast, acute myeloid leukaemia cases did not appear to have defects affecting the predicted GIS genes. PMID:27071721

  3. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd

    PubMed Central

    Wang, Zichen; Monteiro, Caroline D.; Jagodnik, Kathleen M.; Fernandez, Nicolas F.; Gundersen, Gregory W.; Rouillard, Andrew D.; Jenkins, Sherry L.; Feldmann, Axel S.; Hu, Kevin S.; McDermott, Michael G.; Duan, Qiaonan; Clark, Neil R.; Jones, Matthew R.; Kou, Yan; Goff, Troy; Woodland, Holly; Amaral, Fabio M R.; Szeto, Gregory L.; Fuchs, Oliver; Schüssler-Fiorenza Rose, Sophia M.; Sharma, Shvetank; Schwartz, Uwe; Bausela, Xabier Bengoetxea; Szymkiewicz, Maciej; Maroulis, Vasileios; Salykin, Anton; Barra, Carolina M.; Kruth, Candice D.; Bongio, Nicholas J.; Mathur, Vaibhav; Todoric, Radmila D; Rubin, Udi E.; Malatras, Apostolos; Fulp, Carl T.; Galindo, John A.; Motiejunaite, Ruta; Jüschke, Christoph; Dishuck, Philip C.; Lahl, Katharina; Jafari, Mohieddin; Aibar, Sara; Zaravinos, Apostolos; Steenhuizen, Linda H.; Allison, Lindsey R.; Gamallo, Pablo; de Andres Segura, Fernando; Dae Devlin, Tyler; Pérez-García, Vicente; Ma'ayan, Avi

    2016-01-01

    Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization. PMID:27667448

  4. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd.

    PubMed

    Wang, Zichen; Monteiro, Caroline D; Jagodnik, Kathleen M; Fernandez, Nicolas F; Gundersen, Gregory W; Rouillard, Andrew D; Jenkins, Sherry L; Feldmann, Axel S; Hu, Kevin S; McDermott, Michael G; Duan, Qiaonan; Clark, Neil R; Jones, Matthew R; Kou, Yan; Goff, Troy; Woodland, Holly; Amaral, Fabio M R; Szeto, Gregory L; Fuchs, Oliver; Schüssler-Fiorenza Rose, Sophia M; Sharma, Shvetank; Schwartz, Uwe; Bausela, Xabier Bengoetxea; Szymkiewicz, Maciej; Maroulis, Vasileios; Salykin, Anton; Barra, Carolina M; Kruth, Candice D; Bongio, Nicholas J; Mathur, Vaibhav; Todoric, Radmila D; Rubin, Udi E; Malatras, Apostolos; Fulp, Carl T; Galindo, John A; Motiejunaite, Ruta; Jüschke, Christoph; Dishuck, Philip C; Lahl, Katharina; Jafari, Mohieddin; Aibar, Sara; Zaravinos, Apostolos; Steenhuizen, Linda H; Allison, Lindsey R; Gamallo, Pablo; de Andres Segura, Fernando; Dae Devlin, Tyler; Pérez-García, Vicente; Ma'ayan, Avi

    2016-09-26

    Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.

  5. GEOGLE: context mining tool for the correlation between gene expression and the phenotypic distinction.

    PubMed

    Yu, Yao; Tu, Kang; Zheng, Siyuan; Li, Yun; Ding, Guohui; Ping, Jie; Hao, Pei; Li, Yixue

    2009-08-25

    In the post-genomic era, the development of high-throughput gene expression detection technology provides huge amounts of experimental data, which challenges the traditional pipelines for data processing and analyzing in scientific researches. In our work, we integrated gene expression information from Gene Expression Omnibus (GEO), biomedical ontology from Medical Subject Headings (MeSH) and signaling pathway knowledge from sigPathway entries to develop a context mining tool for gene expression analysis - GEOGLE. GEOGLE offers a rapid and convenient way for searching relevant experimental datasets, pathways and biological terms according to multiple types of queries: including biomedical vocabularies, GDS IDs, gene IDs, pathway names and signature list. Moreover, GEOGLE summarizes the signature genes from a subset of GDSes and estimates the correlation between gene expression and the phenotypic distinction with an integrated p value. This approach performing global searching of expression data may expand the traditional way of collecting heterogeneous gene expression experiment data. GEOGLE is a novel tool that provides researchers a quantitative way to understand the correlation between gene expression and phenotypic distinction through meta-analysis of gene expression datasets from different experiments, as well as the biological meaning behind. The web site and user guide of GEOGLE are available at: http://omics.biosino.org:14000/kweb/workflow.jsp?id=00020.

  6. Bayesian approach to transforming public gene expression repositories into disease diagnosis databases.

    PubMed

    Huang, Haiyan; Liu, Chun-Chi; Zhou, Xianghong Jasmine

    2010-04-13

    The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes the first study to transform public gene expression repositories into an automated disease diagnosis database. Particularly, we have developed a systematic framework, including a two-stage Bayesian learning approach, to achieve the diagnosis of one or multiple diseases for a query expression profile along a hierarchical disease taxonomy. Our approach, including standardizing cross-platform gene expression data and heterogeneous disease annotations, allows analyzing both sources of information in a unified probabilistic system. A high level of overall diagnostic accuracy was shown by cross validation. It was also demonstrated that the power of our method can increase significantly with the continued growth of public gene expression repositories. Finally, we showed how our disease diagnosis system can be used to characterize complex phenotypes and to construct a disease-drug connectivity map.

  7. Optimized Probe Masking for Comparative Transcriptomics of Closely Related Species

    PubMed Central

    Poeschl, Yvonne; Delker, Carolin; Trenner, Jana; Ullrich, Kristian Karsten; Quint, Marcel; Grosse, Ivo

    2013-01-01

    Microarrays are commonly applied to study the transcriptome of specific species. However, many available microarrays are restricted to model organisms, and the design of custom microarrays for other species is often not feasible. Hence, transcriptomics approaches of non-model organisms as well as comparative transcriptomics studies among two or more species often make use of cost-intensive RNAseq studies or, alternatively, by hybridizing transcripts of a query species to a microarray of a closely related species. When analyzing these cross-species microarray expression data, differences in the transcriptome of the query species can cause problems, such as the following: (i) lower hybridization accuracy of probes due to mismatches or deletions, (ii) probes binding multiple transcripts of different genes, and (iii) probes binding transcripts of non-orthologous genes. So far, methods for (i) exist, but these neglect (ii) and (iii). Here, we propose an approach for comparative transcriptomics addressing problems (i) to (iii), which retains only transcript-specific probes binding transcripts of orthologous genes. We apply this approach to an Arabidopsis lyrata expression data set measured on a microarray designed for Arabidopsis thaliana, and compare it to two alternative approaches, a sequence-based approach and a genomic DNA hybridization-based approach. We investigate the number of retained probe sets, and we validate the resulting expression responses by qRT-PCR. We find that the proposed approach combines the benefit of sequence-based stringency and accuracy while allowing the expression analysis of much more genes than the alternative sequence-based approach. As an added benefit, the proposed approach requires probes to detect transcripts of orthologous genes only, which provides a superior base for biological interpretation of the measured expression responses. PMID:24260119

  8. Identification of ATP Citrate Lyase as a Positive Regulator of Glycolytic Function in Glioblastomas

    PubMed Central

    Beckner, Marie E.; Fellows-Mayle, Wendy; Zhang, Zhe; Agostino, Naomi R.; Kant, Jeffrey A.; Day, Billy W.; Pollack, Ian F.

    2009-01-01

    Glioblastomas, the most malignant type of glioma, are more glycolytic than normal brain tissue. Robust migration of glioblastoma cells has been previously demonstrated under glycolytic conditions and their pseudopodia contain increased glycolytic and decreased mitochondrial enzymes. Glycolysis is suppressed by metabolic acids, including citric acid which is excluded from mitochondria during hypoxia. We postulated that glioma cells maintain glycolysis by regulating metabolic acids, especially in their pseudopodia. The enzyme that breaks down cytosolic citric acid is ATP citrate lyase (ACLY). Our identification of increased ACLY in pseudopodia of U87 glioblastoma cells on 1D gels and immunoblots prompted investigation of ACLY gene expression in gliomas for survival data and correlation with expression of ENO1, that encodes enolase 1. Queries of the NIH’s REMBRANDT brain tumor database based on Affymetrix data indicated that decreased survival correlated with increased gene expression of ACLY in gliomas. Queries of gliomas and glioblastomas found an association of upregulated ACLY and ENO1 expression by chi square for all probe sets (reporters) combined and correlation for numbers of probe sets indicating shared upregulation of these genes. Real-time quantitative PCR confirmed correlation between ACLY and ENO1 in 21 glioblastomas (p < 0.001). Inhibition of ACLY with hydroxycitrate suppressed (p < 0.05) in vitro glioblastoma cell migration, clonogenicity and brain invasion under glycolytic conditions and enhanced the suppressive effects of a Met inhibitor on cell migration. In summary, gene expression data, proteomics and functional assays support ACLY as a positive regulator of glycolysis in glioblastomas. PMID:19795461

  9. RNA Expression Profiling Reveals Differentially Regulated Growth Factor and Receptor Expression in Redirected Cancer Cells.

    PubMed

    Schmucker, Hannah S; Park, Jang Pyo; Coissieux, Marie-May; Bentires-Alj, Mohamed; Feltus, F Alex; Booth, Brian W

    2017-05-01

    Tumorigenic cells can be redirected to adopt a normal phenotype when transplanted into cleared mammary fat pads of juvenile female mice in specific ratios with normal epithelial cells. The redirected tumorigenic cells enter stem cell niches and provide progeny that differentiate into all mammary epithelial subtypes. We have developed an in vitro model that mimics the in vivo phenomenon. The shift in phenotype to redirection should be accomplished through a return to a normal gene expression state. To measure this shift, we interrogated the transcriptome of various in vitro model states in search for casual genes. For this study, expression of growth factors, cytokines, and their associated receptors was examined. In all, we queried 251 growth factor and cytokine-related genes. We found numerous growth factor and cytokine genes whose expression levels switched from expression levels seen in cancer cells to expression levels observed in normal cells. The comparisons of gene expression between normal mammary epithelial cells, tumor-derived cells, and redirected cancer cells have revealed insight into active and inactive growth factors and cytokines in cancer cell redirection.

  10. miRvestigator: web application to identify miRNAs responsible for co-regulated gene expression patterns discovered through transcriptome profiling.

    PubMed

    Plaisier, Christopher L; Bare, J Christopher; Baliga, Nitin S

    2011-07-01

    Transcriptome profiling studies have produced staggering numbers of gene co-expression signatures for a variety of biological systems. A significant fraction of these signatures will be partially or fully explained by miRNA-mediated targeted transcript degradation. miRvestigator takes as input lists of co-expressed genes from Caenorhabditis elegans, Drosophila melanogaster, G. gallus, Homo sapiens, Mus musculus or Rattus norvegicus and identifies the specific miRNAs that are likely to bind to 3' un-translated region (UTR) sequences to mediate the observed co-regulation. The novelty of our approach is the miRvestigator hidden Markov model (HMM) algorithm which systematically computes a similarity P-value for each unique miRNA seed sequence from the miRNA database miRBase to an overrepresented sequence motif identified within the 3'-UTR of the query genes. We have made this miRNA discovery tool accessible to the community by integrating our HMM algorithm with a proven algorithm for de novo discovery of miRNA seed sequences and wrapping these algorithms into a user-friendly interface. Additionally, the miRvestigator web server also produces a list of putative miRNA binding sites within 3'-UTRs of the query transcripts to facilitate the design of validation experiments. The miRvestigator is freely available at http://mirvestigator.systemsbiology.net.

  11. A gene-signature progression approach to identifying candidate small-molecule cancer therapeutics with connectivity mapping.

    PubMed

    Wen, Qing; Kim, Chang-Sik; Hamilton, Peter W; Zhang, Shu-Dong

    2016-05-11

    Gene expression connectivity mapping has gained much popularity recently with a number of successful applications in biomedical research testifying its utility and promise. Previously methodological research in connectivity mapping mainly focused on two of the key components in the framework, namely, the reference gene expression profiles and the connectivity mapping algorithms. The other key component in this framework, the query gene signature, has been left to users to construct without much consensus on how this should be done, albeit it has been an issue most relevant to end users. As a key input to the connectivity mapping process, gene signature is crucially important in returning biologically meaningful and relevant results. This paper intends to formulate a standardized procedure for constructing high quality gene signatures from a user's perspective. We describe a two-stage process for making quality gene signatures using gene expression data as initial inputs. First, a differential gene expression analysis comparing two distinct biological states; only the genes that have passed stringent statistical criteria are considered in the second stage of the process, which involves ranking genes based on statistical as well as biological significance. We introduce a "gene signature progression" method as a standard procedure in connectivity mapping. Starting from the highest ranked gene, we progressively determine the minimum length of the gene signature that allows connections to the reference profiles (drugs) being established with a preset target false discovery rate. We use a lung cancer dataset and a breast cancer dataset as two case studies to demonstrate how this standardized procedure works, and we show that highly relevant and interesting biological connections are returned. Of particular note is gefitinib, identified as among the candidate therapeutics in our lung cancer case study. Our gene signature was based on gene expression data from Taiwan female non-smoker lung cancer patients, while there is evidence from independent studies that gefitinib is highly effective in treating women, non-smoker or former light smoker, advanced non-small cell lung cancer patients of Asian origin. In summary, we introduced a gene signature progression method into connectivity mapping, which enables a standardized procedure for constructing high quality gene signatures. This progression method is particularly useful when the number of differentially expressed genes identified is large, and when there is a need to prioritize them to be included in the query signature. The results from two case studies demonstrate that the approach we have developed is capable of obtaining pertinent candidate drugs with high precision.

  12. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal

    PubMed Central

    Gao, Jianjiong; Aksoy, Bülent Arman; Dogrusoz, Ugur; Dresdner, Gideon; Gross, Benjamin; Sumer, S. Onur; Sun, Yichao; Jacobsen, Anders; Sinha, Rileen; Larsson, Erik; Cerami, Ethan; Sander, Chris; Schultz, Nikolaus

    2014-01-01

    The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and software programmatic access. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics. PMID:23550210

  13. CrossQuery: a web tool for easy associative querying of transcriptome data.

    PubMed

    Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

    2011-01-01

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straight forward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.

  14. Translating standards into practice - one Semantic Web API for Gene Expression.

    PubMed

    Deus, Helena F; Prud'hommeaux, Eric; Miller, Michael; Zhao, Jun; Malone, James; Adamusiak, Tomasz; McCusker, Jim; Das, Sudeshna; Rocca Serra, Philippe; Fox, Ronan; Marshall, M Scott

    2012-08-01

    Sharing and describing experimental results unambiguously with sufficient detail to enable replication of results is a fundamental tenet of scientific research. In today's cluttered world of "-omics" sciences, data standards and standardized use of terminologies and ontologies for biomedical informatics play an important role in reporting high-throughput experiment results in formats that can be interpreted by both researchers and analytical tools. Increasing adoption of Semantic Web and Linked Data technologies for the integration of heterogeneous and distributed health care and life sciences (HCLSs) datasets has made the reuse of standards even more pressing; dynamic semantic query federation can be used for integrative bioinformatics when ontologies and identifiers are reused across data instances. We present here a methodology to integrate the results and experimental context of three different representations of microarray-based transcriptomic experiments: the Gene Expression Atlas, the W3C BioRDF task force approach to reporting Provenance of Microarray Experiments, and the HSCI blood genomics project. Our approach does not attempt to improve the expressivity of existing standards for genomics but, instead, to enable integration of existing datasets published from microarray-based transcriptomic experiments. SPARQL Construct is used to create a posteriori mappings of concepts and properties and linking rules that match entities based on query constraints. We discuss how our integrative approach can encourage reuse of the Experimental Factor Ontology (EFO) and the Ontology for Biomedical Investigations (OBIs) for the reporting of experimental context and results of gene expression studies. Copyright © 2012 Elsevier Inc. All rights reserved.

  15. ARMOUR - A Rice miRNA: mRNA Interaction Resource.

    PubMed

    Sanan-Mishra, Neeti; Tripathi, Anita; Goswami, Kavita; Shukla, Rohit N; Vasudevan, Madavan; Goswami, Hitesh

    2018-01-01

    ARMOUR was developed as A Rice miRNA:mRNA interaction resource. This informative and interactive database includes the experimentally validated expression profiles of miRNAs under different developmental and abiotic stress conditions across seven Indian rice cultivars. This comprehensive database covers 689 known and 1664 predicted novel miRNAs and their expression profiles in more than 38 different tissues or conditions along with their predicted/known target transcripts. The understanding of miRNA:mRNA interactome in regulation of functional cellular machinery is supported by the sequence information of the mature and hairpin structures. ARMOUR provides flexibility to users in querying the database using multiple ways like known gene identifiers, gene ontology identifiers, KEGG identifiers and also allows on the fly fold change analysis and sequence search query with inbuilt BLAST algorithm. ARMOUR database provides a cohesive platform for novel and mature miRNAs and their expression in different experimental conditions and allows searching for their interacting mRNA targets, GO annotation and their involvement in various biological pathways. The ARMOUR database includes a provision for adding more experimental data from users, with an aim to develop it as a platform for sharing and comparing experimental data contributed by research groups working on rice.

  16. GExplore: a web server for integrated queries of protein domains, gene expression and mutant phenotypes

    PubMed Central

    2009-01-01

    Background The majority of the genes even in well-studied multi-cellular model organisms have not been functionally characterized yet. Mining the numerous genome wide data sets related to protein function to retrieve potential candidate genes for a particular biological process remains a challenge. Description GExplore has been developed to provide a user-friendly database interface for data mining at the gene expression/protein function level to help in hypothesis development and experiment design. It supports combinatorial searches for proteins with certain domains, tissue- or developmental stage-specific expression patterns, and mutant phenotypes. GExplore operates on a stand-alone database and has fast response times, which is essential for exploratory searches. The interface is not only user-friendly, but also modular so that it accommodates additional data sets in the future. Conclusion GExplore is an online database for quick mining of data related to gene and protein function, providing a multi-gene display of data sets related to the domain composition of proteins as well as expression and phenotype data. GExplore is publicly available at: http://genome.sfu.ca/gexplore/ PMID:19917126

  17. Turning publicly available gene expression data into discoveries using gene set context analysis.

    PubMed

    Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai

    2016-01-08

    Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. The Gene Expression Omnibus Database.

    PubMed

    Clough, Emily; Barrett, Tanya

    2016-01-01

    The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome-protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.

  19. The Gene Expression Omnibus database

    PubMed Central

    Clough, Emily; Barrett, Tanya

    2016-01-01

    The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome–protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/. PMID:27008011

  20. HBVPathDB: a database of HBV infection-related molecular interaction network.

    PubMed

    Zhang, Yi; Bo, Xiao-Chen; Yang, Jing; Wang, Sheng-Qi

    2005-03-21

    To describe molecules or genes interaction between hepatitis B viruses (HBV) and host, for understanding how virus' and host's genes and molecules are networked to form a biological system and for perceiving mechanism of HBV infection. The knowledge of HBV infection-related reactions was organized into various kinds of pathways with carefully drawn graphs in HBVPathDB. Pathway information is stored with relational database management system (DBMS), which is currently the most efficient way to manage large amounts of data and query is implemented with powerful Structured Query Language (SQL). The search engine is written using Personal Home Page (PHP) with SQL embedded and web retrieval interface is developed for searching with Hypertext Markup Language (HTML). We present the first version of HBVPathDB, which is a HBV infection-related molecular interaction network database composed of 306 pathways with 1 050 molecules involved. With carefully drawn graphs, pathway information stored in HBVPathDB can be browsed in an intuitive way. We develop an easy-to-use interface for flexible accesses to the details of database. Convenient software is implemented to query and browse the pathway information of HBVPathDB. Four search page layout options-category search, gene search, description search, unitized search-are supported by the search engine of the database. The database is freely available at http://www.bio-inf.net/HBVPathDB/HBV/. The conventional perspective HBVPathDB have already contained a considerable amount of pathway information with HBV infection related, which is suitable for in-depth analysis of molecular interaction network of virus and host. HBVPathDB integrates pathway data-sets with convenient software for query, browsing, visualization, that provides users more opportunity to identify regulatory key molecules as potential drug targets and to explore the possible mechanism of HBV infection based on gene expression datasets.

  1. An RNA-Seq based gene expression atlas of the common bean.

    PubMed

    O'Rourke, Jamie A; Iniguez, Luis P; Fu, Fengli; Bucciarelli, Bruna; Miller, Susan S; Jackson, Scott A; McClean, Philip E; Li, Jun; Dai, Xinbin; Zhao, Patrick X; Hernandez, Georgina; Vance, Carroll P

    2014-10-06

    Common bean (Phaseolus vulgaris) is grown throughout the world and comprises roughly 50% of the grain legumes consumed worldwide. Despite this, genetic resources for common beans have been lacking. Next generation sequencing, has facilitated our investigation of the gene expression profiles associated with biologically important traits in common bean. An increased understanding of gene expression in common bean will improve our understanding of gene expression patterns in other legume species. Combining recently developed genomic resources for Phaseolus vulgaris, including predicted gene calls, with RNA-Seq technology, we measured the gene expression patterns from 24 samples collected from seven tissues at developmentally important stages and from three nitrogen treatments. Gene expression patterns throughout the plant were analyzed to better understand changes due to nodulation, seed development, and nitrogen utilization. We have identified 11,010 genes differentially expressed with a fold change ≥ 2 and a P-value < 0.05 between different tissues at the same time point, 15,752 genes differentially expressed within a tissue due to changes in development, and 2,315 genes expressed only in a single tissue. These analyses identified 2,970 genes with expression patterns that appear to be directly dependent on the source of available nitrogen. Finally, we have assembled this data in a publicly available database, The Phaseolus vulgaris Gene Expression Atlas (Pv GEA), http://plantgrn.noble.org/PvGEA/ . Using the website, researchers can query gene expression profiles of their gene of interest, search for genes expressed in different tissues, or download the dataset in a tabular form. These data provide the basis for a gene expression atlas, which will facilitate functional genomic studies in common bean. Analysis of this dataset has identified genes important in regulating seed composition and has increased our understanding of nodulation and impact of the nitrogen source on assimilation and distribution throughout the plant.

  2. Identification of Cell Cycle-Regulated Genes by Convolutional Neural Network.

    PubMed

    Liu, Chenglin; Cui, Peng; Huang, Tao

    2017-01-01

    The cell cycle-regulated genes express periodically with the cell cycle stages, and the identification and study of these genes can provide a deep understanding of the cell cycle process. Large false positives and low overlaps are big problems in cell cycle-regulated gene detection. Here, a computational framework called DLGene was proposed for cell cycle-regulated gene detection. It is based on the convolutional neural network, a deep learning algorithm representing raw form of data pattern without assumption of their distribution. First, the expression data was transformed to categorical state data to denote the changing state of gene expression, and four different expression patterns were revealed for the reported cell cycle-regulated genes. Then, DLGene was applied to discriminate the non-cell cycle gene and the four subtypes of cell cycle genes. Its performances were compared with six traditional machine learning methods. At last, the biological functions of representative cell cycle genes for each subtype are analyzed. Our method showed better and more balanced performance of sensitivity and specificity comparing to other machine learning algorithms. The cell cycle genes had very different expression pattern with non-cell cycle genes and among the cell-cycle genes, there were four subtypes. Our method not only detects the cell cycle genes, but also describes its expression pattern, such as when its highest expression level is reached and how it changes with time. For each type, we analyzed the biological functions of the representative genes and such results provided novel insight to the cell cycle mechanisms. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  3. An expression database for roots of the model legume Medicago truncatula under salt stress

    PubMed Central

    2009-01-01

    Background Medicago truncatula is a model legume whose genome is currently being sequenced by an international consortium. Abiotic stresses such as salt stress limit plant growth and crop productivity, including those of legumes. We anticipate that studies on M. truncatula will shed light on other economically important legumes across the world. Here, we report the development of a database called MtED that contains gene expression profiles of the roots of M. truncatula based on time-course salt stress experiments using the Affymetrix Medicago GeneChip. Our hope is that MtED will provide information to assist in improving abiotic stress resistance in legumes. Description The results of our microarray experiment with roots of M. truncatula under 180 mM sodium chloride were deposited in the MtED database. Additionally, sequence and annotation information regarding microarray probe sets were included. MtED provides functional category analysis based on Gene and GeneBins Ontology, and other Web-based tools for querying and retrieving query results, browsing pathways and transcription factor families, showing metabolic maps, and comparing and visualizing expression profiles. Utilities like mapping probe sets to genome of M. truncatula and In-Silico PCR were implemented by BLAT software suite, which were also available through MtED database. Conclusion MtED was built in the PHP script language and as a MySQL relational database system on a Linux server. It has an integrated Web interface, which facilitates ready examination and interpretation of the results of microarray experiments. It is intended to help in selecting gene markers to improve abiotic stress resistance in legumes. MtED is available at http://bioinformatics.cau.edu.cn/MtED/. PMID:19906315

  4. An expression database for roots of the model legume Medicago truncatula under salt stress.

    PubMed

    Li, Daofeng; Su, Zhen; Dong, Jiangli; Wang, Tao

    2009-11-11

    Medicago truncatula is a model legume whose genome is currently being sequenced by an international consortium. Abiotic stresses such as salt stress limit plant growth and crop productivity, including those of legumes. We anticipate that studies on M. truncatula will shed light on other economically important legumes across the world. Here, we report the development of a database called MtED that contains gene expression profiles of the roots of M. truncatula based on time-course salt stress experiments using the Affymetrix Medicago GeneChip. Our hope is that MtED will provide information to assist in improving abiotic stress resistance in legumes. The results of our microarray experiment with roots of M. truncatula under 180 mM sodium chloride were deposited in the MtED database. Additionally, sequence and annotation information regarding microarray probe sets were included. MtED provides functional category analysis based on Gene and GeneBins Ontology, and other Web-based tools for querying and retrieving query results, browsing pathways and transcription factor families, showing metabolic maps, and comparing and visualizing expression profiles. Utilities like mapping probe sets to genome of M. truncatula and In-Silico PCR were implemented by BLAT software suite, which were also available through MtED database. MtED was built in the PHP script language and as a MySQL relational database system on a Linux server. It has an integrated Web interface, which facilitates ready examination and interpretation of the results of microarray experiments. It is intended to help in selecting gene markers to improve abiotic stress resistance in legumes. MtED is available at http://bioinformatics.cau.edu.cn/MtED/.

  5. Processing SPARQL queries with regular expressions in RDF databases

    PubMed Central

    2011-01-01

    Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225

  6. Processing SPARQL queries with regular expressions in RDF databases.

    PubMed

    Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon

    2011-03-29

    As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.

  7. A novel multi-network approach reveals tissue-specific cellular modulators of fibrosis in systemic sclerosis.

    PubMed

    Taroni, Jaclyn N; Greene, Casey S; Martyanov, Viktor; Wood, Tammara A; Christmann, Romy B; Farber, Harrison W; Lafyatis, Robert A; Denton, Christopher P; Hinchcliff, Monique E; Pioli, Patricia A; Mahoney, J Matthew; Whitfield, Michael L

    2017-03-23

    Systemic sclerosis (SSc) is a multi-organ autoimmune disease characterized by skin fibrosis. Internal organ involvement is heterogeneous. It is unknown whether disease mechanisms are common across all involved affected tissues or if each manifestation has a distinct underlying pathology. We used consensus clustering to compare gene expression profiles of biopsies from four SSc-affected tissues (skin, lung, esophagus, and peripheral blood) from patients with SSc, and the related conditions pulmonary fibrosis (PF) and pulmonary arterial hypertension, and derived a consensus disease-associate signature across all tissues. We used this signature to query tissue-specific functional genomic networks. We performed novel network analyses to contrast the skin and lung microenvironments and to assess the functional role of the inflammatory and fibrotic genes in each organ. Lastly, we tested the expression of macrophage activation state-associated gene sets for enrichment in skin and lung using a Wilcoxon rank sum test. We identified a common pathogenic gene expression signature-an immune-fibrotic axis-indicative of pro-fibrotic macrophages (MØs) in multiple tissues (skin, lung, esophagus, and peripheral blood mononuclear cells) affected by SSc. While the co-expression of these genes is common to all tissues, the functional consequences of this upregulation differ by organ. We used this disease-associated signature to query tissue-specific functional genomic networks to identify common and tissue-specific pathologies of SSc and related conditions. In contrast to skin, in the lung-specific functional network we identify a distinct lung-resident MØ signature associated with lipid stimulation and alternative activation. In keeping with our network results, we find distinct MØ alternative activation transcriptional programs in SSc-associated PF lung and in the skin of patients with an "inflammatory" SSc gene expression signature. Our results suggest that the innate immune system is central to SSc disease processes but that subtle distinctions exist between tissues. Our approach provides a framework for examining molecular signatures of disease in fibrosis and autoimmune diseases and for leveraging publicly available data to understand common and tissue-specific disease processes in complex human diseases.

  8. Glioma IL13Rα2 Is Associated with Mesenchymal Signature Gene Expression and Poor Patient Prognosis

    PubMed Central

    Starr, Renate; Deng, Xutao; Badie, Behnam; Yuan, Yate-Ching; Forman, Stephen J.; Barish, Michael E.

    2013-01-01

    A major challenge for successful immunotherapy against glioma is the identification and characterization of validated targets. We have taken a bioinformatics approach towards understanding the biological context of IL-13 receptor α2 (IL13Rα2) expression in brain tumors, and its functional significance for patient survival. Querying multiple gene expression databases, we show that IL13Rα2 expression increases with glioma malignancy grade, and expression for high-grade tumors is bimodal, with approximately 58% of WHO grade IV gliomas over-expressing this receptor. By several measures, IL13Rα2 expression in patient samples and low-passage primary glioma lines most consistently correlates with the expression of signature genes defining mesenchymal subclass tumors and negatively correlates with proneural signature genes as defined by two studies. Positive associations were also noted with proliferative signature genes, whereas no consistent associations were found with either classical or neural signature genes. Probing the potential functional consequences of this mesenchymal association through IPA analysis suggests that IL13Rα2 expression is associated with activation of proinflammatory and immune pathways characteristic of mesenchymal subclass tumors. In addition, survival analyses indicate that IL13Rα2 over-expression is associated with poor patient prognosis, a single gene correlation ranking IL13Rα2 in the top ~1% of total gene expression probes with regard to survival association with WHO IV gliomas. This study better defines the functional consequences of IL13Rα2 expression by demonstrating association with mesenchymal signature gene expression and poor patient prognosis. It thus highlights the utility of IL13Rα2 as a therapeutic target, and helps define patient populations most likely to respond to immunotherapy in present and future clinical trials. PMID:24204956

  9. Glioma IL13Rα2 is associated with mesenchymal signature gene expression and poor patient prognosis.

    PubMed

    Brown, Christine E; Warden, Charles D; Starr, Renate; Deng, Xutao; Badie, Behnam; Yuan, Yate-Ching; Forman, Stephen J; Barish, Michael E

    2013-01-01

    A major challenge for successful immunotherapy against glioma is the identification and characterization of validated targets. We have taken a bioinformatics approach towards understanding the biological context of IL-13 receptor α2 (IL13Rα2) expression in brain tumors, and its functional significance for patient survival. Querying multiple gene expression databases, we show that IL13Rα2 expression increases with glioma malignancy grade, and expression for high-grade tumors is bimodal, with approximately 58% of WHO grade IV gliomas over-expressing this receptor. By several measures, IL13Rα2 expression in patient samples and low-passage primary glioma lines most consistently correlates with the expression of signature genes defining mesenchymal subclass tumors and negatively correlates with proneural signature genes as defined by two studies. Positive associations were also noted with proliferative signature genes, whereas no consistent associations were found with either classical or neural signature genes. Probing the potential functional consequences of this mesenchymal association through IPA analysis suggests that IL13Rα2 expression is associated with activation of proinflammatory and immune pathways characteristic of mesenchymal subclass tumors. In addition, survival analyses indicate that IL13Rα2 over-expression is associated with poor patient prognosis, a single gene correlation ranking IL13Rα2 in the top ~1% of total gene expression probes with regard to survival association with WHO IV gliomas. This study better defines the functional consequences of IL13Rα2 expression by demonstrating association with mesenchymal signature gene expression and poor patient prognosis. It thus highlights the utility of IL13Rα2 as a therapeutic target, and helps define patient populations most likely to respond to immunotherapy in present and future clinical trials.

  10. Chemical effects in biological systems (CEBS) object model for toxicology data, SysTox-OM: design and application.

    PubMed

    Xirasagar, Sandhya; Gustafson, Scott F; Huang, Cheng-Cheng; Pan, Qinyan; Fostel, Jennifer; Boyer, Paul; Merrick, B Alex; Tomer, Kenneth B; Chan, Denny D; Yost, Kenneth J; Choi, Danielle; Xiao, Nianqing; Stasiewicz, Stanley; Bushel, Pierre; Waters, Michael D

    2006-04-01

    The CEBS data repository is being developed to promote a systems biology approach to understand the biological effects of environmental stressors. CEBS will house data from multiple gene expression platforms (transcriptomics), protein expression and protein-protein interaction (proteomics), and changes in low molecular weight metabolite levels (metabolomics) aligned by their detailed toxicological context. The system will accommodate extensive complex querying in a user-friendly manner. CEBS will store toxicological contexts including the study design details, treatment protocols, animal characteristics and conventional toxicological endpoints such as histopathology findings and clinical chemistry measures. All of these data types can be integrated in a seamless fashion to enable data query and analysis in a biologically meaningful manner. An object model, the SysBio-OM (Xirasagar et al., 2004) has been designed to facilitate the integration of microarray gene expression, proteomics and metabolomics data in the CEBS database system. We now report SysTox-OM as an open source systems toxicology model designed to integrate toxicological context into gene expression experiments. The SysTox-OM model is comprehensive and leverages other open source efforts, namely, the Standard for Exchange of Nonclinical Data (http://www.cdisc.org/models/send/v2/index.html) which is a data standard for capturing toxicological information for animal studies and Clinical Data Interchange Standards Consortium (http://www.cdisc.org/models/sdtm/index.html) that serves as a standard for the exchange of clinical data. Such standardization increases the accuracy of data mining, interpretation and exchange. The open source SysTox-OM model, which can be implemented on various software platforms, is presented here. A universal modeling language (UML) depiction of the entire SysTox-OM is available at http://cebs.niehs.nih.gov and the Rational Rose object model package is distributed under an open source license that permits unrestricted academic and commercial use and is available at http://cebs.niehs.nih.gov/cebsdownloads. Currently, the public toxicological data in CEBS can be queried via a web application based on the SysTox-OM at http://cebs.niehs.nih.gov xirasagars@saic.com Supplementary data are available at Bioinformatics online.

  11. Analysis of multiplex gene expression maps obtained by voxelation.

    PubMed

    An, Li; Xie, Hongbo; Chin, Mark H; Obradovic, Zoran; Smith, Desmond J; Megalooikonomou, Vasileios

    2009-04-29

    Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in cortex and corpus callosum. The experimental results confirm the hypothesis that genes with similar gene expression maps might have similar gene functions. The voxelation data takes into account the location information of gene expression level in mouse brain, which is novel in related research. The proposed approach can potentially be used to predict gene functions and provide helpful suggestions to biologists.

  12. Identification, mRNA expression and functional analysis of several yellow family genes in Tribolium castaneum

    USDA-ARS?s Scientific Manuscript database

    Querying the genome of the red flour beetle, Tribolium castaneum, with the Drosophila melanogaster Yellow-y (DmY-y) protein sequence identified fourteen Yellow homologs. One of these is an ortholog of DmY-y, which is required for cuticular pigmentation (melanization), and another is a ortholog of D...

  13. A Natural Language Interface Concordant with a Knowledge Base.

    PubMed

    Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

    2016-01-01

    The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.

  14. Pernicious plans revealed: Plasmodium falciparum genome wide expression analysis.

    PubMed

    Llinás, Manuel; DeRisi, Joseph L

    2004-08-01

    The asexual intraerythrocytic developmental cycle (IDC) of Plasmodium falciparum is responsible for the majority of the clinical manifestations of malaria in humans. Although malaria has been studied for over a century, the elucidation of the full genome sequence of P. falciparum has now allowed for in-depth studies of gene expression throughout the entire intraerythrocytic stage. As the mainstays of anti-malarial chemotherapy become increasingly ineffective, we need a deeper understanding of fundamental plasmodial bioregulatory mechanisms to successfully subvert them. Recent gene expression studies have begun to examine different aspects of the IDC and are providing key insights into the basic mechanisms of Plasmodium gene regulation and are helping to define gene functions. However, to date, no transcription factor has been fully characterized from Plasmodium and the definitive identification of cis-acting regulatory elements along with their corresponding trans-acting partners is still lacking. The characterization of the transcriptome of P. falciparum is the first major step towards the understanding of the genome wide regulation of gene expression in this parasite. IDC expression data for almost every gene in the P. falciparum genome can now be publicly queried at and. The results of these studies suggest promising leads for identifying novel targets for anti-malarial therapeutics and vaccines in addition to providing a solid foundation for the ongoing elucidation of plasmodial gene expression.

  15. Xenopus microRNA genes are predominantly located within introns and are differentially expressed in adult frog tissues via post-transcriptional regulation

    PubMed Central

    Tang, Guo-Qing; Maxwell, E. Stuart

    2008-01-01

    The amphibian Xenopus provides a model organism for investigating microRNA expression during vertebrate embryogenesis and development. Searching available Xenopus genome databases using known human pre-miRNAs as query sequences, more than 300 genes encoding 142 Xenopus tropicalis miRNAs were identified. Analysis of Xenopus tropicalis miRNA genes revealed a predominate positioning within introns of protein-coding and nonprotein-coding RNA Pol II-transcribed genes. MiRNA genes were also located in pre-mRNA exons and positioned intergenically between known protein-coding genes. Many miRNA species were found in multiple locations and in more than one genomic context. MiRNA genes were also clustered throughout the genome, indicating the potential for the cotranscription and coordinate expression of miRNAs located in a given cluster. Northern blot analysis confirmed the expression of many identified miRNAs in both X. tropicalis and X. laevis. Comparison of X. tropicalis and X. laevis blots revealed comparable expression profiles, although several miRNAs exhibited species-specific expression in different tissues. More detailed analysis revealed that for some miRNAs, the tissue-specific expression profile of the pri-miRNA precursor was distinctly different from that of the mature miRNA profile. Differential miRNA precursor processing in both the nucleus and cytoplasm was implicated in the observed tissue-specific differences. These observations indicated that post-transcriptional processing plays an important role in regulating miRNA expression in the amphibian Xenopus. PMID:18032731

  16. Targeted exploration and analysis of large cross-platform human transcriptomic compendia

    PubMed Central

    Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.

    2016-01-01

    We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801

  17. From Ambiguities to Insights: Query-based Comparisons of High-Dimensional Data

    NASA Astrophysics Data System (ADS)

    Kowalski, Jeanne; Talbot, Conover; Tsai, Hua L.; Prasad, Nijaguna; Umbricht, Christopher; Zeiger, Martha A.

    2007-11-01

    Genomic technologies will revolutionize drag discovery and development; that much is universally agreed upon. The high dimension of data from such technologies has challenged available data analytic methods; that much is apparent. To date, large-scale data repositories have not been utilized in ways that permit their wealth of information to be efficiently processed for knowledge, presumably due in large part to inadequate analytical tools to address numerous comparisons of high-dimensional data. In candidate gene discovery, expression comparisons are often made between two features (e.g., cancerous versus normal), such that the enumeration of outcomes is manageable. With multiple features, the setting becomes more complex, in terms of comparing expression levels of tens of thousands transcripts across hundreds of features. In this case, the number of outcomes, while enumerable, become rapidly large and unmanageable, and scientific inquiries become more abstract, such as "which one of these (compounds, stimuli, etc.) is not like the others?" We develop analytical tools that promote more extensive, efficient, and rigorous utilization of the public data resources generated by the massive support of genomic studies. Our work innovates by enabling access to such metadata with logically formulated scientific inquires that define, compare and integrate query-comparison pair relations for analysis. We demonstrate our computational tool's potential to address an outstanding biomedical informatics issue of identifying reliable molecular markers in thyroid cancer. Our proposed query-based comparison (QBC) facilitates access to and efficient utilization of metadata through logically formed inquires expressed as query-based comparisons by organizing and comparing results from biotechnologies to address applications in biomedicine.

  18. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-12

    Lexington Massachusetts This page intentionally left blank. iii EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation...independent of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions

  19. mirEX: a platform for comparative exploration of plant pri-miRNA expression data.

    PubMed

    Bielewicz, Dawid; Dolata, Jakub; Zielezinski, Andrzej; Alaba, Sylwia; Szarzynska, Bogna; Szczesniak, Michal W; Jarmolowski, Artur; Szweykowska-Kulinska, Zofia; Karlowski, Wojciech M

    2012-01-01

    mirEX is a comprehensive platform for comparative analysis of primary microRNA expression data. RT-qPCR-based gene expression profiles are stored in a universal and expandable database scheme and wrapped by an intuitive user-friendly interface. A new way of accessing gene expression data in mirEX includes a simple mouse operated querying system and dynamic graphs for data mining analyses. In contrast to other publicly available databases, the mirEX interface allows a simultaneous comparison of expression levels between various microRNA genes in diverse organs and developmental stages. Currently, mirEX integrates information about the expression profile of 190 Arabidopsis thaliana pri-miRNAs in seven different developmental stages: seeds, seedlings and various organs of mature plants. Additionally, by providing RNA structural models, publicly available deep sequencing results, experimental procedure details and careful selection of auxiliary data in the form of web links, mirEX can function as a one-stop solution for Arabidopsis microRNA information. A web-based mirEX interface can be accessed at http://bioinfo.amu.edu.pl/mirex.

  20. Epididymal genomics and the search for a male contraceptive.

    PubMed

    Turner, T T; Johnston, D S; Jelinsky, S A

    2006-05-16

    This report represents the joint efforts of three laboratories, one with a primary interest in understanding regulatory processes in the epididymal epithelium (TTT) and two with a primary interest in identifying and characterizing new contraceptive targets (DSJ and SAJ). We have developed a highly refined mouse epididymal transcriptome and have used it as a starting point for determining genes in the human epididymis, which may serve as targets for male contraceptives. Our database represents gene expression information for approximately 39,000 transcripts, of which over 17,000 are significantly expressed in at least one segment of the mouse epididymis. Over 2000 of these transcripts are up- or down-regulated by at least four-fold between at least two segments. In addition, human databases have been queried to determine expression of orthologs in the human epididymis and the specificity of their expression in the epididymis. Genes highly regulated in the human epididymis and showing high tissue specificity are potential targets for male contraceptives.

  1. Virus-Based RNA Silencing Agents and Virus-Derived Expression Vectors as Gene Therapy Vehicles.

    PubMed

    Venkataraman, Srividhya; Ahmad, Tauqeer; AbouHaidar, Mounir G; Hefferon, Kathleen L

    2017-01-01

    In consideration of recent developments in understanding the genomics and proteomics of viruses, the use of viral DNA / RNA sequences as well as their gene expression schemes, have found new in-roads towards the prognosis and therapy of diseases. Correspondingly, the sphere of the patenting scenario has expanded significantly. The current review addresses patented inventions concerning the use of virus sequences as gene silencing machineries and inventions concerning the generation and application of viral sequences as expression vectors. Furthermore, this review also discusses the employment of these patents for clinical, agricultural and biotechnological applications. Considering these objectives, the Delphion Research Intellectual Property Network database was searched using keywords such as "gene silencing", "engineered viruses" and "expression vectors" and descriptions of recent patents on the said topics were discussed. Despite several recent advances in the use of viruses as disease therapy vehicles and biotechnological vectors, these developments have yet to be proven effective in practice, in clinical and field trials. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  2. Rapid in silico cloning of genes using expressed sequence tags (ESTs).

    PubMed

    Gill, R W; Sanseau, P

    2000-01-01

    Expressed sequence tags (ESTs) are short single-pass DNA sequences obtained from either end of cDNA clones. These ESTs are derived from a vast number of cDNA libraries obtained from different species. Human ESTs are the bulk of the data and have been widely used to identify new members of gene families, as markers on the human chromosomes, to discover polymorphism sites and to compare expression patterns in different tissues or pathologies states. Information strategies have been devised to query EST databases. Since most of the analysis is performed with a computer, the term "in silico" strategy has been coined. In this chapter we will review the current status of EST databases, the pros and cons of EST-type data and describe possible strategies to retrieve meaningful information.

  3. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-01

    unlimited. This page intentionally left blank. iii EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation, making...of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions) embedded in

  4. CellAtlasSearch: a scalable search engine for single cells.

    PubMed

    Srivastava, Divyanshu; Iyer, Arvind; Kumar, Vibhor; Sengupta, Debarka

    2018-05-21

    Owing to the advent of high throughput single cell transcriptomics, past few years have seen exponential growth in production of gene expression data. Recently efforts have been made by various research groups to homogenize and store single cell expression from a large number of studies. The true value of this ever increasing data deluge can be unlocked by making it searchable. To this end, we propose CellAtlasSearch, a novel search architecture for high dimensional expression data, which is massively parallel as well as light-weight, thus infinitely scalable. In CellAtlasSearch, we use a Graphical Processing Unit (GPU) friendly version of Locality Sensitive Hashing (LSH) for unmatched speedup in data processing and query. Currently, CellAtlasSearch features over 300 000 reference expression profiles including both bulk and single-cell data. It enables the user query individual single cell transcriptomes and finds matching samples from the database along with necessary meta information. CellAtlasSearch aims to assist researchers and clinicians in characterizing unannotated single cells. It also facilitates noise free, low dimensional representation of single-cell expression profiles by projecting them on a wide variety of reference samples. The web-server is accessible at: http://www.cellatlassearch.com.

  5. Concept locator: a client-server application for retrieval of UMLS metathesaurus concepts through complex boolean query.

    PubMed

    Nadkarni, P M

    1997-08-01

    Concept Locator (CL) is a client-server application that accesses a Sybase relational database server containing a subset of the UMLS Metathesaurus for the purpose of retrieval of concepts corresponding to one or more query expressions supplied to it. CL's query grammar permits complex Boolean expressions, wildcard patterns, and parenthesized (nested) subexpressions. CL translates the query expressions supplied to it into one or more SQL statements that actually perform the retrieval. The generated SQL is optimized by the client to take advantage of the strengths of the server's query optimizer, and sidesteps its weaknesses, so that execution is reasonably efficient.

  6. MANTIS: a phylogenetic framework for multi-species genome comparisons.

    PubMed

    Tzika, Athanasia C; Helaers, Raphaël; Van de Peer, Yves; Milinkovitch, Michel C

    2008-01-15

    Practitioners of comparative genomics face huge analytical challenges as whole genome sequences and functional/expression data accumulate. Furthermore, the field would greatly benefit from a better integration of this wealth of data with evolutionary concepts. Here, we present MANTIS, a relational database for the analysis of (i) gains and losses of genes on specific branches of the metazoan phylogeny, (ii) reconstructed genome content of ancestral species and (iii) over- or under-representation of functions/processes and tissue specificity of gained, duplicated and lost genes. MANTIS estimates the most likely positions of gene losses on the true phylogeny using a maximum-likelihood function. A user-friendly interface and an extensive query system allow to investigate questions pertaining to gene identity, phylogenetic mapping and function/expression parameters. MANTIS is freely available at http://www.mantisdb.org and constitutes the missing link between multi-species genome comparisons and functional analyses.

  7. Directed module detection in a large-scale expression compendium.

    PubMed

    Fu, Qiang; Lemmens, Karen; Sanchez-Rodriguez, Aminael; Thijs, Inge M; Meysman, Pieter; Sun, Hong; Fierro, Ana Carolina; Engelen, Kristof; Marchal, Kathleen

    2012-01-01

    Public online microarray databases contain tremendous amounts of expression data. Mining these data sources can provide a wealth of information on the underlying transcriptional networks. In this chapter, we illustrate how the web services COLOMBOS and DISTILLER can be used to identify condition-dependent coexpression modules by exploring compendia of public expression data. COLOMBOS is designed for user-specified query-driven analysis, whereas DISTILLER generates a global regulatory network overview. The user is guided through both web services by means of a case study in which condition-dependent coexpression modules comprising a gene of interest (i.e., "directed") are identified.

  8. A biomarker-based screen of a gene expression compendium ...

    EPA Pesticide Factsheets

    Computational approaches were developed to identify factors that regulate Nrf2 in a large gene expression compendium of microarray profiles including >2000 comparisons which queried the effects of chemicals, genes, diets, and infectious agents on gene expression in the mouse liver. A gene expression biomarker of 48 genes which accurately predicted Nrf2 activation was used to identify factors which resulted in a gene expression profile with significant correlation to the biomarker. A number of novel insights were made. Chemicals that activated the xenosensor constitutive activated receptor (CAR) consistently activated Nrf2 across hundreds of profiles, possibly downstream of Cyp-induced increases in oxidative stress. Nrf2 activation was also found to be negatively regulated by the growth hormone (GH)- and androgen-regulated transcription factor STAT5b, a transcription factor suppressed by CAR. Nrf2 was activated when STAT5b was suppressed in female mice vs. male mice, after exposure to estrogens, or in genetic mutants in which GH signaling was disrupted. A subset of the mutants that show STAT5b suppression and Nrf2 activation result in increased resistance to environmental stressors and increased longevity. This study describes a novel approach for understanding the network of factors that regulate the Nrf2 pathway and highlights novel interactions between Nrf2, CAR and STAT5b transcription factors. (This abstract does not represent EPA policy.) Computational appr

  9. EMAGE mouse embryo spatial gene expression database: 2010 update

    PubMed Central

    Richardson, Lorna; Venkataraman, Shanmugasundaram; Stevenson, Peter; Yang, Yiya; Burton, Nicholas; Rao, Jianguo; Fisher, Malcolm; Baldock, Richard A.; Davidson, Duncan R.; Christiansen, Jeffrey H.

    2010-01-01

    EMAGE (http://www.emouseatlas.org/emage) is a freely available online database of in situ gene expression patterns in the developing mouse embryo. Gene expression domains from raw images are extracted and integrated spatially into a set of standard 3D virtual mouse embryos at different stages of development, which allows data interrogation by spatial methods. An anatomy ontology is also used to describe sites of expression, which allows data to be queried using text-based methods. Here, we describe recent enhancements to EMAGE including: the release of a completely re-designed website, which offers integration of many different search functions in HTML web pages, improved user feedback and the ability to find similar expression patterns at the click of a button; back-end refactoring from an object oriented to relational architecture, allowing associated SQL access; and the provision of further access by standard formatted URLs and a Java API. We have also increased data coverage by sourcing from a greater selection of journals and developed automated methods for spatial data annotation that are being applied to spatially incorporate the genome-wide (∼19 000 gene) ‘EURExpress’ dataset into EMAGE. PMID:19767607

  10. Generation of a foveomacular transcriptome

    PubMed Central

    Bernstein, Steven; Wong, Paul W.

    2014-01-01

    Purpose Organizing molecular biologic data is a growing challenge since the rate of data accumulation is steadily increasing. Information relevant to a particular biologic query can be difficult to extract from the comprehensive databases currently available. We present a data collection and organization model designed to ameliorate these problems and applied it to generate an expressed sequence tag (EST)–based foveomacular transcriptome. Methods Using Perl, MySQL, EST libraries, screening, and human foveomacular gene expression as a model system, we generated a foveomacular transcriptome database enriched for molecularly relevant data. Results Using foveomacula as a gene expression model tissue, we identified and organized 6,056 genes expressed in that tissue. Of those identified genes, 3,480 had not been previously described as expressed in the foveomacula. Internal experimental controls as well as comparison of our data set to published data sets suggest we do not yet have a complete description of the foveomacula transcriptome. Conclusions We present an organizational method designed to amplify the utility of data pertinent to a specific research interest. Our method is generic enough to be applicable to a variety of conditions yet focused enough to allow for specialized study. PMID:24991187

  11. Kangaroo – A pattern-matching program for biological sequences

    PubMed Central

    2002-01-01

    Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718

  12. Retrieving relevant time-course experiments: a study on Arabidopsis microarrays.

    PubMed

    Şener, Duygu Dede; Oğul, Hasan

    2016-06-01

    Understanding time-course regulation of genes in response to a stimulus is a major concern in current systems biology. The problem is usually approached by computational methods to model the gene behaviour or its networked interactions with the others by a set of latent parameters. The model parameters can be estimated through a meta-analysis of available data obtained from other relevant experiments. The key question here is how to find the relevant experiments which are potentially useful in analysing current data. In this study, the authors address this problem in the context of time-course gene expression experiments from an information retrieval perspective. To this end, they introduce a computational framework that takes a time-course experiment as a query and reports a list of relevant experiments retrieved from a given repository. These retrieved experiments can then be used to associate the environmental factors of query experiment with the findings previously reported. The model is tested using a set of time-course Arabidopsis microarrays. The experimental results show that relevant experiments can be successfully retrieved based on content similarity.

  13. Gene Expression Profiling of Evening Fatigue in Women Undergoing Chemotherapy for Breast Cancer

    PubMed Central

    Kober, Kord M.; Dunn, Laura; Mastick, Judy; Cooper, Bruce; Langford, Dale; Melisko, Michelle; Venook, Alan; Chen, Lee-May; Wright, Fay; Hammer, Marilyn J.; Schmidt, Brian L.; Levine, Jon; Miaskowski, Christine; Aouizerat, Bradley E.

    2017-01-01

    Moderate to severe fatigue occurs in 14% to 96% of oncology patients undergoing active treatment. Current interventions for fatigue are not efficacious. A major impediment to the development of effective treatments is a lack of understanding of the fundamental mechanisms underlying fatigue. In the current study, differences in phenotypic characteristics and gene expression profiles were evaluated in a sample of breast cancer patients undergoing chemotherapy (CTX) who reported low (n=19) and high (n=25) levels of evening fatigue. Compared to the low group, patients in the high evening fatigue group reported lower functional status scores, higher comorbidity scores, and fewer prior cancer treatments. One gene was identified as up-regulated and eleven genes were identified to be down-regulated in the high evening fatigue group. Gene set analysis found 24 down-regulated and 94 simultaneously up and down perturbed pathways between the two fatigue groups. Transcript Origin Analysis found that differential expression originated primarily from monocytes and dendritic cell types. Query of public data sources found 18 gene expression experiments with similar differential expression profiles. Our analyses revealed that inflammation, neurotransmitter regulation, and energy metabolism are likely mechanisms associated with evening fatigue severity; that CTX may contribute to fatigue seen in oncology patients; and that the patterns of gene expression may be shared with other models of fatigue (e.g., physical exercise, pathogen-induced sickness behavior). These results suggest that the mechanisms that underlie fatigue in oncology patients are multi-factorial. PMID:26957308

  14. VTCdb: a gene co-expression database for the crop species Vitis vinifera (grapevine).

    PubMed

    Wong, Darren C J; Sweetman, Crystal; Drew, Damian P; Ford, Christopher M

    2013-12-16

    Gene expression datasets in model plants such as Arabidopsis have contributed to our understanding of gene function and how a single underlying biological process can be governed by a diverse network of genes. The accumulation of publicly available microarray data encompassing a wide range of biological and environmental conditions has enabled the development of additional capabilities including gene co-expression analysis (GCA). GCA is based on the understanding that genes encoding proteins involved in similar and/or related biological processes may exhibit comparable expression patterns over a range of experimental conditions, developmental stages and tissues. We present an open access database for the investigation of gene co-expression networks within the cultivated grapevine, Vitis vinifera. The new gene co-expression database, VTCdb (http://vtcdb.adelaide.edu.au/Home.aspx), offers an online platform for transcriptional regulatory inference in the cultivated grapevine. Using condition-independent and condition-dependent approaches, grapevine co-expression networks were constructed using the latest publicly available microarray datasets from diverse experimental series, utilising the Affymetrix Vitis vinifera GeneChip (16 K) and the NimbleGen Grape Whole-genome microarray chip (29 K), thus making it possible to profile approximately 29,000 genes (95% of the predicted grapevine transcriptome). Applications available with the online platform include the use of gene names, probesets, modules or biological processes to query the co-expression networks, with the option to choose between Affymetrix or Nimblegen datasets and between multiple co-expression measures. Alternatively, the user can browse existing network modules using interactive network visualisation and analysis via CytoscapeWeb. To demonstrate the utility of the database, we present examples from three fundamental biological processes (berry development, photosynthesis and flavonoid biosynthesis) whereby the recovered sub-networks reconfirm established plant gene functions and also identify novel associations. Together, we present valuable insights into grapevine transcriptional regulation by developing network models applicable to researchers in their prioritisation of gene candidates, for on-going study of biological processes related to grapevine development, metabolism and stress responses.

  15. Seasonal Changes in Bacterial and Archaeal Gene Expression Patterns across Salinity Gradients in the Columbia River Coastal Margin

    PubMed Central

    Smith, Maria W.; Herfort, Lydie; Tyrol, Kaitlin; Suciu, Dominic; Campbell, Victoria; Crump, Byron C.; Peterson, Tawnya D.; Zuber, Peter; Baptista, Antonio M.; Simon, Holly M.

    2010-01-01

    Through their metabolic activities, microbial populations mediate the impact of high gradient regions on ecological function and productivity of the highly dynamic Columbia River coastal margin (CRCM). A 2226-probe oligonucleotide DNA microarray was developed to investigate expression patterns for microbial genes involved in nitrogen and carbon metabolism in the CRCM. Initial experiments with the environmental microarrays were directed toward validation of the platform and yielded high reproducibility in multiple tests. Bioinformatic and experimental validation also indicated that >85% of the microarray probes were specific for their corresponding target genes and for a few homologs within the same microbial family. The validated probe set was used to query gene expression responses by microbial assemblages to environmental variability. Sixty-four samples from the river, estuary, plume, and adjacent ocean were collected in different seasons and analyzed to correlate the measured variability in chemical, physical and biological water parameters to differences in global gene expression profiles. The method produced robust seasonal profiles corresponding to pre-freshet spring (April) and late summer (August). Overall relative gene expression was high in both seasons and was consistent with high microbial abundance measured by total RNA, heterotrophic bacterial production, and chlorophyll a. Both seasonal patterns involved large numbers of genes that were highly expressed relative to background, yet each produced very different gene expression profiles. April patterns revealed high differential gene expression in the coastal margin samples (estuary, plume and adjacent ocean) relative to freshwater, while little differential gene expression was observed along the river-to-ocean transition in August. Microbial gene expression profiles appeared to relate, in part, to seasonal differences in nutrient availability and potential resource competition. Furthermore, our results suggest that highly-active particle-attached microbiota in the Columbia River water column may perform dissimilatory nitrate reduction (both dentrification and DNRA) within anoxic particle microniches. PMID:20967204

  16. yStreX: yeast stress expression database

    PubMed Central

    Wanichthanarak, Kwanjeera; Nookaew, Intawat; Petranovic, Dina

    2014-01-01

    Over the past decade genome-wide expression analyses have been often used to study how expression of genes changes in response to various environmental stresses. Many of these studies (such as effects of oxygen concentration, temperature stress, low pH stress, osmotic stress, depletion or limitation of nutrients, addition of different chemical compounds, etc.) have been conducted in the unicellular Eukaryal model, yeast Saccharomyces cerevisiae. However, the lack of a unifying or integrated, bioinformatics platform that would permit efficient and rapid use of all these existing data remain an important issue. To facilitate research by exploiting existing transcription data in the field of yeast physiology, we have developed the yStreX database. It is an online repository of analyzed gene expression data from curated data sets from different studies that capture genome-wide transcriptional changes in response to diverse environmental transitions. The first aim of this online database is to facilitate comparison of cross-platform and cross-laboratory gene expression data. Additionally, we performed different expression analyses, meta-analyses and gene set enrichment analyses; and the results are also deposited in this database. Lastly, we constructed a user-friendly Web interface with interactive visualization to provide intuitive access and to display the queried data for users with no background in bioinformatics. Database URL: http://www.ystrexdb.com PMID:25024351

  17. Validation of reference genes for quantitative RT-PCR studies of gene expression in perennial ryegrass (Lolium perenne L.)

    PubMed Central

    2010-01-01

    Background Perennial ryegrass (Lolium perenne L.) is an important pasture and turf crop. Biotechniques such as gene expression studies are being employed to improve traits in this temperate grass. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) is among the best methods available for determining changes in gene expression. Before analysis of target gene expression, it is essential to select an appropriate normalisation strategy to control for non-specific variation between samples. Reference genes that have stable expression at different biological and physiological states can be effectively used for normalisation; however, their expression stability must be validated before use. Results Existing Serial Analysis of Gene Expression data were queried to identify six moderately expressed genes that had relatively stable gene expression throughout the year. These six candidate reference genes (eukaryotic elongation factor 1 alpha, eEF1A; TAT-binding protein homolog 1, TBP-1; eukaryotic translation initiation factor 4 alpha, eIF4A; YT521-B-like protein family protein, YT521-B; histone 3, H3; ubiquitin-conjugating enzyme, E2) were validated for qRT-PCR normalisation in 442 diverse perennial ryegrass (Lolium perenne L.) samples sourced from field- and laboratory-grown plants under a wide range of experimental conditions. Eukaryotic EF1A is encoded by members of a multigene family exhibiting differential expression and necessitated the expression analysis of different eEF1A encoding genes; a highly expressed eEF1A (h), a moderately, but stably expressed eEF1A (s), and combined expression of multigene eEF1A (m). NormFinder identified eEF1A (s) and YT521-B as the best combination of two genes for normalisation of gene expression data in perennial ryegrass following different defoliation management in the field. Conclusions This study is unique in the magnitude of samples tested with the inclusion of numerous field-grown samples, helping pave the way to conduct gene expression studies in perennial biomass crops under field-conditions. From our study several stably expressed reference genes have been validated. This provides useful candidates for reference gene selection in perennial ryegrass under conditions other than those tested here. PMID:20089196

  18. High-performance web services for querying gene and variant annotation.

    PubMed

    Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei

    2016-05-06

    Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times permonth. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info . Both are offered free of charge to the research community.

  19. Strategies to explore functional genomics data sets in NCBI's GEO database.

    PubMed

    Wilhite, Stephen E; Barrett, Tanya

    2012-01-01

    The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.

  20. Strategies to Explore Functional Genomics Data Sets in NCBI’s GEO Database

    PubMed Central

    Wilhite, Stephen E.; Barrett, Tanya

    2012-01-01

    The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries. PMID:22130872

  1. Male reproductive development: gene expression profiling of maize anther and pollen ontogeny

    PubMed Central

    Ma, Jiong; Skibbe, David S; Fernandes, John; Walbot, Virginia

    2008-01-01

    Background During flowering, central anther cells switch from mitosis to meiosis, ultimately forming pollen containing haploid sperm. Four rings of surrounding somatic cells differentiate to support first meiosis and later pollen dispersal. Synchronous development of many anthers per tassel and within each anther facilitates dissection of carefully staged maize anthers for transcriptome profiling. Results Global gene expression profiles of 7 stages representing 29 days of anther development are analyzed using a 44 K oligonucleotide array querying approximately 80% of maize protein-coding genes. Mature haploid pollen containing just two cell types expresses 10,000 transcripts. Anthers contain 5 major cell types and express >24,000 transcript types: each anther stage expresses approximately 10,000 constitutive and approximately 10,000 or more transcripts restricted to one or a few stages. The lowest complexity is present during meiosis. Large suites of stage-specific and co-expressed genes are identified through Gene Ontology and clustering analyses as functional classes for pre-meiotic, meiotic, and post-meiotic anther development. MADS box and zinc finger transcription factors with constitutive and stage-limited expression are identified. Conclusions We propose that the extensive gene expression of anther cells and pollen represents the key test of maize genome fitness, permitting strong selection against deleterious alleles in diploid anthers and haploid pollen. Because flowering plants show a substantial bias for male-sterile compared to female-sterile mutations, we propose that this fitness test is general. Because both somatic and germinal cells are transcriptionally quiescent during meiosis, we hypothesize that successful completion of meiosis is required to trigger maturation of anther somatic cells. PMID:19099579

  2. Use of Partial Least Squares improves the efficacy of removing unwanted variability in differential expression analyses based on RNA-Seq data.

    PubMed

    Chakraborty, Sutirtha

    2018-05-26

    RNA-Seq technology has revolutionized the face of gene expression profiling by generating read count data measuring the transcript abundances for each queried gene on multiple experimental subjects. But on the downside, the underlying technical artefacts and hidden biological profiles of the samples generate a wide variety of latent effects that may potentially distort the actual transcript/gene expression signals. Standard normalization techniques fail to correct for these hidden variables and lead to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (built as an R package 'SVAPLSseq') to correct for the traces of extraneous variability in RNA-Seq data. A novel and thorough comparative analysis of the PLS based method is presented along with some of the other popularly used approaches for latent variable correction in RNA-Seq. Overall, the method is found to achieve a substantially improved estimation of the hidden effect signatures in the RNA-Seq transcriptome expression landscape compared to other available techniques. Copyright © 2017. Published by Elsevier Inc.

  3. Ontological Approach to Military Knowledge Modeling and Management

    DTIC Science & Technology

    2004-03-01

    federated search mechanism has to reformulate user queries (expressed using the ontology) in the query languages of the different sources (e.g. SQL...ontologies as a common terminology – Unified query to perform federated search • Query processing – Ontology mapping to sources reformulate queries

  4. Detection of the inferred interaction network in hepatocellular carcinoma from EHCO (Encyclopedia of Hepatocellular Carcinoma genes Online)

    PubMed Central

    Hsu, Chun-Nan; Lai, Jin-Mei; Liu, Chia-Hung; Tseng, Huei-Hun; Lin, Chih-Yun; Lin, Kuan-Ting; Yeh, Hsu-Hua; Sung, Ting-Yi; Hsu, Wen-Lian; Su, Li-Jen; Lee, Sheng-An; Chen, Chang-Han; Lee, Gen-Cher; Lee, DT; Shiue, Yow-Ling; Yeh, Chang-Wei; Chang, Chao-Hui; Kao, Cheng-Yan; Huang, Chi-Ying F

    2007-01-01

    Background The significant advances in microarray and proteomics analyses have resulted in an exponential increase in potential new targets and have promised to shed light on the identification of disease markers and cellular pathways. We aim to collect and decipher the HCC-related genes at the systems level. Results Here, we build an integrative platform, the Encyclopedia of Hepatocellular Carcinoma genes Online, dubbed EHCO , to systematically collect, organize and compare the pileup of unsorted HCC-related studies by using natural language processing and softbots. Among the eight gene set collections, ranging across PubMed, SAGE, microarray, and proteomics data, there are 2,906 genes in total; however, more than 77% genes are only included once, suggesting that tremendous efforts need to be exerted to characterize the relationship between HCC and these genes. Of these HCC inventories, protein binding represents the largest proportion (~25%) from Gene Ontology analysis. In fact, many differentially expressed gene sets in EHCO could form interaction networks (e.g. HBV-associated HCC network) by using available human protein-protein interaction datasets. To further highlight the potential new targets in the inferred network from EHCO, we combine comparative genomics and interactomics approaches to analyze 120 evolutionary conserved and overexpressed genes in HCC. 47 out of 120 queries can form a highly interactive network with 18 queries serving as hubs. Conclusion This architectural map may represent the first step toward the attempt to decipher the hepatocarcinogenesis at the systems level. Targeting hubs and/or disruption of the network formation might reveal novel strategy for HCC treatment. PMID:17326819

  5. Chitinase-like (CTL) and cellulose synthase (CESA) gene expression in gelatinous-type cellulosic walls of flax (Linum usitatissimum L.) bast fibers.

    PubMed

    Mokshina, Natalia; Gorshkova, Tatyana; Deyholos, Michael K

    2014-01-01

    Plant chitinases (EC 3.2.1.14) and chitinase-like (CTL) proteins have diverse functions including cell wall biosynthesis and disease resistance. We analyzed the expression of 34 chitinase and chitinase-like genes of flax (collectively referred to as LusCTLs), belonging to glycoside hydrolase family 19 (GH19). Analysis of the transcript expression patterns of LusCTLs in the stem and other tissues identified three transcripts (LusCTL19, LusCTL20, LusCTL21) that were highly enriched in developing bast fibers, which form cellulose-rich gelatinous-type cell walls. The same three genes had low relative expression in tissues with primary cell walls and in xylem, which forms a xylan type of secondary cell wall. Phylogenetic analysis of the LusCTLs identified a flax-specific sub-group that was not represented in any of other genomes queried. To provide further context for the gene expression analysis, we also conducted phylogenetic and expression analysis of the cellulose synthase (CESA) family genes of flax, and found that expression of secondary wall-type LusCESAs (LusCESA4, LusCESA7 and LusCESA8) was correlated with the expression of two LusCTLs (LusCTL1, LusCTL2) that were the most highly enriched in xylem. The expression of LusCTL19, LusCTL20, and LusCTL21 was not correlated with that of any CESA subgroup. These results defined a distinct type of CTLs that may have novel functions specific to the development of the gelatinous (G-type) cellulosic walls.

  6. Dynamic Visualization of Co-expression in Systems Genetics Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    New, Joshua Ryan; Huang, Jian; Chesler, Elissa J

    2008-01-01

    Biologists hope to address grand scientific challenges by exploring the abundance of data made available through modern microarray technology and other high-throughput techniques. The impact of this data, however, is limited unless researchers can effectively assimilate such complex information and integrate it into their daily research; interactive visualization tools are called for to support the effort. Specifically, typical studies of gene co-expression require novel visualization tools that enable the dynamic formulation and fine-tuning of hypotheses to aid the process of evaluating sensitivity of key parameters. These tools should allow biologists to develop an intuitive understanding of the structure of biologicalmore » networks and discover genes which reside in critical positions in networks and pathways. By using a graph as a universal data representation of correlation in gene expression data, our novel visualization tool employs several techniques that when used in an integrated manner provide innovative analytical capabilities. Our tool for interacting with gene co-expression data integrates techniques such as: graph layout, qualitative subgraph extraction through a novel 2D user interface, quantitative subgraph extraction using graph-theoretic algorithms or by querying an optimized b-tree, dynamic level-of-detail graph abstraction, and template-based fuzzy classification using neural networks. We demonstrate our system using a real-world workflow from a large-scale, systems genetics study of mammalian gene co-expression.« less

  7. Distinct profiles of expressed sequence tags during intestinal regeneration in the sea cucumber Holothuria glaberrima

    PubMed Central

    Rojas-Cartagena, Carmencita; Ortíz-Pineda, Pablo; Ramírez-Gómez, Francisco; Suárez-Castillo, Edna C.; Matos-Cruz, Vanessa; Rodríguez, Carlos; Ortíz-Zuazaga, Humberto; García-Arrarás, José E.

    2010-01-01

    Repair and regeneration are key processes for tissue maintenance, and their disruption may lead to disease states. Little is known about the molecular mechanisms that underline the repair and regeneration of the digestive tract. The sea cucumber Holothuria glaberrima represents an excellent model to dissect and characterize the molecular events during intestinal regeneration. To study the gene expression profile, cDNA libraries were constructed from normal, 3-day, and 7-day regenerating intestines of H. glaberrima. Clones were randomly sequenced and queried against the nonredundant protein database at the National Center for Biotechnology Information. RT-PCR analyses were made of several genes to determine their expression profile during intestinal regeneration. A total of 5,173 sequences from three cDNA libraries were obtained. About 46.2, 35.6, and 26.2% of the sequences for the normal, 3-days, and 7-days cDNA libraries, respectively, shared significant similarity with known sequences in the protein database of GenBank but only present 10% of similarity among them. Analysis of the libraries in terms of functional processes, protein domains, and most common sequences suggests that a differential expression profile is taking place during the regeneration process. Further examination of the expressed sequence tag dataset revealed that 12 putative genes are differentially expressed at significant level (R > 6). Experimental validation by RT-PCR analysis reveals that at least three genes (unknown C-4677-1, melanotransferrin, and centaurin) present a differential expression during regeneration. These findings strongly suggest that the gene expression profile varies among regeneration stages and provide evidence for the existence of differential gene expression. PMID:17579180

  8. NCBI GEO: mining tens of millions of expression profiles--database and tools update.

    PubMed

    Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Edgar, Ron

    2007-01-01

    The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely disseminates microarray and other forms of high-throughput data generated by the scientific community. The database has a minimum information about a microarray experiment (MIAME)-compliant infrastructure that captures fully annotated raw and processed data. Several data deposit options and formats are supported, including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). In addition to data storage, a collection of user-friendly web-based interfaces and applications are available to help users effectively explore, visualize and download the thousands of experiments and tens of millions of gene expression patterns stored in GEO. This paper provides a summary of the GEO database structure and user facilities, and describes recent enhancements to database design, performance, submission format options, data query and retrieval utilities. GEO is accessible at http://www.ncbi.nlm.nih.gov/geo/

  9. PathMAPA: a tool for displaying gene expression and performing statistical tests on metabolic pathways at multiple levels for Arabidopsis.

    PubMed

    Pan, Deyun; Sun, Ning; Cheung, Kei-Hoi; Guan, Zhong; Ma, Ligeng; Holford, Matthew; Deng, Xingwang; Zhao, Hongyu

    2003-11-07

    To date, many genomic and pathway-related tools and databases have been developed to analyze microarray data. In published web-based applications to date, however, complex pathways have been displayed with static image files that may not be up-to-date or are time-consuming to rebuild. In addition, gene expression analyses focus on individual probes and genes with little or no consideration of pathways. These approaches reveal little information about pathways that are key to a full understanding of the building blocks of biological systems. Therefore, there is a need to provide useful tools that can generate pathways without manually building images and allow gene expression data to be integrated and analyzed at pathway levels for such experimental organisms as Arabidopsis. We have developed PathMAPA, a web-based application written in Java that can be easily accessed over the Internet. An Oracle database is used to store, query, and manipulate the large amounts of data that are involved. PathMAPA allows its users to (i) upload and populate microarray data into a database; (ii) integrate gene expression with enzymes of the pathways; (iii) generate pathway diagrams without building image files manually; (iv) visualize gene expressions for each pathway at enzyme, locus, and probe levels; and (v) perform statistical tests at pathway, enzyme and gene levels. PathMAPA can be used to examine Arabidopsis thaliana gene expression patterns associated with metabolic pathways. PathMAPA provides two unique features for the gene expression analysis of Arabidopsis thaliana: (i) automatic generation of pathways associated with gene expression and (ii) statistical tests at pathway level. The first feature allows for the periodical updating of genomic data for pathways, while the second feature can provide insight into how treatments affect relevant pathways for the selected experiment(s).

  10. PathMAPA: a tool for displaying gene expression and performing statistical tests on metabolic pathways at multiple levels for Arabidopsis

    PubMed Central

    Pan, Deyun; Sun, Ning; Cheung, Kei-Hoi; Guan, Zhong; Ma, Ligeng; Holford, Matthew; Deng, Xingwang; Zhao, Hongyu

    2003-01-01

    Background To date, many genomic and pathway-related tools and databases have been developed to analyze microarray data. In published web-based applications to date, however, complex pathways have been displayed with static image files that may not be up-to-date or are time-consuming to rebuild. In addition, gene expression analyses focus on individual probes and genes with little or no consideration of pathways. These approaches reveal little information about pathways that are key to a full understanding of the building blocks of biological systems. Therefore, there is a need to provide useful tools that can generate pathways without manually building images and allow gene expression data to be integrated and analyzed at pathway levels for such experimental organisms as Arabidopsis. Results We have developed PathMAPA, a web-based application written in Java that can be easily accessed over the Internet. An Oracle database is used to store, query, and manipulate the large amounts of data that are involved. PathMAPA allows its users to (i) upload and populate microarray data into a database; (ii) integrate gene expression with enzymes of the pathways; (iii) generate pathway diagrams without building image files manually; (iv) visualize gene expressions for each pathway at enzyme, locus, and probe levels; and (v) perform statistical tests at pathway, enzyme and gene levels. PathMAPA can be used to examine Arabidopsis thaliana gene expression patterns associated with metabolic pathways. Conclusion PathMAPA provides two unique features for the gene expression analysis of Arabidopsis thaliana: (i) automatic generation of pathways associated with gene expression and (ii) statistical tests at pathway level. The first feature allows for the periodical updating of genomic data for pathways, while the second feature can provide insight into how treatments affect relevant pathways for the selected experiment(s). PMID:14604444

  11. Phytoremediation of chromium using Salix species: cloning ESTs and candidate genes involved in the Cr response.

    PubMed

    Quaggiotti, Silvia; Barcaccia, Gianni; Schiavon, Michela; Nicolé, Silvia; Galla, Giulio; Rossignolo, Virginia; Soattin, Marica; Malagoli, Mario

    2007-11-01

    In this research a differential display based on the detection of cDNA-AFLP markers was used to identify candidate genes potentially involved in the regulation of the response to chromium in four different willow species (Salix alba, Salix eleagnos, Salix fragilis and Salix matsudana) chosen on the basis of their suitability in phytoremediation techniques. Our approach enabled the assay of a large set of mRNA-related fragments and increased the reliability of amplification-based transcriptome analysis. The vast majority of transcript-derived fragments were shared among samples within species and thus attributable to constitutively expressed genes. However, a number of differentially expressed mRNAs were scored in each species and a total of 68 transcripts displaying an altered expression in response to Cr were isolated and sequenced. Public database querying revealed that 44.1% and 4.4% of the cloned ESTs score significant similarity with genes encoding proteins having known or putative function, or with genes coding for unknown proteins, respectively, whereas the remaining 51.5% did not retrieve any homology. Semi-quantitative RT-PCR analysis of seven candidate genes fully confirmed the expression patterns obtained by cDNA-AFLP. Our results indicate the existence of common mechanisms of gene regulation in response to Cr, pathogen attack and senescence-mediated programmed cell death, and suggest a role for the genes isolated in the cross-talk of the signaling pathways governing the adaptation to biotic and abiotic stresses.

  12. Microarray data and gene expression statistics for Saccharomyces cerevisiae exposed to simulated asbestos mine drainage.

    PubMed

    Driscoll, Heather E; Murray, Janet M; English, Erika L; Hunter, Timothy C; Pivarski, Kara; Dolci, Elizabeth D

    2017-08-01

    Here we describe microarray expression data (raw and normalized), experimental metadata, and gene-level data with expression statistics from Saccharomyces cerevisiae exposed to simulated asbestos mine drainage from the Vermont Asbestos Group (VAG) Mine on Belvidere Mountain in northern Vermont, USA. For nearly 100 years (between the late 1890s and 1993), chrysotile asbestos fibers were extracted from serpentinized ultramafic rock at the VAG Mine for use in construction and manufacturing industries. Studies have shown that water courses and streambeds nearby have become contaminated with asbestos mine tailings runoff, including elevated levels of magnesium, nickel, chromium, and arsenic, elevated pH, and chrysotile asbestos-laden mine tailings, due to leaching and gradual erosion of massive piles of mine waste covering approximately 9 km 2 . We exposed yeast to simulated VAG Mine tailings leachate to help gain insight on how eukaryotic cells exposed to VAG Mine drainage may respond in the mine environment. Affymetrix GeneChip® Yeast Genome 2.0 Arrays were utilized to assess gene expression after 24-h exposure to simulated VAG Mine tailings runoff. The chemistry of mine-tailings leachate, mine-tailings leachate plus yeast extract peptone dextrose media, and control yeast extract peptone dextrose media is also reported. To our knowledge this is the first dataset to assess global gene expression patterns in a eukaryotic model system simulating asbestos mine tailings runoff exposure. Raw and normalized gene expression data are accessible through the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) Database Series GSE89875 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89875).

  13. Finding gene regulatory network candidates using the gene expression knowledge base.

    PubMed

    Venkatesan, Aravind; Tripathi, Sushil; Sanz de Galdeano, Alejandro; Blondé, Ward; Lægreid, Astrid; Mironov, Vladimir; Kuiper, Martin

    2014-12-10

    Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.

  14. PhosphoregDB: The tissue and sub-cellular distribution of mammalian protein kinases and phosphatases

    PubMed Central

    Forrest, Alistair RR; Taylor, Darrin F; Fink, J Lynn; Gongora, M Milena; Flegg, Cameron; Teasdale, Rohan D; Suzuki, Harukazu; Kanamori, Mutsumi; Kai, Chikatoshi; Hayashizaki, Yoshihide; Grimmond, Sean M

    2006-01-01

    Background Protein kinases and protein phosphatases are the fundamental components of phosphorylation dependent protein regulatory systems. We have created a database for the protein kinase-like and phosphatase-like loci of mouse that integrates protein sequence, interaction, classification and pathway information with the results of a systematic screen of their sub-cellular localization and tissue specific expression data mined from the GNF tissue atlas of mouse. Results The database lets users query where a specific kinase or phosphatase is expressed at both the tissue and sub-cellular levels. Similarly the interface allows the user to query by tissue, pathway or sub-cellular localization, to reveal which components are co-expressed or co-localized. A review of their expression reveals 30% of these components are detected in all tissues tested while 70% show some level of tissue restriction. Hierarchical clustering of the expression data reveals that expression of these genes can be used to separate the samples into tissues of related lineage, including 3 larger clusters of nervous tissue, developing embryo and cells of the immune system. By overlaying the expression, sub-cellular localization and classification data we examine correlations between class, specificity and tissue restriction and show that tyrosine kinases are more generally expressed in fewer tissues than serine/threonine kinases. Conclusion Together these data demonstrate that cell type specific systems exist to regulate protein phosphorylation and that for accurate modelling and for determination of enzyme substrate relationships the co-location of components needs to be considered. PMID:16504016

  15. GenoLink: a graph-based querying and browsing system for investigating the function of genes and proteins.

    PubMed

    Durand, Patrick; Labarre, Laurent; Meil, Alain; Divo, Jean-Louis; Vandenbrouck, Yves; Viari, Alain; Wojcik, Jérôme

    2006-01-17

    A large variety of biological data can be represented by graphs. These graphs can be constructed from heterogeneous data coming from genomic and post-genomic technologies, but there is still need for tools aiming at exploring and analysing such graphs. This paper describes GenoLink, a software platform for the graphical querying and exploration of graphs. GenoLink provides a generic framework for representing and querying data graphs. This framework provides a graph data structure, a graph query engine, allowing to retrieve sub-graphs from the entire data graph, and several graphical interfaces to express such queries and to further explore their results. A query consists in a graph pattern with constraints attached to the vertices and edges. A query result is the set of all sub-graphs of the entire data graph that are isomorphic to the pattern and satisfy the constraints. The graph data structure does not rely upon any particular data model but can dynamically accommodate for any user-supplied data model. However, for genomic and post-genomic applications, we provide a default data model and several parsers for the most popular data sources. GenoLink does not require any programming skill since all operations on graphs and the analysis of the results can be carried out graphically through several dedicated graphical interfaces. GenoLink is a generic and interactive tool allowing biologists to graphically explore various sources of information. GenoLink is distributed either as a standalone application or as a component of the Genostar/Iogma platform. Both distributions are free for academic research and teaching purposes and can be requested at academy@genostar.com. A commercial licence form can be obtained for profit company at info@genostar.com. See also http://www.genostar.org.

  16. GenoLink: a graph-based querying and browsing system for investigating the function of genes and proteins

    PubMed Central

    Durand, Patrick; Labarre, Laurent; Meil, Alain; Divo1, Jean-Louis; Vandenbrouck, Yves; Viari, Alain; Wojcik, Jérôme

    2006-01-01

    Background A large variety of biological data can be represented by graphs. These graphs can be constructed from heterogeneous data coming from genomic and post-genomic technologies, but there is still need for tools aiming at exploring and analysing such graphs. This paper describes GenoLink, a software platform for the graphical querying and exploration of graphs. Results GenoLink provides a generic framework for representing and querying data graphs. This framework provides a graph data structure, a graph query engine, allowing to retrieve sub-graphs from the entire data graph, and several graphical interfaces to express such queries and to further explore their results. A query consists in a graph pattern with constraints attached to the vertices and edges. A query result is the set of all sub-graphs of the entire data graph that are isomorphic to the pattern and satisfy the constraints. The graph data structure does not rely upon any particular data model but can dynamically accommodate for any user-supplied data model. However, for genomic and post-genomic applications, we provide a default data model and several parsers for the most popular data sources. GenoLink does not require any programming skill since all operations on graphs and the analysis of the results can be carried out graphically through several dedicated graphical interfaces. Conclusion GenoLink is a generic and interactive tool allowing biologists to graphically explore various sources of information. GenoLink is distributed either as a standalone application or as a component of the Genostar/Iogma platform. Both distributions are free for academic research and teaching purposes and can be requested at academy@genostar.com. A commercial licence form can be obtained for profit company at info@genostar.com. See also . PMID:16417636

  17. Construction of a rice glycoside hydrolase phylogenomic database and identification of targets for biofuel research

    PubMed Central

    Sharma, Rita; Cao, Peijian; Jung, Ki-Hong; Sharma, Manoj K.; Ronald, Pamela C.

    2013-01-01

    Glycoside hydrolases (GH) catalyze the hydrolysis of glycosidic bonds in cell wall polymers and can have major effects on cell wall architecture. Taking advantage of the massive datasets available in public databases, we have constructed a rice phylogenomic database of GHs (http://ricephylogenomics.ucdavis.edu/cellwalls/gh/). This database integrates multiple data types including the structural features, orthologous relationships, mutant availability, and gene expression patterns for each GH family in a phylogenomic context. The rice genome encodes 437 GH genes classified into 34 families. Based on pairwise comparison with eight dicot and four monocot genomes, we identified 138 GH genes that are highly diverged between monocots and dicots, 57 of which have diverged further in rice as compared with four monocot genomes scanned in this study. Chromosomal localization and expression analysis suggest a role for both whole-genome and localized gene duplications in expansion and diversification of GH families in rice. We examined the meta-profiles of expression patterns of GH genes in twenty different anatomical tissues of rice. Transcripts of 51 genes exhibit tissue or developmental stage-preferential expression, whereas, seventeen other genes preferentially accumulate in actively growing tissues. When queried in RiceNet, a probabilistic functional gene network that facilitates functional gene predictions, nine out of seventeen genes form a regulatory network with the well-characterized genes involved in biosynthesis of cell wall polymers including cellulose synthase and cellulose synthase-like genes of rice. Two-thirds of the GH genes in rice are up regulated in response to biotic and abiotic stress treatments indicating a role in stress adaptation. Our analyses identify potential GH targets for cell wall modification. PMID:23986771

  18. SpidermiR: An R/Bioconductor Package for Integrative Analysis with miRNA Data.

    PubMed

    Cava, Claudia; Colaprico, Antonio; Bertoli, Gloria; Graudenzi, Alex; Silva, Tiago C; Olsen, Catharina; Noushmehr, Houtan; Bontempi, Gianluca; Mauri, Giancarlo; Castiglioni, Isabella

    2017-01-27

    Gene Regulatory Networks (GRNs) control many biological systems, but how such network coordination is shaped is still unknown. GRNs can be subdivided into basic connections that describe how the network members interact e.g., co-expression, physical interaction, co-localization, genetic influence, pathways, and shared protein domains. The important regulatory mechanisms of these networks involve miRNAs. We developed an R/Bioconductor package, namely SpidermiR, which offers an easy access to both GRNs and miRNAs to the end user, and integrates this information with differentially expressed genes obtained from The Cancer Genome Atlas. Specifically, SpidermiR allows the users to: (i) query and download GRNs and miRNAs from validated and predicted repositories; (ii) integrate miRNAs with GRNs in order to obtain miRNA-gene-gene and miRNA-protein-protein interactions, and to analyze miRNA GRNs in order to identify miRNA-gene communities; and (iii) graphically visualize the results of the analyses. These analyses can be performed through a single interface and without the need for any downloads. The full data sets are then rapidly integrated and processed locally.

  19. Low-Dose, Long-Wave UV Light Does Not Affect Gene Expression of Human Mesenchymal Stem Cells

    PubMed Central

    Wong, Darice Y.; Ranganath, Thanmayi; Kasko, Andrea M.

    2015-01-01

    Light is a non-invasive tool that is widely used in a range of biomedical applications. Techniques such as photopolymerization, photodegradation, and photouncaging can be used to alter the chemical and physical properties of biomaterials in the presence of live cells. Long-wave UV light (315 nm–400 nm) is an easily accessible and commonly used energy source for triggering biomaterial changes. Although exposure to low doses of long-wave UV light is generally accepted as biocompatible, most studies employing this wavelength only establish cell viability, ignoring other possible (non-toxic) effects. Since light exposure of wavelengths longer than 315 nm may potentially induce changes in cell behavior, we examined changes in gene expression of human mesenchymal stem cells exposed to light under both 2D and 3D culture conditions, including two different hydrogel fabrication techniques, decoupling UV exposure and radical generation. While exposure to long-wave UV light did not induce significant changes in gene expression regardless of culture conditions, significant changes were observed due to scaffold fabrication chemistry and between cells plated in 2D versus encapsulated in 3D scaffolds. In order to facilitate others in searching for more specific changes between the many conditions, the full data set is available on Gene Expression Omnibus for querying. PMID:26418040

  20. Tomato Expression Database (TED): a suite of data presentation and analysis tools

    PubMed Central

    Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

    2006-01-01

    The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection containing >150 000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis, and visualization tools have been developed and incorporated into TED, which aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at . PMID:16381976

  1. Tomato Expression Database (TED): a suite of data presentation and analysis tools.

    PubMed

    Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

    2006-01-01

    The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection containing >150,000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis, and visualization tools have been developed and incorporated into TED, which aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at http://ted.bti.cornell.edu.

  2. VisANT 3.0: new modules for pathway visualization, editing, prediction and construction.

    PubMed

    Hu, Zhenjun; Ng, David M; Yamada, Takuji; Chen, Chunnuan; Kawashima, Shuichi; Mellor, Joe; Linghu, Bolan; Kanehisa, Minoru; Stuart, Joshua M; DeLisi, Charles

    2007-07-01

    With the integration of the KEGG and Predictome databases as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG compatible visual notation and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved as computer-readable XML format (VisML), graphic images or high-resolution Scalable Vector Graphics (SVG). Pathways in the format of VisML can be securely shared within an interested group or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu.

  3. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

    PubMed

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2013-11-01

    The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.

  4. Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying.

    PubMed

    Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano

    2016-12-01

    While a huge amount of (epi)genomic data of multiple types is becoming available by using Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM supports as well all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and makes the ground for providing data interoperability, e.g., achieved through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying.

    PubMed

    Kiefer, Richard C; Freimuth, Robert R; Chute, Christopher G; Pathak, Jyotishman

    2013-01-01

    Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively.

  6. Expression and functional studies of genes involved in transport and metabolism of glycerol in Pachysolen tannophilus

    PubMed Central

    2013-01-01

    Background Pachysolen tannophilus is a non-conventional yeast, which can metabolize many of the carbon sources found in low cost feedstocks including glycerol and xylose. The xylose utilisation pathways have been extensively studied in this organism. However, the mechanism behind glycerol metabolism is poorly understood. Using the recently published genome sequence of P. tannophilus CBS4044, we searched for genes with functions in glycerol transport and metabolism by performing a BLAST search using the sequences of the relevant genes from Saccharomyces cerevisiae as queries. Results Quantitative real-time PCR was performed to unveil the expression patterns of these genes during growth of P. tannophilus on glycerol and glucose as sole carbon sources. The genes predicted to be involved in glycerol transport in P. tannophilus were expressed in S. cerevisiae to validate their function. The S. cerevisiae strains transformed with heterologous genes showed improved growth and glycerol consumption rates with glycerol as the sole carbon source. Conclusions P. tannophilus has characteristics relevant for a microbial cell factory to be applied in a biorefinery setting, i.e. its ability to utilise the carbon sources such as xylose and glycerol. However, the strain is not currently amenable to genetic modification and transformation. Heterologous expression of the glycerol transporters from P. tannophilus, which has a relatively high growth rate on glycerol, could be used as an approach for improving the efficiency of glycerol assimilation in other well characterized and applied cell factories such as S. cerevisiae. PMID:23514356

  7. A high-throughput quantitative expression analysis of cancer-related genes in human HepG2 cells in response to limonene, a potential anticancer agent.

    PubMed

    Hafidh, Rand R; Hussein, Saba Z; MalAllah, Mohammed Q; Abdulamir, Ahmed S; Abu Bakar, Fatimah

    2017-11-14

    Citrus bioactive compounds, as active anticancer agent, have been under focus by several studies worldwide. However, the underlying genes responsible for the anticancer potential have not been sufficiently highlighted. The current study investigated the gene expression profile of hepatocellular carcinoma, HepG2, cells after treatment with Limonene. The concentration that killed 50% of HepG2 cells was used to elucidate the genetic mechanisms of limonene anticancer activity. The apoptotic induction was detected by flow cytometry and confocal fluorescence microscope. Two of pro-apoptotic events, caspase-3 activation and phosphatidylserine translocation were manifested by confocal fluorescence microscopy. High-throughput real-time PCR was used to profile 1023 cancer-related genes in 16 different gene families related to the cancer development. In comparison to untreated cells, limonene increased the percentage of apoptotic cells up to 89.61%, by flow cytometry, and 48.2% by fluorescence microscopy. There was a significant limonene-driven differential gene expression of HepG2 cells in 15 different gene families. Limonene was shown to significantly (>2log) up-regulate and down-regulate 14 and 59 genes, respectively. The affected gene families, from most to least affected, were apoptosis induction, signal transduction, cancer genes augmentation, alteration in kinases expression, inflammation, DNA damage repair, and cell cycle proteins. The current study reveals that limonene could be a promising, cheap, and effective anticancer compound. The broad spectrum of limonene anticancer activity is interesting for anticancer drug development. Further research is needed to confirm the current findings and to examine the anticancer potential of limonene along with underlying mechanisms on different cell lines. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  8. Moving Toward Integrating Gene Expression Profiling Into High-Throughput Testing: A Gene Expression Biomarker Accurately Predicts Estrogen Receptor α Modulation in a Microarray Compendium

    PubMed Central

    Ryan, Natalia; Chorley, Brian; Tice, Raymond R.; Judson, Richard; Corton, J. Christopher

    2016-01-01

    Microarray profiling of chemical-induced effects is being increasingly used in medium- and high-throughput formats. Computational methods are described here to identify molecular targets from whole-genome microarray data using as an example the estrogen receptor α (ERα), often modulated by potential endocrine disrupting chemicals. ERα biomarker genes were identified by their consistent expression after exposure to 7 structurally diverse ERα agonists and 3 ERα antagonists in ERα-positive MCF-7 cells. Most of the biomarker genes were shown to be directly regulated by ERα as determined by ESR1 gene knockdown using siRNA as well as through chromatin immunoprecipitation coupled with DNA sequencing analysis of ERα-DNA interactions. The biomarker was evaluated as a predictive tool using the fold-change rank-based Running Fisher algorithm by comparison to annotated gene expression datasets from experiments using MCF-7 cells, including those evaluating the transcriptional effects of hormones and chemicals. Using 141 comparisons from chemical- and hormone-treated cells, the biomarker gave a balanced accuracy for prediction of ERα activation or suppression of 94% and 93%, respectively. The biomarker was able to correctly classify 18 out of 21 (86%) ER reference chemicals including “very weak” agonists. Importantly, the biomarker predictions accurately replicated predictions based on 18 in vitro high-throughput screening assays that queried different steps in ERα signaling. For 114 chemicals, the balanced accuracies were 95% and 98% for activation or suppression, respectively. These results demonstrate that the ERα gene expression biomarker can accurately identify ERα modulators in large collections of microarray data derived from MCF-7 cells. PMID:26865669

  9. Coexpression landscape in ATTED-II: usage of gene list and gene network for various types of pathways.

    PubMed

    Obayashi, Takeshi; Kinoshita, Kengo

    2010-05-01

    Gene coexpression analyses are a powerful method to predict the function of genes and/or to identify genes that are functionally related to query genes. The basic idea of gene coexpression analyses is that genes with similar functions should have similar expression patterns under many different conditions. This approach is now widely used by many experimental researchers, especially in the field of plant biology. In this review, we will summarize recent successful examples obtained by using our gene coexpression database, ATTED-II. Specifically, the examples will describe the identification of new genes, such as the subunits of a complex protein, the enzymes in a metabolic pathway and transporters. In addition, we will discuss the discovery of a new intercellular signaling factor and new regulatory relationships between transcription factors and their target genes. In ATTED-II, we provide two basic views of gene coexpression, a gene list view and a gene network view, which can be used as guide gene approach and narrow-down approach, respectively. In addition, we will discuss the coexpression effectiveness for various types of gene sets.

  10. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce

    PubMed Central

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2016-01-01

    The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325

  11. GO2PUB: Querying PubMed with semantic expansion of gene ontology terms

    PubMed Central

    2012-01-01

    Background With the development of high throughput methods of gene analyses, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. Results GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the result and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts’ agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed: 40% and PubMed: 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. For determining whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to those of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were respectively of 77% and 40% for the first queries, and of 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances were similar to those of the first queries. Conclusions We demonstrated that the use of genes annotated by either GO terms of interest or a descendant of these GO terms yields some relevant articles ignored by other tools. The comparison of GO2PUB, based on semantic expansion, with GoPubMed, based on text mining techniques, showed that both tools are complementary. The analysis of the randomly-generated queries suggests that the results obtained about lipid metabolism can be generalized to other biological processes. GO2PUB is available at http://go2pub.genouest.org. PMID:22958570

  12. Cytoscape: a software environment for integrated models of biomolecular interaction networks.

    PubMed

    Shannon, Paul; Markiel, Andrew; Ozier, Owen; Baliga, Nitin S; Wang, Jonathan T; Ramage, Daniel; Amin, Nada; Schwikowski, Benno; Ideker, Trey

    2003-11-01

    Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

  13. C-State: an interactive web app for simultaneous multi-gene visualization and comparative epigenetic pattern search.

    PubMed

    Sowpati, Divya Tej; Srivastava, Surabhi; Dhawan, Jyotsna; Mishra, Rakesh K

    2017-09-13

    Comparative epigenomic analysis across multiple genes presents a bottleneck for bench biologists working with NGS data. Despite the development of standardized peak analysis algorithms, the identification of novel epigenetic patterns and their visualization across gene subsets remains a challenge. We developed a fast and interactive web app, C-State (Chromatin-State), to query and plot chromatin landscapes across multiple loci and cell types. C-State has an interactive, JavaScript-based graphical user interface and runs locally in modern web browsers that are pre-installed on all computers, thus eliminating the need for cumbersome data transfer, pre-processing and prior programming knowledge. C-State is unique in its ability to extract and analyze multi-gene epigenetic information. It allows for powerful GUI-based pattern searching and visualization. We include a case study to demonstrate its potential for identifying user-defined epigenetic trends in context of gene expression profiles.

  14. PAGER 2.0: an update to the pathway, annotated-list and gene-signature electronic repository for Human Network Biology

    PubMed Central

    Yue, Zongliang; Zheng, Qi; Neylon, Michael T; Yoo, Minjae; Shin, Jimin; Zhao, Zhiying; Tan, Aik Choon

    2018-01-01

    Abstract Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data qualities, and developing new software features to simplify online integrative GNPA. Specifically, we included 84 282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug–gene, miRNA–gene interactions, pathways and tissue-specific gene expressions. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/. PMID:29126216

  15. A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

    PubMed

    Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng

    2017-05-10

    Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .

  16. Transcriptome-Wide Changes in Chlamydomonas reinhardtii Gene Expression Regulated by Carbon Dioxide and the CO2-Concentrating Mechanism Regulator CIA5/CCM1[W][OA

    PubMed Central

    Fang, Wei; Si, Yaqing; Douglass, Stephen; Casero, David; Merchant, Sabeeha S.; Pellegrini, Matteo; Ladunga, Istvan; Liu, Peng; Spalding, Martin H.

    2012-01-01

    We used RNA sequencing to query the Chlamydomonas reinhardtii transcriptome for regulation by CO2 and by the transcription regulator CIA5 (CCM1). Both CO2 and CIA5 are known to play roles in acclimation to low CO2 and in induction of an essential CO2-concentrating mechanism (CCM), but less is known about their interaction and impact on the whole transcriptome. Our comparison of the transcriptome of a wild type versus a cia5 mutant strain under three different CO2 conditions, high CO2 (5%), low CO2 (0.03 to 0.05%), and very low CO2 (<0.02%), provided an entry into global changes in the gene expression patterns occurring in response to the interaction between CO2 and CIA5. We observed a massive impact of CIA5 and CO2 on the transcriptome, affecting almost 25% of all Chlamydomonas genes, and we discovered an array of gene clusters with distinctive expression patterns that provide insight into the regulatory interaction between CIA5 and CO2. Several individual clusters respond primarily to either CIA5 or CO2, providing access to genes regulated by one factor but decoupled from the other. Three distinct clusters clearly associated with CCM-related genes may represent a rich source of candidates for new CCM components, including a small cluster of genes encoding putative inorganic carbon transporters. PMID:22634760

  17. Mouse Phenome Database

    PubMed Central

    Grubb, Stephen C.; Bult, Carol J.; Bogue, Molly A.

    2014-01-01

    The Mouse Phenome Database (MPD; phenome.jax.org) was launched in 2001 as the data coordination center for the international Mouse Phenome Project. MPD integrates quantitative phenotype, gene expression and genotype data into a common annotated framework to facilitate query and analysis. MPD contains >3500 phenotype measurements or traits relevant to human health, including cancer, aging, cardiovascular disorders, obesity, infectious disease susceptibility, blood disorders, neurosensory disorders, drug addiction and toxicity. Since our 2012 NAR report, we have added >70 new data sets, including data from Collaborative Cross lines and Diversity Outbred mice. During this time we have completely revamped our homepage, improved search and navigational aspects of the MPD application, developed several web-enabled data analysis and visualization tools, annotated phenotype data to public ontologies, developed an ontology browser and released new single nucleotide polymorphism query functionality with much higher density coverage than before. Here, we summarize recent data acquisitions and describe our latest improvements. PMID:24243846

  18. Developing a kidney and urinary pathway knowledge base

    PubMed Central

    2011-01-01

    Background Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life science is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration. Results We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney. Conclusions The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate knowledge that is necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain’s ontologies, allows both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself. Availability The KUPKB may be accessed via http://www.e-lico.eu/kupkb. PMID:21624162

  19. An ontology-based search engine for protein-protein interactions

    PubMed Central

    2010-01-01

    Background Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. Results We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Conclusion Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology. PMID:20122195

  20. An ontology-based search engine for protein-protein interactions.

    PubMed

    Park, Byungkyu; Han, Kyungsook

    2010-01-18

    Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology.

  1. D-Light on promoters: a client-server system for the analysis and visualization of cis-regulatory elements

    PubMed Central

    2013-01-01

    Background The binding of transcription factors to DNA plays an essential role in the regulation of gene expression. Numerous experiments elucidated binding sequences which subsequently have been used to derive statistical models for predicting potential transcription factor binding sites (TFBS). The rapidly increasing number of genome sequence data requires sophisticated computational approaches to manage and query experimental and predicted TFBS data in the context of other epigenetic factors and across different organisms. Results We have developed D-Light, a novel client-server software package to store and query large amounts of TFBS data for any number of genomes. Users can add small-scale data to the server database and query them in a large scale, genome-wide promoter context. The client is implemented in Java and provides simple graphical user interfaces and data visualization. Here we also performed a statistical analysis showing what a user can expect for certain parameter settings and we illustrate the usage of D-Light with the help of a microarray data set. Conclusions D-Light is an easy to use software tool to integrate, store and query annotation data for promoters. A public D-Light server, the client and server software for local installation and the source code under GNU GPL license are available at http://biwww.che.sbg.ac.at/dlight. PMID:23617301

  2. Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying

    PubMed Central

    Kiefer, Richard C.; Freimuth, Robert R.; Chute, Christopher G; Pathak, Jyotishman

    Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively. PMID:24303249

  3. Using Common Table Expressions to Build a Scalable Boolean Query Generator for Clinical Data Warehouses

    PubMed Central

    Harris, Daniel R.; Henderson, Darren W.; Kavuluru, Ramakanth; Stromberg, Arnold J.; Johnson, Todd R.

    2015-01-01

    We present a custom, Boolean query generator utilizing common-table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our work into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart (CRC) cell of i2b2. In our experiments, sixteen different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated non-trivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M=2.03, SD=6.64 vs. M=75.82, SD=238.88 seconds). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used for this provides a means of profiling clinical data warehouse performance. PMID:25192572

  4. An Expressive and Efficient Language for XML Information Retrieval.

    ERIC Educational Resources Information Center

    Chinenyanga, Taurai Tapiwa; Kushmerick, Nicholas

    2002-01-01

    Discusses XML and information retrieval and describes a query language, ELIXIR (expressive and efficient language for XML information retrieval), with a textual similarity operator that can be used for similarity joins. Explains the algorithm for answering ELIXIR queries to generate intermediate relational data. (Author/LRW)

  5. Query Auto-Completion Based on Word2vec Semantic Similarity

    NASA Astrophysics Data System (ADS)

    Shao, Taihua; Chen, Honghui; Chen, Wanyu

    2018-04-01

    Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.

  6. Inferring Alcoholism SNPs and Regulatory Chemical Compounds Based on Ensemble Bayesian Network.

    PubMed

    Chen, Huan; Sun, Jiatong; Jiang, Hong; Wang, Xianyue; Wu, Lingxiang; Wu, Wei; Wang, Qh

    2017-01-01

    The disturbance of consciousness is one of the most common symptoms of those have alcoholism and may cause disability and mortality. Previous studies indicated that several single nucleotide polymorphisms (SNP) increase the susceptibility of alcoholism. In this study, we utilized the Ensemble Bayesian Network (EBN) method to identify causal SNPs of alcoholism based on the verified GAW14 data. We built a Bayesian network combining random process and greedy search by using Genetic Analysis Workshop 14 (GAW14) dataset to establish EBN of SNPs. Then we predicted the association between SNPs and alcoholism by determining Bayes' prior probability. Thirteen out of eighteen SNPs directly connected with alcoholism were found concordance with potential risk regions of alcoholism in OMIM database. As many SNPs were found contributing to alteration on gene expression, known as expression quantitative trait loci (eQTLs), we further sought to identify chemical compounds acting as regulators of alcoholism genes captured by causal SNPs. Chloroprene and valproic acid were identified as the expression regulators for genes C11orf66 and SALL3 which were captured by alcoholism SNPs, respectively. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  7. Quantitative comparison of microarray experiments with published leukemia related gene expression signatures.

    PubMed

    Klein, Hans-Ulrich; Ruckert, Christian; Kohlmann, Alexander; Bullinger, Lars; Thiede, Christian; Haferlach, Torsten; Dugas, Martin

    2009-12-15

    Multiple gene expression signatures derived from microarray experiments have been published in the field of leukemia research. A comparison of these signatures with results from new experiments is useful for verification as well as for interpretation of the results obtained. Currently, the percentage of overlapping genes is frequently used to compare published gene signatures against a signature derived from a new experiment. However, it has been shown that the percentage of overlapping genes is of limited use for comparing two experiments due to the variability of gene signatures caused by different array platforms or assay-specific influencing parameters. Here, we present a robust approach for a systematic and quantitative comparison of published gene expression signatures with an exemplary query dataset. A database storing 138 leukemia-related published gene signatures was designed. Each gene signature was manually annotated with terms according to a leukemia-specific taxonomy. Two analysis steps are implemented to compare a new microarray dataset with the results from previous experiments stored and curated in the database. First, the global test method is applied to assess gene signatures and to constitute a ranking among them. In a subsequent analysis step, the focus is shifted from single gene signatures to chromosomal aberrations or molecular mutations as modeled in the taxonomy. Potentially interesting disease characteristics are detected based on the ranking of gene signatures associated with these aberrations stored in the database. Two example analyses are presented. An implementation of the approach is freely available as web-based application. The presented approach helps researchers to systematically integrate the knowledge derived from numerous microarray experiments into the analysis of a new dataset. By means of example leukemia datasets we demonstrate that this approach detects related experiments as well as related molecular mutations and may help to interpret new microarray data.

  8. Querying Proofs

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2012-01-01

    We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.

  9. Transcriptional analysis of immune genes in Epstein-Barr virus-associated gastric cancer and association with clinical outcomes.

    PubMed

    Sundar, Raghav; Qamra, Aditi; Tan, Angie Lay Keng; Zhang, Shenli; Ng, Cedric Chuan Young; Teh, Bin Tean; Lee, Jeeyun; Kim, Kyoung-Mee; Tan, Patrick

    2018-06-18

    Epstein-Barr virus-associated gastric cancer (EBVaGC) has traditionally been associated with high expression of PD-L1 and immune infiltration. Correlations between PD-L1 and other immune-related gene (IRG) expressions in EBVaGC have not been previously described. We performed NanoString ® transcriptomic profiling and PD-L1 immunohistochemistry (IHC) (using the FDA approved Dako PD-L1 IHC 22C3) on EBVaGC samples from gastric cancer patients undergoing primary tumor resections at Samsung Medical Centre, South Korea. For controls, EBV-negative samples from the previously reported Asian Cancer Research Group (EBVnegACRG) cohort were used. Genes tested included PD-L1 and other IRGs related to intra-tumoral cytolytic activity, cytokines and immune checkpoints. Samples with PD-L1 expression > 34th percentile were defined as PD-L1 high and the remaining as PD-L1 low . We identified 71 cases of EBVaGC and 193 EBV-negative ACRG samples as controls. EBVaGC showed higher expression of all queried immune genes compared to EBVnegACRG samples (p < 0.01). PD-L1 immunohistochemistry expression correlated with PD-L1 transcript expression (r = 0.63, p < 0.001). Tumor-infiltrating lymphocyte patterns were also found to be different between PD-L1 low and PD-L1 high groups. PD-L1 low EBVaGC samples (n = 24, 34%) had consistently decreased expression of all other immune genes, such as CD8A, GZMA and PRF1 and PD-1 (p < 0.001). PD-L1 low EBVaGC samples were also associated with worse disease-free survival (HR 5.03, p = 0.032) compared to PD-L1 high EBVaGC samples. A substantial proportion of EBVaGC does not express high levels of PD-L1 and other immune genes. EBVaGCs which have lower transcriptomic expression of PD-L1 tend to have a similarly low expression of other immune genes, IHC scores and a poorer prognosis.

  10. The Profile-Query Relationship.

    ERIC Educational Resources Information Center

    Shepherd, Michael A.; Phillips, W. J.

    1986-01-01

    Defines relationship between user profile and user query in terms of relationship between clusters of documents retrieved by each, and explores the expression of cluster similarity and cluster overlap as linear functions of similarity existing between original pairs of profiles and queries, given the desired retrieval threshold. (23 references)…

  11. Ovary transcriptome profiling via artificial intelligence reveals a transcriptomic fingerprint predicting egg quality in striped bass, Morone saxatilis.

    PubMed

    Chapman, Robert W; Reading, Benjamin J; Sullivan, Craig V

    2014-01-01

    Inherited gene transcripts deposited in oocytes direct early embryonic development in all vertebrates, but transcript profiles indicative of embryo developmental competence have not previously been identified. We employed artificial intelligence to model profiles of maternal ovary gene expression and their relationship to egg quality, evaluated as production of viable mid-blastula stage embryos, in the striped bass (Morone saxatilis), a farmed species with serious egg quality problems. In models developed using artificial neural networks (ANNs) and supervised machine learning, collective changes in the expression of a limited suite of genes (233) representing <2% of the queried ovary transcriptome explained >90% of the eventual variance in embryo survival. Egg quality related to minor changes in gene expression (<0.2-fold), with most individual transcripts making a small contribution (<1%) to the overall prediction of egg quality. These findings indicate that the predictive power of the transcriptome as regards egg quality resides not in levels of individual genes, but rather in the collective, coordinated expression of a suite of transcripts constituting a transcriptomic "fingerprint". Correlation analyses of the corresponding candidate genes indicated that dysfunction of the ubiquitin-26S proteasome, COP9 signalosome, and subsequent control of the cell cycle engenders embryonic developmental incompetence. The affected gene networks are centrally involved in regulation of early development in all vertebrates, including humans. By assessing collective levels of the relevant ovarian transcripts via ANNs we were able, for the first time in any vertebrate, to accurately predict the subsequent embryo developmental potential of eggs from individual females. Our results show that the transcriptomic fingerprint evidencing developmental dysfunction is highly predictive of, and therefore likely to regulate, egg quality, a biologically complex trait crucial to reproductive fitness.

  12. Ovary Transcriptome Profiling via Artificial Intelligence Reveals a Transcriptomic Fingerprint Predicting Egg Quality in Striped Bass, Morone saxatilis

    PubMed Central

    2014-01-01

    Inherited gene transcripts deposited in oocytes direct early embryonic development in all vertebrates, but transcript profiles indicative of embryo developmental competence have not previously been identified. We employed artificial intelligence to model profiles of maternal ovary gene expression and their relationship to egg quality, evaluated as production of viable mid-blastula stage embryos, in the striped bass (Morone saxatilis), a farmed species with serious egg quality problems. In models developed using artificial neural networks (ANNs) and supervised machine learning, collective changes in the expression of a limited suite of genes (233) representing <2% of the queried ovary transcriptome explained >90% of the eventual variance in embryo survival. Egg quality related to minor changes in gene expression (<0.2-fold), with most individual transcripts making a small contribution (<1%) to the overall prediction of egg quality. These findings indicate that the predictive power of the transcriptome as regards egg quality resides not in levels of individual genes, but rather in the collective, coordinated expression of a suite of transcripts constituting a transcriptomic “fingerprint”. Correlation analyses of the corresponding candidate genes indicated that dysfunction of the ubiquitin-26S proteasome, COP9 signalosome, and subsequent control of the cell cycle engenders embryonic developmental incompetence. The affected gene networks are centrally involved in regulation of early development in all vertebrates, including humans. By assessing collective levels of the relevant ovarian transcripts via ANNs we were able, for the first time in any vertebrate, to accurately predict the subsequent embryo developmental potential of eggs from individual females. Our results show that the transcriptomic fingerprint evidencing developmental dysfunction is highly predictive of, and therefore likely to regulate, egg quality, a biologically complex trait crucial to reproductive fitness. PMID:24820964

  13. p53 as the focus of gene therapy: past, present and future.

    PubMed

    Valente, Joana Fa; Queiroz, Joao A; Sousa, Fani

    2018-01-15

    Several gene deviations can be responsible for triggering oncogenic processes. However, mutations in tumour suppressor genes are usually more associated to malignant diseases, being p53 one of the most affected and studied element. p53 is implicated in a number of known cellular functions, including DNA damage repair, cell cycle arrest in G1/S and G2/M and apoptosis, being an interesting target for cancer treatment. Considering these facts, the development of gene therapy approaches focused on p53 expression and regulation seems to be a promising strategy for cancer therapy. Several studies have shown that transfection of cancer cells with wild-type p53 expressing plasmids could directly drive cells into apoptosis and/or growth arrest, suggesting that a gene therapy approach for cancer treatment can be based on the re-establishment of the normal p53 expression levels and function. Up until now, several clinical research studies using viral and non-viral vectors delivering p53 genes, isolated or combined with other therapeutic agents, have been accomplished and there are already in the market therapies based on the use of this gene. This review summarizes the different methods used to deliver and/or target the p53 as well as the main results of therapeutic effect obtained with the different strategies applied. Finally, the ongoing approaches are described, also focusing the combinatorial therapeutics to show the increased therapeutic potential of combining gene therapy vectors with chemo or radiotherapy. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. Dysregulated Pathway Identification of Alzheimer's Disease Based on Internal Correlation Analysis of Genes and Pathways.

    PubMed

    Kong, Wei; Mou, Xiaoyang; Di, Benteng; Deng, Jin; Zhong, Ruxing; Wang, Shuaiqun

    2017-11-20

    Dysregulated pathway identification is an important task which can gain insight into the underlying biological processes of disease. Current pathway-identification methods focus on a set of co-expression genes and single pathways and ignore the correlation between genes and pathways. The method proposed in this study, takes into account the internal correlations not only between genes but also pathways to identifying dysregulated pathways related to Alzheimer's disease (AD), the most common form of dementia. In order to find the significantly differential genes for AD, mutual information (MI) is used to measure interdependencies between genes other than expression valves. Then, by integrating the topology information from KEGG, the significant pathways involved in the feature genes are identified. Next, the distance correlation (DC) is applied to measure the pairwise pathway crosstalks since DC has the advantage of detecting nonlinear correlations when compared to Pearson correlation. Finally, the pathway pairs with significantly different correlations between normal and AD samples are known as dysregulated pathways. The molecular biology analysis demonstrated that many dysregulated pathways related to AD pathogenesis have been discovered successfully by the internal correlation detection. Furthermore, the insights of the dysregulated pathways in the development and deterioration of AD will help to find new effective target genes and provide important theoretical guidance for drug design. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  15. Androgen-induced programs for prostate epithelial growth and invasion arise in embryogenesis and are reactivated in cancer

    PubMed Central

    Schaeffer, EM; Marchionni, L; Huang, Z; Simons, B; Blackman, A; Yu, W; Parmigiani, G; Berman, DM

    2008-01-01

    Cancer cells differentiate along specific lineages that largely determine their clinical and biologic behavior. Distinct cancer phenotypes from different cells and organs likely result from unique gene expression repertoires established in the embryo and maintained after malignant transformation. We used comprehensive gene expression analysis to examine this concept in the prostate, an organ with a tractable developmental program and a high propensity for cancer. We focused on gene expression in the murine prostate rudiment at three time points during the first 48 h of exposure to androgen, which initiates proliferation and invasion of prostate epithelial buds into surrounding urogenital sinus mesenchyme. Here, we show that androgen exposure regulates genes previously implicated in prostate carcinogenesis comprising pathways for the phosphatase and tensin homolog (PTEN), fibroblast growth factor (FGF)/mitogen-activated protein kinase (MAPK), and Wnt signaling along with cellular programs regulating such ‘hallmarks’ of cancer as angiogenesis, apoptosis, migration and proliferation. We found statistically significant evidence for novel androgeninduced gene regulation events that establish and/or maintain prostate cell fate. These include modulation of gene expression through microRNAs, expression of specific transcription factors, and regulation of their predicted targets. By querying public gene expression databases from other tissues, we found that rather than generally characterizing androgen exposure or epithelial budding, the early prostate development program more closely resembles the program for human prostate cancer. Most importantly, early androgen-regulated genes and functional themes associated with prostate development were highly enriched in contrasts between increasingly lethal forms of prostate cancer, confirming a ‘reactivation’ of embryonic pathways for proliferation and invasion in prostate cancer progression. Among the genes with the most significant links to the development and cancer, we highlight coordinate induction of the transcription factor Sox9 and suppression of the proapoptotic phospholipid-binding protein Annexin A1 that link early prostate development to early prostate carcinogenesis. These results credential early prostate development as a reliable and valid model system for the investigation of genes and pathways that drive prostate cancer. PMID:18794802

  16. Phage-Mediated Gene Therapy.

    PubMed

    Hosseinidoust, Zeinab

    2017-01-01

    Bacteriophages (bacterial viruses) have long been under investigation as vectors for gene therapy. Similar to other viral vectors, the phage coat proteins have evolved over millions of years to protect the viral genome from degradation post injection, offering protection for the valuable therapeutic sequence. However, what sets phage apart from other viral gene delivery vectors is their safety for human use and the relative ease by which foreign molecules can be expressed on the phage outer surface, enabling highly targeted gene delivery. The latter property also makes phage a popular choice for gene therapy target discovery through directed evolution. Although promising, phage-mediated gene therapy faces several outstanding challenges, the most notable being lower gene delivery efficiency compared to animal viruses, vector stability, and nondesirable immune stimulation. This review presents a critical review of promises and challenges of employing phage as gene delivery vehicles as well as an introduction to the concept of phage-based microbiome therapy as the new frontier and perhaps the most promising application of phage-based gene therapy. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  17. dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts

    PubMed Central

    Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

    2013-01-01

    The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284

  18. MOPED 2.5—An Integrated Multi-Omics Resource: Multi-Omics Profiling Expression Database Now Includes Transcriptomics Data

    PubMed Central

    Montague, Elizabeth; Stanberry, Larissa; Higdon, Roger; Janko, Imre; Lee, Elaine; Anderson, Nathaniel; Choiniere, John; Stewart, Elizabeth; Yandl, Gregory; Broomall, William; Kolker, Natali

    2014-01-01

    Abstract Multi-omics data-driven scientific discovery crucially rests on high-throughput technologies and data sharing. Currently, data are scattered across single omics repositories, stored in varying raw and processed formats, and are often accompanied by limited or no metadata. The Multi-Omics Profiling Expression Database (MOPED, http://moped.proteinspire.org) version 2.5 is a freely accessible multi-omics expression database. Continual improvement and expansion of MOPED is driven by feedback from the Life Sciences Community. In order to meet the emergent need for an integrated multi-omics data resource, MOPED 2.5 now includes gene relative expression data in addition to protein absolute and relative expression data from over 250 large-scale experiments. To facilitate accurate integration of experiments and increase reproducibility, MOPED provides extensive metadata through the Data-Enabled Life Sciences Alliance (DELSA Global, http://delsaglobal.org) metadata checklist. MOPED 2.5 has greatly increased the number of proteomics absolute and relative expression records to over 500,000, in addition to adding more than four million transcriptomics relative expression records. MOPED has an intuitive user interface with tabs for querying different types of omics expression data and new tools for data visualization. Summary information including expression data, pathway mappings, and direct connection between proteins and genes can be viewed on Protein and Gene Details pages. These connections in MOPED provide a context for multi-omics expression data exploration. Researchers are encouraged to submit omics data which will be consistently processed into expression summaries. MOPED as a multi-omics data resource is a pivotal public database, interdisciplinary knowledge resource, and platform for multi-omics understanding. PMID:24910945

  19. Multi-tissue transcriptomics for construction of a comprehensive gene resource for the terrestrial snail Theba pisana.

    PubMed

    Zhao, M; Wang, T; Adamson, K J; Storey, K B; Cummins, S F

    2016-02-08

    The land snail Theba pisana is native to the Mediterranean region but has become one of the most abundant invasive species worldwide. Here, we present three transcriptomes of this agriculture pest derived from three tissues: the central nervous system, hepatopancreas (digestive gland), and foot muscle. Sequencing of the three tissues produced 339,479,092 high quality reads and a global de novo assembly generated a total of 250,848 unique transcripts (unigenes). BLAST analysis mapped 52,590 unigenes to NCBI non-redundant protein databases and further functional analysis annotated 21,849 unigenes with gene ontology. We report that T. pisana transcripts have representatives in all functional classes and a comparison of differentially expressed transcripts amongst all three tissues demonstrates enormous differences in their potential metabolic activities. The genes differentially expressed include those with sequence similarity to those genes associated with multiple bacterial diseases and neurological diseases. To provide a valuable resource that will assist functional genomics study, we have implemented a user-friendly web interface, ThebaDB (http://thebadb.bioinfo-minzhao.org/). This online database allows for complex text queries, sequence searches, and data browsing by enriched functional terms and KEGG mapping.

  20. Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system

    PubMed Central

    Sunkin, Susan M.; Ng, Lydia; Lau, Chris; Dolbeare, Tim; Gilbert, Terri L.; Thompson, Carol L.; Hawrylycz, Michael; Dang, Chinh

    2013-01-01

    The Allen Brain Atlas (http://www.brain-map.org) provides a unique online public resource integrating extensive gene expression data, connectivity data and neuroanatomical information with powerful search and viewing tools for the adult and developing brain in mouse, human and non-human primate. Here, we review the resources available at the Allen Brain Atlas, describing each product and data type [such as in situ hybridization (ISH) and supporting histology, microarray, RNA sequencing, reference atlases, projection mapping and magnetic resonance imaging]. In addition, standardized and unique features in the web applications are described that enable users to search and mine the various data sets. Features include both simple and sophisticated methods for gene searches, colorimetric and fluorescent ISH image viewers, graphical displays of ISH, microarray and RNA sequencing data, Brain Explorer software for 3D navigation of anatomy and gene expression, and an interactive reference atlas viewer. In addition, cross data set searches enable users to query multiple Allen Brain Atlas data sets simultaneously. All of the Allen Brain Atlas resources can be accessed through the Allen Brain Atlas data portal. PMID:23193282

  1. Efficient processing of multiple nested event pattern queries over multi-dimensional event streams based on a triaxial hierarchical model.

    PubMed

    Xiao, Fuyuan; Aritsugi, Masayoshi; Wang, Qing; Zhang, Rong

    2016-09-01

    For efficient and sophisticated analysis of complex event patterns that appear in streams of big data from health care information systems and support for decision-making, a triaxial hierarchical model is proposed in this paper. Our triaxial hierarchical model is developed by focusing on hierarchies among nested event pattern queries with an event concept hierarchy, thereby allowing us to identify the relationships among the expressions and sub-expressions of the queries extensively. We devise a cost-based heuristic by means of the triaxial hierarchical model to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. According to the triaxial hierarchical model, we can also calculate how to reuse the results of the common sub-expressions in multiple queries. By integrating the optimised query execution plan with the reuse schemes, a multi-query optimisation strategy is developed to accomplish efficient processing of multiple nested event pattern queries. We present empirical studies in which the performance of multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads of pattern queries can be used for supporting monitoring patients' conditions. On the other hand, experiments with varying input rates of streams can correspond to changes of the numbers of patients that a system should manage, whereas burst input rates can correspond to changes of rushes of patients to be taken care of. The experimental results have shown that, in Workload 1, our proposal can improve about 4 and 2 times throughput comparing with the relative works, respectively; in Workload 2, our proposal can improve about 3 and 2 times throughput comparing with the relative works, respectively; in Workload 3, our proposal can improve about 6 times throughput comparing with the relative work. The experimental results demonstrated that our proposal was able to process complex queries efficiently which can support health information systems and further decision-making. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Discovering functional modules by topic modeling RNA-Seq based toxicogenomic data.

    PubMed

    Yu, Ke; Gong, Binsheng; Lee, Mikyung; Liu, Zhichao; Xu, Joshua; Perkins, Roger; Tong, Weida

    2014-09-15

    Toxicogenomics (TGx) endeavors to elucidate the underlying molecular mechanisms through exploring gene expression profiles in response to toxic substances. Recently, RNA-Seq is increasingly regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from the complex TGx data. Considering read counts as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to a word by document matrix used in text mining. Topic modeling aiming at to discover the latent structures in text corpora would be helpful to explore RNA-Seq based TGx data. In this study, topic modeling was applied on a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found samples treated by the compounds with the same modes of actions (MoAs) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted by gene enrichment analysis with MoAs then confirmed by compound and pathways associations mined from literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq based gene expression profile of a topic specifically associated with peroxisome proliferator-activated receptors (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discover functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in RNA-Seq era.

  3. MorphDB: Prioritizing Genes for Specialized Metabolism Pathways and Gene Ontology Categories in Plants.

    PubMed

    Zwaenepoel, Arthur; Diels, Tim; Amar, David; Van Parys, Thomas; Shamir, Ron; Van de Peer, Yves; Tzfadia, Oren

    2018-01-01

    Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.

  4. Computing health quality measures using Informatics for Integrating Biology and the Bedside.

    PubMed

    Klann, Jeffrey G; Murphy, Shawn N

    2013-04-19

    The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)'s Query Health platform to move toward this goal. Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers.

  5. Computing Health Quality Measures Using Informatics for Integrating Biology and the Bedside

    PubMed Central

    Murphy, Shawn N

    2013-01-01

    Background The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)’s Query Health platform to move toward this goal. Objective Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. Methods We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. Results The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. Conclusions HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers. PMID:23603227

  6. Moving Toward Integrating Gene Expression Profiling Into High-Throughput Testing: A Gene Expression Biomarker Accurately Predicts Estrogen Receptor α Modulation in a Microarray Compendium.

    PubMed

    Ryan, Natalia; Chorley, Brian; Tice, Raymond R; Judson, Richard; Corton, J Christopher

    2016-05-01

    Microarray profiling of chemical-induced effects is being increasingly used in medium- and high-throughput formats. Computational methods are described here to identify molecular targets from whole-genome microarray data using as an example the estrogen receptor α (ERα), often modulated by potential endocrine disrupting chemicals. ERα biomarker genes were identified by their consistent expression after exposure to 7 structurally diverse ERα agonists and 3 ERα antagonists in ERα-positive MCF-7 cells. Most of the biomarker genes were shown to be directly regulated by ERα as determined by ESR1 gene knockdown using siRNA as well as through chromatin immunoprecipitation coupled with DNA sequencing analysis of ERα-DNA interactions. The biomarker was evaluated as a predictive tool using the fold-change rank-based Running Fisher algorithm by comparison to annotated gene expression datasets from experiments using MCF-7 cells, including those evaluating the transcriptional effects of hormones and chemicals. Using 141 comparisons from chemical- and hormone-treated cells, the biomarker gave a balanced accuracy for prediction of ERα activation or suppression of 94% and 93%, respectively. The biomarker was able to correctly classify 18 out of 21 (86%) ER reference chemicals including "very weak" agonists. Importantly, the biomarker predictions accurately replicated predictions based on 18 in vitro high-throughput screening assays that queried different steps in ERα signaling. For 114 chemicals, the balanced accuracies were 95% and 98% for activation or suppression, respectively. These results demonstrate that the ERα gene expression biomarker can accurately identify ERα modulators in large collections of microarray data derived from MCF-7 cells. Published by Oxford University Press on behalf of the Society of Toxicology 2016. This work is written by US Government employees and is in the public domain in the US.

  7. In silico analysis of the potential mechanism of telocinobufagin on breast cancer MCF-7 cells.

    PubMed

    Dang, Yi-Wu; Lin, Peng; Liu, Li-Min; He, Rong-Quan; Zhang, Li-Jie; Peng, Zhi-Gang; Li, Xiao-Jiao; Chen, Gang

    2018-05-01

    The extractives from a ChanSu, traditional Chinese medicine, have been discovered to possess anti-inflammatory and tumor-suppressing abilities. However, the molecular mechanism of telocinobufagin, a compound extracted from ChanSu, on breast cancer cells has not been clarified. The aim of this study is to investigate the underlying mechanism of telocinobufagin on breast cancer cells. The differentially expressed genes after telocinobufagin treatment on breast cancer cells were searched and downloaded from Gene Expression Omnibus (GEO), ArrayExpress and literatures. Bioinformatics tools were applied to further explore the potential mechanism of telocinobufagin in breast cancer using the Kyoto Encyclopedia of genes and genomes (KEGG) pathway, Gene ontology (GO) enrichment, panther, and protein-protein interaction analyses. To better comprehend the role of telocinobufagin in breast cancer, we also queried the Connectivity Map using the gene expression profiles of telocinobufagin treatment. One GEO accession (GSE85871) provided 1251 differentially expressed genes after telocinobufagin treatment on MCF-7 cells. The pathway of neuroactive ligand-receptor interaction, cell adhesion molecules (CAMs), intestinal immune network for IgA production, hematopoietic cell lineage and calcium signaling pathway were the key pathways from KEGG analysis. IGF1 and KSR1, owning to higher protein levels in breast cancer tissues, IGF1 and KSR1 could be the hub genes related to telocinobufagin treatment. It was indicated that the molecular mechanism of telocinobufagin resembled that of fenspiride. Telocinobufagin might regulate neuroactive ligand-receptor interaction pathway to exert its influences in breast cancer MCF-7 cells, and its molecular mechanism might share some similarities with fenspiride. This study only presented a comprehensive picture of the role of telocinobufagin in breast cancer MCF-7 cells using big data. However, more thorough and deeper researches are required to add to the validity of this study. Copyright © 2018 Elsevier GmbH. All rights reserved.

  8. Gene Expression in Bone

    NASA Astrophysics Data System (ADS)

    D'Ambrogio, A.

    Skeletal system has two main functions, to provide mechanical integrity for both locomotion and protection and to play an important role in mineral homeostasis. There is extensive evidence showing loss of bone mass during long-term Space-Flights. The loss is due to a break in the equilibrium between the activity of osteoblasts (the cells that forms bone) and the activity of osteoclasts (the cells that resorbs bone). Surprisingly, there is scanty information about the possible altered gene expression occurring in cells that form bone in microgravity.(Just 69 articles result from a "gene expression in microgravity" MedLine query.) Gene-chip or microarray technology allows to screen thousands of genes at the same time: the use of this technology on samples coming from cells exposed to microgravity could provide us with many important informations. For example, the identification of the molecules or structures which are the first sensors of the mechanical stress derived from lack of gravity, could help in understanding which is the first event leading to bone loss due to long-term exposure to microgravity. Consequently, this structure could become a target for a custom-designed drug. It is evident that bone mass loss, observed during long-time stay in Space, represents an accelerated model of what happens in aging osteoporosis. Therefore, the discovery and design of drugs able to interfere with the bone-loss process, could help also in preventing negative physiological processes normally observed on Earth. Considering the aims stated above, my research is designed to:

  9. Gene expression-based chemical genomics identifies potential therapeutic drugs in hepatocellular carcinoma.

    PubMed

    Chen, Ming-Huang; Yang, Wu-Lung R; Lin, Kuan-Ting; Liu, Chia-Hung; Liu, Yu-Wen; Huang, Kai-Wen; Chang, Peter Mu-Hsin; Lai, Jin-Mei; Hsu, Chun-Nan; Chao, Kun-Mao; Kao, Cheng-Yan; Huang, Chi-Ying F

    2011-01-01

    Hepatocellular carcinoma (HCC) is an aggressive tumor with a poor prognosis. Currently, only sorafenib is approved by the FDA for advanced HCC treatment; therefore, there is an urgent need to discover candidate therapeutic drugs for HCC. We hypothesized that if a drug signature could reverse, at least in part, the gene expression signature of HCC, it might have the potential to inhibit HCC-related pathways and thereby treat HCC. To test this hypothesis, we first built an integrative platform, the "Encyclopedia of Hepatocellular Carcinoma genes Online 2", dubbed EHCO2, to systematically collect, organize and compare the publicly available data from HCC studies. The resulting collection includes a total of 4,020 genes. To systematically query the Connectivity Map (CMap), which includes 6,100 drug-mediated expression profiles, we further designed various gene signature selection and enrichment methods, including a randomization technique, majority vote, and clique analysis. Subsequently, 28 out of 50 prioritized drugs, including tanespimycin, trichostatin A, thioguanosine, and several anti-psychotic drugs with anti-tumor activities, were validated via MTT cell viability assays and clonogenic assays in HCC cell lines. To accelerate their future clinical use, possibly through drug-repurposing, we selected two well-established drugs to test in mice, chlorpromazine and trifluoperazine. Both drugs inhibited orthotopic liver tumor growth. In conclusion, we successfully discovered and validated existing drugs for potential HCC therapeutic use with the pipeline of Connectivity Map analysis and lab verification, thereby suggesting the usefulness of this procedure to accelerate drug repurposing for HCC treatment.

  10. Mapping the polysaccharide degradation potential of Aspergillus niger

    PubMed Central

    2012-01-01

    Background The degradation of plant materials by enzymes is an industry of increasing importance. For sustainable production of second generation biofuels and other products of industrial biotechnology, efficient degradation of non-edible plant polysaccharides such as hemicellulose is required. For each type of hemicellulose, a complex mixture of enzymes is required for complete conversion to fermentable monosaccharides. In plant-biomass degrading fungi, these enzymes are regulated and released by complex regulatory structures. In this study, we present a methodology for evaluating the potential of a given fungus for polysaccharide degradation. Results Through the compilation of information from 203 articles, we have systematized knowledge on the structure and degradation of 16 major types of plant polysaccharides to form a graphical overview. As a case example, we have combined this with a list of 188 genes coding for carbohydrate-active enzymes from Aspergillus niger, thus forming an analysis framework, which can be queried. Combination of this information network with gene expression analysis on mono- and polysaccharide substrates has allowed elucidation of concerted gene expression from this organism. One such example is the identification of a full set of extracellular polysaccharide-acting genes for the degradation of oat spelt xylan. Conclusions The mapping of plant polysaccharide structures along with the corresponding enzymatic activities is a powerful framework for expression analysis of carbohydrate-active enzymes. Applying this network-based approach, we provide the first genome-scale characterization of all genes coding for carbohydrate-active enzymes identified in A. niger. PMID:22799883

  11. Discovering relationships between nuclear receptor signaling pathways, genes, and tissues in Transcriptomine.

    PubMed

    Becnel, Lauren B; Ochsner, Scott A; Darlington, Yolanda F; McOwiti, Apollo; Kankanamge, Wasula H; Dehart, Michael; Naumov, Alexey; McKenna, Neil J

    2017-04-25

    We previously developed a web tool, Transcriptomine, to explore expression profiling data sets involving small-molecule or genetic manipulations of nuclear receptor signaling pathways. We describe advances in biocuration, query interface design, and data visualization that enhance the discovery of uncharacterized biology in these pathways using this tool. Transcriptomine currently contains about 45 million data points encompassing more than 2000 experiments in a reference library of nearly 550 data sets retrieved from public archives and systematically curated. To make the underlying data points more accessible to bench biologists, we classified experimental small molecules and gene manipulations into signaling pathways and experimental tissues and cell lines into physiological systems and organs. Incorporation of these mappings into Transcriptomine enables the user to readily evaluate tissue-specific regulation of gene expression by nuclear receptor signaling pathways. Data points from animal and cell model experiments and from clinical data sets elucidate the roles of nuclear receptor pathways in gene expression events accompanying various normal and pathological cellular processes. In addition, data sets targeting non-nuclear receptor signaling pathways highlight transcriptional cross-talk between nuclear receptors and other signaling pathways. We demonstrate with specific examples how data points that exist in isolation in individual data sets validate each other when connected and made accessible to the user in a single interface. In summary, Transcriptomine allows bench biologists to routinely develop research hypotheses, validate experimental data, or model relationships between signaling pathways, genes, and tissues. Copyright © 2017, American Association for the Advancement of Science.

  12. Mapping the polysaccharide degradation potential of Aspergillus niger.

    PubMed

    Andersen, Mikael R; Giese, Malene; de Vries, Ronald P; Nielsen, Jens

    2012-07-16

    The degradation of plant materials by enzymes is an industry of increasing importance. For sustainable production of second generation biofuels and other products of industrial biotechnology, efficient degradation of non-edible plant polysaccharides such as hemicellulose is required. For each type of hemicellulose, a complex mixture of enzymes is required for complete conversion to fermentable monosaccharides. In plant-biomass degrading fungi, these enzymes are regulated and released by complex regulatory structures. In this study, we present a methodology for evaluating the potential of a given fungus for polysaccharide degradation. Through the compilation of information from 203 articles, we have systematized knowledge on the structure and degradation of 16 major types of plant polysaccharides to form a graphical overview. As a case example, we have combined this with a list of 188 genes coding for carbohydrate-active enzymes from Aspergillus niger, thus forming an analysis framework, which can be queried. Combination of this information network with gene expression analysis on mono- and polysaccharide substrates has allowed elucidation of concerted gene expression from this organism. One such example is the identification of a full set of extracellular polysaccharide-acting genes for the degradation of oat spelt xylan. The mapping of plant polysaccharide structures along with the corresponding enzymatic activities is a powerful framework for expression analysis of carbohydrate-active enzymes. Applying this network-based approach, we provide the first genome-scale characterization of all genes coding for carbohydrate-active enzymes identified in A. niger.

  13. Emodin enhances the chemosensitivity of endometrial cancer by inhibiting ROS-mediated Cisplatin-resistance.

    PubMed

    Ding, Ning; Zhang, Hong; Su, Shan; Ding, Yumei; Yu, Xiaohui; Tang, Yujie; Wang, Qingfang; Liu, Peishu

    2017-12-18

    Background Endometrial cancer is a common cause of death in gynecological malignancies. Cisplatin is a clinically chemotherapeutic agent. However, drug-resistance is the primary cause of treatment failure. Objective Emodin is commonly used clinically to increase the sensitivity of chemotherapeutic agents, yet whether Emodin promotes the role of Cisplatin in the treatment of endometrial cancer has not been studied. Method CCK-8 kit was utilized to determine the growth of two endometrial cancer cell lines, Ishikawa and HEC-IB. The apoptosis level of Ishikawa and HEC-IB cells was detected by Annexin V / propidium iodide double-staining assay. ROS level was detected by DCFH-DA and NADPH oxidase expression. Expressions of drug-resistant genes were examined by real-time PCR and Western blotting. Results Emodin combined with Cisplatin reduced cell growth and increased the apoptosis of endometrial cancer cells. Co-treatment of Emodin and Cisplatin increased chemosensitivity by inhibiting the expression of drug-resistant genes through reducing the ROS levels in endometrial cancer cells. In an endometrial cancer xenograft murine model, the tumor size was reduced and animal survival time was increased by co-treatment of Emodin and Cisplatin. Conclusion This study demonstrates that Emodin enhances the chemosensitivity of Cisplatin on endometrial cancer by inhibiting ROS-mediated expression of drug-resistance genes. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. High dimensional biological data retrieval optimization with NoSQL technology.

    PubMed

    Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike

    2014-01-01

    High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.

  15. High dimensional biological data retrieval optimization with NoSQL technology

    PubMed Central

    2014-01-01

    Background High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. Conclusions The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data. PMID:25435347

  16. -A curated transcriptomic dataset collection relevant to embryonic development associated with in vitro fertilization in healthy individuals and patients with polycystic ovary syndrome.

    PubMed

    Mackeh, Rafah; Boughorbel, Sabri; Chaussabel, Damien; Kino, Tomoshige

    2017-01-01

    The collection of large-scale datasets available in public repositories is rapidly growing and providing opportunities to identify and fill gaps in different fields of biomedical research. However, users of these datasets should be able to selectively browse datasets related to their field of interest. Here we made available a collection of transcriptome datasets related to human follicular cells from normal individuals or patients with polycystic ovary syndrome, in the process of their development, during in vitro fertilization. After RNA-seq dataset exclusion and careful selection based on study description and sample information, 12 datasets, encompassing a total of 85 unique transcriptome profiles, were identified in NCBI Gene Expression Omnibus and uploaded to the Gene Expression Browser (GXB), a web application specifically designed for interactive query and visualization of integrated large-scale data. Once annotated in GXB, multiple sample grouping has been made in order to create rank lists to allow easy data interpretation and comparison. The GXB tool also allows the users to browse a single gene across multiple projects to evaluate its expression profiles in multiple biological systems/conditions in a web-based customized graphical views. The curated dataset is accessible at the following link: http://ivf.gxbsidra.org/dm3/landing.gsp.

  17. ­A curated transcriptomic dataset collection relevant to embryonic development associated with in vitro fertilization in healthy individuals and patients with polycystic ovary syndrome

    PubMed Central

    Mackeh, Rafah; Boughorbel, Sabri; Chaussabel, Damien; Kino, Tomoshige

    2017-01-01

    The collection of large-scale datasets available in public repositories is rapidly growing and providing opportunities to identify and fill gaps in different fields of biomedical research. However, users of these datasets should be able to selectively browse datasets related to their field of interest. Here we made available a collection of transcriptome datasets related to human follicular cells from normal individuals or patients with polycystic ovary syndrome, in the process of their development, during in vitro fertilization. After RNA-seq dataset exclusion and careful selection based on study description and sample information, 12 datasets, encompassing a total of 85 unique transcriptome profiles, were identified in NCBI Gene Expression Omnibus and uploaded to the Gene Expression Browser (GXB), a web application specifically designed for interactive query and visualization of integrated large-scale data. Once annotated in GXB, multiple sample grouping has been made in order to create rank lists to allow easy data interpretation and comparison. The GXB tool also allows the users to browse a single gene across multiple projects to evaluate its expression profiles in multiple biological systems/conditions in a web-based customized graphical views. The curated dataset is accessible at the following link: http://ivf.gxbsidra.org/dm3/landing.gsp. PMID:28413616

  18. Liverome: a curated database of liver cancer-related gene signatures with self-contained context information.

    PubMed

    Lee, Langho; Wang, Kai; Li, Gang; Xie, Zhi; Wang, Yuli; Xu, Jiangchun; Sun, Shaoxian; Pocalyko, David; Bhak, Jong; Kim, Chulhong; Lee, Kee-Ho; Jang, Ye Jin; Yeom, Young Il; Yoo, Hyang-Sook; Hwang, Seungwoo

    2011-11-30

    Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide. A number of molecular profiling studies have investigated the changes in gene and protein expression that are associated with various clinicopathological characteristics of HCC and generated a wealth of scattered information, usually in the form of gene signature tables. A database of the published HCC gene signatures would be useful to liver cancer researchers seeking to retrieve existing differential expression information on a candidate gene and to make comparisons between signatures for prioritization of common genes. A challenge in constructing such database is that a direct import of the signatures as appeared in articles would lead to a loss or ambiguity of their context information that is essential for a correct biological interpretation of a gene's expression change. This challenge arises because designation of compared sample groups is most often abbreviated, ad hoc, or even missing from published signature tables. Without manual curation, the context information becomes lost, leading to uninformative database contents. Although several databases of gene signatures are available, none of them contains informative form of signatures nor shows comprehensive coverage on liver cancer. Thus we constructed Liverome, a curated database of liver cancer-related gene signatures with self-contained context information. Liverome's data coverage is more than three times larger than any other signature database, consisting of 143 signatures taken from 98 HCC studies, mostly microarray and proteome, and involving 6,927 genes. The signatures were post-processed into an informative and uniform representation and annotated with an itemized summary so that all context information is unambiguously self-contained within the database. The signatures were further informatively named and meaningfully organized according to ten functional categories for guided browsing. Its web interface enables a straightforward retrieval of known differential expression information on a query gene and a comparison of signatures to prioritize common genes. The utility of Liverome-collected data is shown by case studies in which useful biological insights on HCC are produced. Liverome database provides a comprehensive collection of well-curated HCC gene signatures and straightforward interfaces for gene search and signature comparison as well. Liverome is available at http://liverome.kobic.re.kr.

  19. VISAGE: Interactive Visual Graph Querying.

    PubMed

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2016-06-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete , an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.

  20. VISAGE: Interactive Visual Graph Querying

    PubMed Central

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2017-01-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670

  1. MicroRNAs expression in ox-LDL treated HUVECs: MiR-365 modulates apoptosis and Bcl-2 expression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qin, Bing; Xiao, Bo; Liang, Desheng

    Highlights: {yields} We evaluated the role of miRNAs in ox-LDL induced apoptosis in ECs. {yields} We found 4 up-regulated and 11 down-regulated miRNAs in apoptotic ECs. {yields} Target genes of the dysregulated miRNAs regulate ECs apoptosis and atherosclerosis. {yields} MiR-365 promotes ECs apoptosis via suppressing Bcl-2 expression. {yields} MiR-365 inhibitor alleviates ECs apoptosis induced by ox-LDL. -- Abstract: Endothelial cells (ECs) apoptosis induced by oxidized low-density lipoprotein (ox-LDL) is thought to play a critical role in atherosclerosis. MicroRNAs (miRNAs) are a class of noncoding RNAs that posttranscriptionally regulate the expression of genes involved in diverse cell functions, including differentiation, growth,more » proliferation, and apoptosis. However, whether miRNAs are associated with ox-LDL induced apoptosis and their effect on ECs is still unknown. Therefore, this study evaluated potential miRNAs and their involvement in ECs apoptosis in response to ox-LDL stimulation. Microarray and qRT-PCR analysis performed on human umbilical vein endothelial cells (HUVECs) exposed to ox-LDL identified 15 differentially expressed (4 up- and 11 down-regulated) miRNAs. Web-based query tools were utilized to predict the target genes of the differentially expressed miRNAs, and the potential target genes were classified into different function categories with the gene ontology (GO) term and KEGG pathway annotation. In particular, bioinformatics analysis suggested that anti-apoptotic protein B-cell CLL/lymphoma 2 (Bcl-2) is a target gene of miR-365, an apoptomir up-regulated by ox-LDL stimulation in HUVECs. We further showed that transfection of miR-365 inhibitor partly restored Bcl-2 expression at both mRNA and protein levels, leading to a reduction of ox-LDL-mediated apoptosis in HUVECs. Taken together, our findings indicate that miRNAs participate in ox-LDL-mediated apoptosis in HUVECs. MiR-365 potentiates ox-LDL-induced ECs apoptosis by regulating the expression of Bcl-2, suggesting potential novel therapeutic targets for atherosclerosis.« less

  2. EnsMart: A Generic System for Fast and Flexible Access to Biological Data

    PubMed Central

    Kasprzyk, Arek; Keefe, Damian; Smedley, Damian; London, Darin; Spooner, William; Melsopp, Craig; Hammond, Martin; Rocca-Serra, Philippe; Cox, Tony; Birney, Ewan

    2004-01-01

    The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. A wide variety of complex queries, on various types of annotations, for numerous species are supported. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site, or through a Java application suite. Both implementations and the database are freely available for local installation, and can be extended or adapted to `non-Ensembl' data sets. PMID:14707178

  3. The Human CHRNA7 and CHRFAM7A Genes: A Review of the Genetics, Regulation, and Function

    PubMed Central

    Sinkus, Melissa L.; Graw, Sharon; Freedman, Robert; Ross, Randal G.; Lester, Henry A.; Leonard, Sherry

    2015-01-01

    The human α7 neuronal nicotinic acetylcholine receptor gene (CHRNA7) is ubiquitously expressed in both the central nervous system and in the periphery. CHRNA7 is genetically linked to multiple disorders with cognitive deficits, including schizophrenia, bipolar disorder, ADHD, epilepsy, Alzheimer’s disease, and Rett syndrome. The regulation of CHRNA7 is complex; more than a dozen mechanisms are known, one of which is a partial duplication of the parent gene. Exons 5-10 of CHRNA7 on chromosome 15 were duplicated and inserted 1.6 Mb upstream of CHRNA7, interrupting an earlier partial duplication of two other genes. The chimeric CHRFAM7A gene product, dupα7, assembles with α7 subunits, resulting in a dominant negative regulation of function. The duplication is human specific, occurring neither in primates nor in rodents. The duplicated α7 sequence in exons 5-10 of CHRFAM7A is almost identical to CHRNA7, and thus is not completely queried in high throughput genetic studies (GWAS). Further, pre-clinical animal models of the α7nAChR utilized in drug development research do not have CHRFAM7A (dupα7) and cannot fully model human drug responses. The wide expression of CHRNA7, its multiple functions and modes of regulation present challenges for study of this gene in disease. PMID:25701707

  4. An Integrative data mining approach to identifying Adverse ...

    EPA Pesticide Factsheets

    The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from Comparative Toxicogenomics Database (CTD) by using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP

  5. Walter User’s Manual (Version 1.0).

    DTIC Science & Technology

    1987-09-01

    queries and/or commands. 1.2 - How Walter Uses the Screen As shown in Figure 1-1, Walter divides the screen of your terminal into five separate areas...our attention to queries and how to submit them to the database. 1.3.1 - Submitting Queries A query is an expression consisting of words, parentheses...dates, but also with ranges of dates, such as "oct 15 : nov 15". Waiter recognizes three kinds of dates: * Specific dates of the form [date <month> <day

  6. What Is Spatio-Temporal Data Warehousing?

    NASA Astrophysics Data System (ADS)

    Vaisman, Alejandro; Zimányi, Esteban

    In the last years, extending OLAP (On-Line Analytical Processing) systems with spatial and temporal features has attracted the attention of the GIS (Geographic Information Systems) and database communities. However, there is no a commonly agreed definition of what is a spatio-temporal data warehouse and what functionality such a data warehouse should support. Further, the solutions proposed in the literature vary considerably in the kind of data that can be represented as well as the kind of queries that can be expressed. In this paper we present a conceptual framework for defining spatio-temporal data warehouses using an extensible data type system. We also define a taxonomy of different classes of queries of increasing expressive power, and show how to express such queries using an extension of the tuple relational calculus with aggregated functions.

  7. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715

  8. EviNet: a web platform for network enrichment analysis with flexible definition of gene sets.

    PubMed

    Jeggari, Ashwini; Alekseenko, Zhanna; Petrov, Iurii; Dias, José M; Ericson, Johan; Alexeyenko, Andrey

    2018-06-09

    The new web resource EviNet provides an easily run interface to network enrichment analysis for exploration of novel, experimentally defined gene sets. The major advantages of this analysis are (i) applicability to any genes found in the global network rather than only to those with pathway/ontology term annotations, (ii) ability to connect genes via different molecular mechanisms rather than within one high-throughput platform, and (iii) statistical power sufficient to detect enrichment of very small sets, down to individual genes. The users' gene sets are either defined prior to upload or derived interactively from an uploaded file by differential expression criteria. The pathways and networks used in the analysis can be chosen from the collection menu. The calculation is typically done within seconds or minutes and the stable URL is provided immediately. The results are presented in both visual (network graphs) and tabular formats using jQuery libraries. Uploaded data and analysis results are kept in separated project directories not accessible by other users. EviNet is available at https://www.evinet.org/.

  9. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize

    PubMed Central

    2010-01-01

    Background Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. Results In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. Conclusions CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publically available at http://agbase.msstate.edu. PMID:20946609

  10. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize.

    PubMed

    Kelley, Rowena Y; Gresham, Cathy; Harper, Jonathan; Bridges, Susan M; Warburton, Marilyn L; Hawkins, Leigh K; Pechanova, Olga; Peethambaran, Bela; Pechan, Tibor; Luthe, Dawn S; Mylroie, J E; Ankala, Arunkanth; Ozkan, Seval; Henry, W B; Williams, W P

    2010-10-07

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publically available at http://agbase.msstate.edu.

  11. Characterizing the Grape Transcriptome. Analysis of Expressed Sequence Tags from Multiple Vitis Species and Development of a Compendium of Gene Expression during Berry Development1[w

    PubMed Central

    Silva, Francisco Goes da; Iandolino, Alberto; Al-Kayal, Fadi; Bohlmann, Marlene C.; Cushman, Mary Ann; Lim, Hyunju; Ergul, Ali; Figueroa, Rubi; Kabuloglu, Elif K.; Osborne, Craig; Rowe, Joan; Tattersall, Elizabeth; Leslie, Anna; Xu, Jane; Baek, JongMin; Cramer, Grant R.; Cushman, John C.; Cook, Douglas R.

    2005-01-01

    We report the analysis and annotation of 146,075 expressed sequence tags from Vitis species. The majority of these sequences were derived from different cultivars of Vitis vinifera, comprising an estimated 25,746 unique contig and singleton sequences that survey transcription in various tissues and developmental stages and during biotic and abiotic stress. Putatively homologous proteins were identified for over 17,752 of the transcripts, with 1,962 transcripts further subdivided into one or more Gene Ontology categories. A simple structured vocabulary, with modules for plant genotype, plant development, and stress, was developed to describe the relationship between individual expressed sequence tags and cDNA libraries; the resulting vocabulary provides query terms to facilitate data mining within the context of a relational database. As a measure of the extent to which characterized metabolic pathways were encompassed by the data set, we searched for homologs of the enzymes leading from glycolysis, through the oxidative/nonoxidative pentose phosphate pathway, and into the general phenylpropanoid pathway. Homologs were identified for 65 of these 77 enzymes, with 86% of enzymatic steps represented by paralogous genes. Differentially expressed transcripts were identified by means of a stringent believability index cutoff of ≥98.4%. Correlation analysis and two-dimensional hierarchical clustering grouped these transcripts according to similarity of expression. In the broadest analysis, 665 differentially expressed transcripts were identified across 29 cDNA libraries, representing a range of developmental and stress conditions. The groupings revealed expected associations between plant developmental stages and tissue types, with the notable exception of abiotic stress treatments. A more focused analysis of flower and berry development identified 87 differentially expressed transcripts and provides the basis for a compendium that relates gene expression and annotation to previously characterized aspects of berry development and physiology. Comparison with published results for select genes, as well as correlation analysis between independent data sets, suggests that the inferred in silico patterns of expression are likely to be an accurate representation of transcript abundance for the conditions surveyed. Thus, the combined data set reveals the in silico expression patterns for hundreds of genes in V. vinifera, the majority of which have not been previously studied within this species. PMID:16219919

  12. Alterations of immune response of non-small cell lung cancer with Azacytidine

    PubMed Central

    Easwaran, Hariharan; Mohammad, Helai P.; Vendetti, Frank; VanCriekinge, Wim; DeMeyer, Tim; Du, Zhengzong; Parsana, Princy; Rodgers, Kristen; Yen, Ray-Whay; Zahnow, Cynthia A.; Taube, Janis M.; Brahmer, Julie R.; Tykodi, Scott S.; Easton, Keith; Carvajal, Richard D.; Jones, Peter A.; Laird, Peter W.; Weisenberger, Daniel J.; Tsai, Salina; Juergens, Rosalyn A.; Topalian, Suzanne L.; Rudin, Charles M.; Brock, Malcolm V.; Pardoll, Drew; Baylin, Stephen B.

    2013-01-01

    Innovative therapies are needed for advanced Non-Small Cell Lung Cancer (NSCLC). We have undertaken a genomics based, hypothesis driving, approach to query an emerging potential that epigenetic therapy may sensitize to immune checkpoint therapy targeting PD-L1/PD-1 interaction. NSCLC cell lines were treated with the DNA hypomethylating agent azacytidine (AZA – Vidaza) and genes and pathways altered were mapped by genome-wide expression and DNA methylation analyses. AZA-induced pathways were analyzed in The Cancer Genome Atlas (TCGA) project by mapping the derived gene signatures in hundreds of lung adeno (LUAD) and squamous cell carcinoma (LUSC) samples. AZA up-regulates genes and pathways related to both innate and adaptive immunity and genes related to immune evasion in a several NSCLC lines. DNA hypermethylation and low expression of IRF7, an interferon transcription factor, tracks with this signature particularly in LUSC. In concert with these events, AZA up-regulates PD-L1 transcripts and protein, a key ligand-mediator of immune tolerance. Analysis of TCGA samples demonstrates that a significant proportion of primary NSCLC have low expression of AZA-induced immune genes, including PD-L1. We hypothesize that epigenetic therapy combined with blockade of immune checkpoints – in particular the PD-1/PD-L1 pathway – may augment response of NSCLC by shifting the balance between immune activation and immune inhibition, particularly in a subset of NSCLC with low expression of these pathways. Our studies define a biomarker strategy for response in a recently initiated trial to examine the potential of epigenetic therapy to sensitize patients with NSCLC to PD-1 immune checkpoint blockade. PMID:24162015

  13. Rice SNP-seek database update: new SNPs, indels, and queries.

    PubMed

    Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L; Alexandrov, Nickolai

    2017-01-04

    We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. An adaptable architecture for patient cohort identification from diverse data sources.

    PubMed

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-12-01

    We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a data-based management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.

  15. A Genome-Wide Association Study for Culm Cellulose Content in Barley Reveals Candidate Genes Co-Expressed with Members of the CELLULOSE SYNTHASE A Gene Family

    PubMed Central

    Houston, Kelly; Burton, Rachel A.; Sznajder, Beata; Rafalski, Antoni J.; Dhugga, Kanwarpal S.; Mather, Diane E.; Taylor, Jillian; Steffenson, Brian J.; Waugh, Robbie; Fincher, Geoffrey B.

    2015-01-01

    Cellulose is a fundamentally important component of cell walls of higher plants. It provides a scaffold that allows the development and growth of the plant to occur in an ordered fashion. Cellulose also provides mechanical strength, which is crucial for both normal development and to enable the plant to withstand both abiotic and biotic stresses. We quantified the cellulose concentration in the culm of 288 two – rowed and 288 six – rowed spring type barley accessions that were part of the USDA funded barley Coordinated Agricultural Project (CAP) program in the USA. When the population structure of these accessions was analysed we identified six distinct populations, four of which we considered to be comprised of a sufficient number of accessions to be suitable for genome-wide association studies (GWAS). These lines had been genotyped with 3072 SNPs so we combined the trait and genetic data to carry out GWAS. The analysis allowed us to identify regions of the genome containing significant associations between molecular markers and cellulose concentration data, including one region cross-validated in multiple populations. To identify candidate genes we assembled the gene content of these regions and used these to query a comprehensive RNA-seq based gene expression atlas. This provided us with gene annotations and associated expression data across multiple tissues, which allowed us to formulate a supported list of candidate genes that regulate cellulose biosynthesis. Several regions identified by our analysis contain genes that are co-expressed with CELLULOSE SYNTHASE A (HvCesA) across a range of tissues and developmental stages. These genes are involved in both primary and secondary cell wall development. In addition, genes that have been previously linked with cellulose synthesis by biochemical methods, such as HvCOBRA, a gene of unknown function, were also associated with cellulose levels in the association panel. Our analyses provide new insights into the genes that contribute to cellulose content in cereal culms and to a greater understanding of the interactions between them. PMID:26154104

  16. TransAtlasDB: an integrated database connecting expression data, metadata and variants

    PubMed Central

    Adetunji, Modupeore O; Lamont, Susan J; Schmidt, Carl J

    2018-01-01

    Abstract High-throughput transcriptome sequencing (RNAseq) is the universally applied method for target-free transcript identification and gene expression quantification, generating huge amounts of data. The constraint of accessing such data and interpreting results can be a major impediment in postulating suitable hypothesis, thus an innovative storage solution that addresses these limitations, such as hard disk storage requirements, efficiency and reproducibility are paramount. By offering a uniform data storage and retrieval mechanism, various data can be compared and easily investigated. We present a sophisticated system, TransAtlasDB, which incorporates a hybrid architecture of both relational and NoSQL databases for fast and efficient data storage, processing and querying of large datasets from transcript expression analysis with corresponding metadata, as well as gene-associated variants (such as SNPs) and their predicted gene effects. TransAtlasDB provides the data model of accurate storage of the large amount of data derived from RNAseq analysis and also methods of interacting with the database, either via the command-line data management workflows, written in Perl, with useful functionalities that simplifies the complexity of data storage and possibly manipulation of the massive amounts of data generated from RNAseq analysis or through the web interface. The database application is currently modeled to handle analyses data from agricultural species, and will be expanded to include more species groups. Overall TransAtlasDB aims to serve as an accessible repository for the large complex results data files derived from RNAseq gene expression profiling and variant analysis. Database URL: https://modupeore.github.io/TransAtlasDB/ PMID:29688361

  17. BIOSPIDA: A Relational Database Translator for NCBI.

    PubMed

    Hagen, Matthew S; Lee, Eva K

    2010-11-13

    As the volume and availability of biological databases continue widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. To retrieve all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases and all are easily linkable helping to answer complex biological questions. These tools facilitate research scientists to locally integrate databases from NCBI without significant workload or development time.

  18. Query Language for Location-Based Services: A Model Checking Approach

    NASA Astrophysics Data System (ADS)

    Hoareau, Christian; Satoh, Ichiro

    We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logicbased formulas. Our approach is unique to existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.

  19. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, themore » ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.« less

  20. An emerging cyberinfrastructure for biodefense pathogen and pathogen-host data.

    PubMed

    Zhang, C; Crasta, O; Cammer, S; Will, R; Kenyon, R; Sullivan, D; Yu, Q; Sun, W; Jha, R; Liu, D; Xue, T; Zhang, Y; Moore, M; McGarvey, P; Huang, H; Chen, Y; Zhang, J; Mazumder, R; Wu, C; Sobral, B

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and its publication is to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data includes transcriptional profiles, protein profiles, protein structural data and host-pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/.

  1. Regular paths in SparQL: querying the NCI Thesaurus.

    PubMed

    Detwiler, Landon T; Suciu, Dan; Brinkley, James F

    2008-11-06

    OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have lead to a highly computable language, it has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.

  2. [Establishment of a comprehensive database for laryngeal cancer related genes and the miRNAs].

    PubMed

    Li, Mengjiao; E, Qimin; Liu, Jialin; Huang, Tingting; Liang, Chuanyu

    2015-09-01

    By collecting and analyzing the laryngeal cancer related genes and the miRNAs, to build a comprehensive laryngeal cancer-related gene database, which differs from the current biological information database with complex and clumsy structure and focuses on the theme of gene and miRNA, and it could make the research and teaching more convenient and efficient. Based on the B/S architecture, using Apache as a Web server, MySQL as coding language of database design and PHP as coding language of web design, a comprehensive database for laryngeal cancer-related genes was established, providing with the gene tables, protein tables, miRNA tables and clinical information tables of the patients with laryngeal cancer. The established database containsed 207 laryngeal cancer related genes, 243 proteins, 26 miRNAs, and their particular information such as mutations, methylations, diversified expressions, and the empirical references of laryngeal cancer relevant molecules. The database could be accessed and operated via the Internet, by which browsing and retrieval of the information were performed. The database were maintained and updated regularly. The database for laryngeal cancer related genes is resource-integrated and user-friendly, providing a genetic information query tool for the study of laryngeal cancer.

  3. Warehousing re-annotated cancer genes for biomarker meta-analysis.

    PubMed

    Orsini, M; Travaglione, A; Capobianco, E

    2013-07-01

    Translational research in cancer genomics assigns a fundamental role to bioinformatics in support of candidate gene prioritization with regard to both biomarker discovery and target identification for drug development. Efforts in both such directions rely on the existence and constant update of large repositories of gene expression data and omics records obtained from a variety of experiments. Users who interactively interrogate such repositories may have problems in retrieving sample fields that present limited associated information, due for instance to incomplete entries or sometimes unusable files. Cancer-specific data sources present similar problems. Given that source integration usually improves data quality, one of the objectives is keeping the computational complexity sufficiently low to allow an optimal assimilation and mining of all the information. In particular, the scope of integrating intraomics data can be to improve the exploration of gene co-expression landscapes, while the scope of integrating interomics sources can be that of establishing genotype-phenotype associations. Both integrations are relevant to cancer biomarker meta-analysis, as the proposed study demonstrates. Our approach is based on re-annotating cancer-specific data available at the EBI's ArrayExpress repository and building a data warehouse aimed to biomarker discovery and validation studies. Cancer genes are organized by tissue with biomedical and clinical evidences combined to increase reproducibility and consistency of results. For better comparative evaluation, multiple queries have been designed to efficiently address all types of experiments and platforms, and allow for retrieval of sample-related information, such as cell line, disease state and clinical aspects. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  4. Conceptual mapping of user's queries to medical subject headings.

    PubMed Central

    Zieman, Y. L.; Bleich, H. L.

    1997-01-01

    This paper describes a way to map users' queries to relevant Medical Subject Headings (MeSH terms) used by the National Library of Medicine to index the biomedical literature. The method, called SENSE (SEarch with New SEmantics), transforms words and phrases in the users' queries into primary conceptual components and compares these components with those of the MeSH vocabulary. Similar to the way in which most numbers can be split into numerical factors and expressed as their product--for example, 42 can be expressed as 2*21, 6*7, 3*14, 2*3*7,--so most medical concepts can be split into "semantic factors" and expressed as their juxtaposition. Note that if we split 42 into its primary factors, the breakdown is unique: 2*3*7. Similarly, when we split medical concepts into their "primary semantic factors" the breakdown is also unique. For example, the MeSH term 'renovascular hypertension' can be split morphologically into reno, vascular, hyper, and tension--morphemes that can then be translated into their primary semantic factors--kidney, blood vessel, high, and pressure. By "factoring" each MeSH term in this way, and by similarly factoring the user's query, we can match query to MeSH term by searching for combinations of common factors. Unlike UMLS and other methods that match at the level of words or phrases, SENSE matches at the level of concepts; in this way, a wide variety of words and phrases that have the same meaning produce the same match. Now used in PaperChase, the method is surprisingly powerful in matching users' queries to Medical Subject Headings. PMID:9357680

  5. The Hematopoietic Expression Viewer: expanding mobile apps as a scientific tool.

    PubMed

    James, Regis A; Rao, Mitchell M; Chen, Edward S; Goodell, Margaret A; Shaw, Chad A

    2012-07-15

    Many important data in current biological science comprise hundreds, thousands or more individual results. These massive data require computational tools to navigate results and effectively interact with the content. Mobile device apps are an increasingly important tool in the everyday lives of scientists and non-scientists alike. These software present individuals with compact and efficient tools to interact with complex data at meetings or other locations remote from their main computing environment. We believe that apps will be important tools for biologists, geneticists and physicians to review content while participating in biomedical research or practicing medicine. We have developed a prototype app for displaying gene expression data using the iOS platform. To present the software engineering requirements, we review the model-view-controller schema for Apple's iOS. We apply this schema to a simple app for querying locally developed microarray gene expression data. The challenge of this application is to balance between storing content locally within the app versus obtaining it dynamically via a network connection. The Hematopoietic Expression Viewer is available at http://www.shawlab.org/he_viewer. The source code for this project and any future information on how to obtain the app can be accessed at http://www.shawlab.org/he_viewer.

  6. Transcriptome Analysis of Differentially Expressed Genes Provides Insight into Stolon Formation in Tulipa edulis

    PubMed Central

    Miao, Yuanyuan; Zhu, Zaibiao; Guo, Qiaosheng; Zhu, Yunhao; Yang, Xiaohua; Sun, Yuan

    2016-01-01

    Tulipa edulis (Miq.) Baker is an important medicinal plant with a variety of anti-cancer properties. The stolon is one of the main asexual reproductive organs of T. edulis and possesses a unique morphology. To explore the molecular mechanism of stolon formation, we performed an RNA-seq analysis of the transcriptomes of stolons at three developmental stages. In the present study, 15.49 Gb of raw data were generated and assembled into 74,006 unigenes, and a total of 2,811 simple sequence repeats were detected in T. edulis. Among the three libraries of stolons at different developmental stages, there were 5,119 differentially expressed genes (DEGs). A functional annotation analysis based on sequence similarity queries of the GO, COG, KEGG databases showed that these DEGs were mainly involved in many physiological and biochemical processes, such as material and energy metabolism, hormone signaling, cell growth, and transcription regulation. In addition, quantitative real-time PCR analysis revealed that the expression patterns of the DEGs were consistent with the transcriptome data, which further supported a role for the DEGs in stolon formation. This study provides novel resources for future genetic and molecular studies in T. edulis. PMID:27064558

  7. Transcriptome Analysis of Differentially Expressed Genes Provides Insight into Stolon Formation in Tulipa edulis.

    PubMed

    Miao, Yuanyuan; Zhu, Zaibiao; Guo, Qiaosheng; Zhu, Yunhao; Yang, Xiaohua; Sun, Yuan

    2016-01-01

    Tulipa edulis (Miq.) Baker is an important medicinal plant with a variety of anti-cancer properties. The stolon is one of the main asexual reproductive organs of T. edulis and possesses a unique morphology. To explore the molecular mechanism of stolon formation, we performed an RNA-seq analysis of the transcriptomes of stolons at three developmental stages. In the present study, 15.49 Gb of raw data were generated and assembled into 74,006 unigenes, and a total of 2,811 simple sequence repeats were detected in T. edulis. Among the three libraries of stolons at different developmental stages, there were 5,119 differentially expressed genes (DEGs). A functional annotation analysis based on sequence similarity queries of the GO, COG, KEGG databases showed that these DEGs were mainly involved in many physiological and biochemical processes, such as material and energy metabolism, hormone signaling, cell growth, and transcription regulation. In addition, quantitative real-time PCR analysis revealed that the expression patterns of the DEGs were consistent with the transcriptome data, which further supported a role for the DEGs in stolon formation. This study provides novel resources for future genetic and molecular studies in T. edulis.

  8. SiNoPsis: Single Nucleotide Polymorphisms selection and promoter profiling.

    PubMed

    Boloc, Daniel; Rodríguez, Natalia; Gassó, Patricia; Abril, Josep F; Bernardo, Miquel; Lafuente, Amalia; Mas, Sergi

    2017-09-14

    The selection of a Single Nucleotide Polymorphism (SNP) using bibliographic methods can be a very time-consuming task. Moreover, a SNP selected in this way may not be easily visualized in its genomic context by a standard user hoping to correlate it with other valuable information. Here we propose a web form built on top of Circos that can assist SNP-centred screening, based on their location in the genome and the regulatory modules they can disrupt. Its use may allow researchers to prioritize SNPs in genotyping and disease studies. SiNoPsis is bundled as a web portal. It focuses on the different structures involved in the genomic expression of a gene, especially those found in the core promoter upstream region. These structures include transcription factor binding sites (for promoter and enhancer signals), histones, and promoter flanking regions. Additionally, the tool provides eQTL and linkage disequilibrium (LD) properties for a given SNP query, yielding further clues about other indirectly associated SNPs. Possible disruptions of the aforementioned structures affecting gene transcription are reported using multiple resource databases. SiNoPsis has a simple user-friendly interface, which allows single queries by gene symbol, genomic coordinates, Ensembl gene identifiers, RefSeq transcript identifiers and SNPs. It is the only portal providing useful SNP selection based on regulatory modules and LD with functional variants in both textual and graphic modes (by properly defining the arguments and parameters needed to run Circos). SiNoPsis is freely available at https://compgen.bio.ub.edu/SiNoPsis /. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  9. KERIS: kaleidoscope of gene responses to inflammation between species

    PubMed Central

    Li, Peng; Tompkins, Ronald G; Xiao, Wenzhong

    2017-01-01

    A cornerstone of modern biomedical research is the use of animal models to study disease mechanisms and to develop new therapeutic approaches. In order to help the research community to better explore the similarities and differences of genomic response between human inflammatory diseases and murine models, we developed KERIS: kaleidoscope of gene responses to inflammation between species (available at http://www.igenomed.org/keris/). As of June 2016, KERIS includes comparisons of the genomic response of six human inflammatory diseases (burns, trauma, infection, sepsis, endotoxin and acute respiratory distress syndrome) and matched mouse models, using 2257 curated samples from the Inflammation and the Host Response to Injury Glue Grant studies and other representative studies in Gene Expression Omnibus. A researcher can browse, query, visualize and compare the response patterns of genes, pathways and functional modules across different diseases and corresponding murine models. The database is expected to help biologists choosing models when studying the mechanisms of particular genes and pathways in a disease and prioritizing the translation of findings from disease models into clinical studies. PMID:27789704

  10. Sustained ELABELA Gene Therapy in High-salt Diet-induced Hypertensive Rats.

    PubMed

    Schreiber, Claire A; Holditch, Sara J; Generous, Alex; Ikeda, Yasuhiro

    2017-01-01

    Elabela (ELA) is a recently identified apelin receptor agonist essential for cardiac development, but its biology and therapeutic potential are unclear. In humans, ELA transcripts are detected in embryonic stem cells, induced pluripotent stem cells, kidney, heart and blood vessels. ELA through the apelin (APJ) receptor promotes angiogenesis in vitro, relaxes murine aortic blood vessels and attenuates high blood pressure in vivo. The APJ receptor when bound to its original ligand, apelin, exerts peripheral vasodilatory and positive inotropic effects, conferring cardioprotection in vivo. This study initially assessed endogenous ELA expression in normal and diseased rats and then characterized the effects of long-term ELA gene delivery by adeno-associated virus serotype 9 (AAV9) vectors on cardiorenal function in Dahl salt-sensitive rats (DS) on a high-salt diet over 3 months. Endogenous ELA was predominantly expressed in the kidneys, especially in the renal collecting duct cells and was not affected by disease. Rat ELA was overexpressed in the heart via AAV9 vector by a single intravenous injection. ELA-treated animals showed delayed onset of blood pressure elevation. Prior to high-salt diet, a reduction in the fractional sodium and chloride excretion was observed in rats given the AAV9-ELA vector. After three months on a high-salt diet, ELA preserved glomerular architecture, decreased renal fibrosis and suppressed expression of fibrosis-associated genes in the kidneys. ELA is constitutively expressed in renal collecting ducts in rats. Sustained AAV-ELA expression may offer a potential long-term therapy for hypertension and renal remodeling. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  11. An adaptable architecture for patient cohort identification from diverse data sources

    PubMed Central

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-01-01

    Objective We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method The architecture has the key feature that queries defined according to the query model are both pre and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a data-based management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442

  12. A Gene Ontology Tutorial in Python.

    PubMed

    Vesztrocy, Alex Warwick; Dessimoz, Christophe

    2017-01-01

    This chapter is a tutorial on using Gene Ontology resources in the Python programming language. This entails querying the Gene Ontology graph, retrieving Gene Ontology annotations, performing gene enrichment analyses, and computing basic semantic similarity between GO terms. An interactive version of the tutorial, including solutions, is available at http://gohandbook.org .

  13. GELLO: an object-oriented query and expression language for clinical decision support.

    PubMed

    Sordo, Margarita; Ogunyemi, Omolola; Boxwala, Aziz A; Greenes, Robert A

    2003-01-01

    GELLO is a purpose-specific, object-oriented (OO) query and expression language. GELLO is the result of a concerted effort of the Decision Systems Group (DSG) working with the HL7 Clinical Decision Support Technical Committee (CDSTC) to provide the HL7 community with a common format for data encoding and manipulation. GELLO will soon be submitted for ballot to the HL7 CDSTC for consideration as a standard.

  14. Transparent mediation-based access to multiple yeast data sources using an ontology driven interface.

    PubMed

    Briache, Abdelaali; Marrakchi, Kamar; Kerzazi, Amine; Navas-Delgado, Ismael; Rossi Hassani, Badr D; Lairini, Khalid; Aldana-Montes, José F

    2012-01-25

    Saccharomyces cerevisiae is recognized as a model system representing a simple eukaryote whose genome can be easily manipulated. Information solicited by scientists on its biological entities (Proteins, Genes, RNAs...) is scattered within several data sources like SGD, Yeastract, CYGD-MIPS, BioGrid, PhosphoGrid, etc. Because of the heterogeneity of these sources, querying them separately and then manually combining the returned results is a complex and time-consuming task for biologists most of whom are not bioinformatics expert. It also reduces and limits the use that can be made on the available data. To provide transparent and simultaneous access to yeast sources, we have developed YeastMed: an XML and mediator-based system. In this paper, we present our approach in developing this system which takes advantage of SB-KOM to perform the query transformation needed and a set of Data Services to reach the integrated data sources. The system is composed of a set of modules that depend heavily on XML and Semantic Web technologies. User queries are expressed in terms of a domain ontology through a simple form-based web interface. YeastMed is the first mediation-based system specific for integrating yeast data sources. It was conceived mainly to help biologists to find simultaneously relevant data from multiple data sources. It has a biologist-friendly interface easy to use. The system is available at http://www.khaos.uma.es/yeastmed/.

  15. A Framework for Building and Reasoning with Adaptive and Interoperable PMESII Models

    DTIC Science & Technology

    2007-11-01

    Description Logic SOA Service Oriented Architecture SPARQL Simple Protocol And RDF Query Language SQL Standard Query Language SROM Stability and...another by providing a more expressive ontological structure for one of the models, e.g., semantic networks can be mapped to first- order logical...Pellet is an open-source reasoner that works with OWL-DL. It accepts the SPARQL protocol and RDF query language ( SPARQL ) and provides a Java API to

  16. ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding.

    PubMed

    Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D

    2017-08-10

    Rapid generation of omics data in recent years have resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as generic, easy to use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.

  17. Occam's razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2005-01-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  18. Occam"s razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2004-12-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  19. BIG: a large-scale data integration tool for renal physiology.

    PubMed

    Zhao, Yue; Yang, Chin-Rang; Raghuram, Viswanathan; Parulekar, Jaya; Knepper, Mark A

    2016-10-01

    Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: "How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?" This is the type of problem that has motivated the "Big-Data" revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/.

  20. BIOSPIDA: A Relational Database Translator for NCBI

    PubMed Central

    Hagen, Matthew S.; Lee, Eva K.

    2010-01-01

    As the volume and availability of biological databases continue widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. To retrieve all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases and all are easily linkable helping to answer complex biological questions. These tools facilitate research scientists to locally integrate databases from NCBI without significant workload or development time. PMID:21347013

  1. Math expression retrieval using an inverted index over symbol pairs

    NASA Astrophysics Data System (ADS)

    Stalnaker, David; Zanibbi, Richard

    2015-01-01

    We introduce a new method for indexing and retrieving mathematical expressions, and a new protocol for evaluating math formula retrieval systems. The Tangent search engine uses an inverted index over pairs of symbols in math expressions. Each key in the index is a pair of symbols along with their relative distance and vertical displacement within an expression. Matched expressions are ranked by the harmonic mean of the percentage of symbol pairs matched in the query, and the percentage of symbol pairs matched in the candidate expression. We have found that our method is fast enough for use in real time and finds partial matches well, such as when subexpressions are re-arranged (e.g. expressions moved from the left to the right of an equals sign) or when individual symbols (e.g. variables) differ from a query expression. In an experiment using expressions from English Wikipedia, student and faculty participants (N=20) found expressions returned by Tangent significantly more similar than those from a text-based retrieval system (Lucene) adapted for mathematical expressions. Participants provided similarity ratings using a 5-point Likert scale, evaluating expressions from both algorithms one-at-a-time in a randomized order to avoid bias from the position of hits in search result lists. For the Lucene-based system, precision for the top 1 and 10 hits averaged 60% and 39% across queries respectively, while for Tangent mean precision at 1 and 10 were 99% and 60%. A demonstration and source code are publicly available.

  2. AntiHunter 2.0: increased speed and sensitivity in searching BLAST output for EST antisense transcripts.

    PubMed

    Lavorgna, Giovanni; Triunfo, Riccardo; Santoni, Federico; Orfanelli, Ugo; Noci, Sara; Bulfone, Alessandro; Zanetti, Gianluigi; Casari, Giorgio

    2005-07-01

    An increasing number of eukaryotic and prokaryotic genes are being found to have natural antisense transcripts (NATs). There is also growing evidence to suggest that antisense transcription could play a key role in many human diseases. Consequently, there have been several recent attempts to set up computational procedures aimed at identifying novel NATs. Our group has developed the AntiHunter program for the identification of expressed sequence tag (EST) antisense transcripts from BLAST output. In order to perform an analysis, the program requires a genomic sequence plus an associated list of transcript names and coordinates of the genomic region. After masking the repeated regions, the program carries out a BLASTN search of this sequence in the selected EST database, reporting via email the EST entries that reveal an antisense transcript according to the user-supplied list. Here, we present the newly developed version 2.0 of the AntiHunter tool. Several improvements have been added to this version of the program in order to increase its ability to detect a larger number of antisense ESTs. As a result, AntiHunter can now detect, on average, >45% more antisense ESTs with little or no increase in the percentage of the false positives. We also raised the maximum query size to 3 Mb (previously 1 Mb). Moreover, we found that a reasonable trade-off between the program search sensitivity and the maximum allowed size of the input-query sequence could be obtained by querying the database with the MEGABLAST program, rather than by using the BLAST one. We now offer this new opportunity to users, i.e. if choosing the MEGABLAST option, users can input a query sequence up to 30 Mb long, thus considerably improving the possibility to analyze longer query regions. The AntiHunter tool is freely available at http://bioinfo.crs4.it/AH2.0.

  3. QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks.

    PubMed

    Thibodeau, Asa; Márquez, Eladio J; Luo, Oscar; Ruan, Yijun; Menghi, Francesca; Shin, Dong-Guk; Stitzel, Michael L; Vera-Licona, Paola; Ucar, Duygu

    2016-06-01

    Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. QuIN's web server is available at http://quin.jax.org QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database and the source code is available under the GPLV3 license available on GitHub: https://github.com/UcarLab/QuIN/.

  4. ESTree db: a Tool for Peach Functional Genomics

    PubMed Central

    Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

    2005-01-01

    Background The ESTree db represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. Results The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. Conclusion The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig. PMID:16351742

  5. ESTree db: a tool for peach functional genomics.

    PubMed

    Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

    2005-12-01

    The ESTree db http://www.itb.cnr.it/estree/ represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.

  6. L1000CDS2: LINCS L1000 characteristic direction signatures search engine.

    PubMed

    Duan, Qiaonan; Reid, St Patrick; Clark, Neil R; Wang, Zichen; Fernandez, Nicolas F; Rouillard, Andrew D; Readhead, Ben; Tritsch, Sarah R; Hodos, Rachel; Hafner, Marc; Niepel, Mario; Sorger, Peter K; Dudley, Joel T; Bavari, Sina; Panchal, Rekha G; Ma'ayan, Avi

    2016-01-01

    The library of integrated network-based cellular signatures (LINCS) L1000 data set currently comprises of over a million gene expression profiles of chemically perturbed human cell lines. Through unique several intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves signal to noise compared with the MODZ method currently used to compute L1000 signatures. The CD processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS 2 . The L1000CDS 2 search engine provides prioritization of thousands of small-molecule signatures, and their pairwise combinations, predicted to either mimic or reverse an input gene expression signature using two methods. The L1000CDS 2 search engine also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the gene expression omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS 2 to prioritize small molecules that are predicted to reverse expression in 670 disease signatures also extracted from GEO, and prioritized small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study, to further demonstrate the utility of L1000CDS 2 , we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying these signatures with L1000CDS 2 we identified kenpaullone, a GSK3B/CDK2 inhibitor that we show, in subsequent experiments, has a dose-dependent efficacy in inhibiting Ebola infection in vitro without causing cellular toxicity in human cell lines. In summary, the L1000CDS 2 tool can be applied in many biological and biomedical settings, while improving the extraction of knowledge from the LINCS L1000 resource.

  7. NCBI GEO: archive for functional genomics data sets--update.

    PubMed

    Barrett, Tanya; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Holko, Michelle; Yefanov, Andrey; Lee, Hyeseung; Zhang, Naigong; Robertson, Cynthia L; Serova, Nadezhda; Davis, Sean; Soboleva, Alexandra

    2013-01-01

    The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.

  8. CCDB: a curated database of genes involved in cervix cancer.

    PubMed

    Agarwal, Subhash M; Raghav, Dhwani; Singh, Harinder; Raghava, G P S

    2011-01-01

    The Cervical Cancer gene DataBase (CCDB, http://crdd.osdd.net/raghava/ccdb) is a manually curated catalog of experimentally validated genes that are thought, or are known to be involved in the different stages of cervical carcinogenesis. In spite of the large women population that is presently affected from this malignancy still at present, no database exists that catalogs information on genes associated with cervical cancer. Therefore, we have compiled 537 genes in CCDB that are linked with cervical cancer causation processes such as methylation, gene amplification, mutation, polymorphism and change in expression level, as evident from published literature. Each record contains details related to gene like architecture (exon-intron structure), location, function, sequences (mRNA/CDS/protein), ontology, interacting partners, homology to other eukaryotic genomes, structure and links to other public databases, thus augmenting CCDB with external data. Also, manually curated literature references have been provided to support the inclusion of the gene in the database and establish its association with cervix cancer. In addition, CCDB provides information on microRNA altered in cervical cancer as well as search facility for querying, several browse options and an online tool for sequence similarity search, thereby providing researchers with easy access to the latest information on genes involved in cervix cancer.

  9. Normalized Legal Drafting and the Query Method.

    ERIC Educational Resources Information Center

    Allen, Layman E.; Engholm, C. Rudy

    1978-01-01

    Normalized legal drafting, a mode of expressing ideas in legal documents so that the syntax that relates the constituent propositions is simplified and standardized, and the query method, a question-asking activity that teaches normalized drafting and provides practice, are examined. Some examples are presented. (JMD)

  10. Incremental Query Rewriting with Resolution

    NASA Astrophysics Data System (ADS)

    Riazanov, Alexandre; Aragão, Marcelo A. T.

    We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.

  11. CRISPR-cas System as a Genome Engineering Platform: Applications in Biomedicine and Biotechnology.

    PubMed

    Hashemi, Atieh

    2018-01-01

    Genome editing mediated by Clustered Regularly Interspaced Palindromic Repeats (CRISPR) and its associated proteins (Cas) has recently been considered to be used as efficient, rapid and site-specific tool in the modification of endogenous genes in biomedically important cell types and whole organisms. It has become a predictable and precise method of choice for genome engineering by specifying a 20-nt targeting sequence within its guide RNA. Firstly, this review aims to describe the biology of CRISPR system. Next, the applications of CRISPR-Cas9 in various ways, such as efficient generation of a wide variety of biomedically important cellular models as well as those of animals, modifying epigenomes, conducting genome-wide screens, gene therapy, labelling specific genomic loci in living cells, metabolic engineering of yeast and bacteria and endogenous gene expression regulation by an altered version of this system were reviewed. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  12. Fragger: a protein fragment picker for structural queries.

    PubMed

    Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

    2017-01-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.

  13. A Fuzzy Query Mechanism for Human Resource Websites

    NASA Astrophysics Data System (ADS)

    Lai, Lien-Fu; Wu, Chao-Chin; Huang, Liang-Tsung; Kuo, Jung-Chih

    Users' preferences often contain imprecision and uncertainty that are difficult for traditional human resource websites to deal with. In this paper, we apply the fuzzy logic theory to develop a fuzzy query mechanism for human resource websites. First, a storing mechanism is proposed to store fuzzy data into conventional database management systems without modifying DBMS models. Second, a fuzzy query language is proposed for users to make fuzzy queries on fuzzy databases. User's fuzzy requirement can be expressed by a fuzzy query which consists of a set of fuzzy conditions. Third, each fuzzy condition associates with a fuzzy importance to differentiate between fuzzy conditions according to their degrees of importance. Fourth, the fuzzy weighted average is utilized to aggregate all fuzzy conditions based on their degrees of importance and degrees of matching. Through the mutual compensation of all fuzzy conditions, the ordering of query results can be obtained according to user's preference.

  14. TabSQL: a MySQL tool to facilitate mapping user data to public databases.

    PubMed

    Xia, Xiao-Qin; McClelland, Michael; Wang, Yipeng

    2010-06-23

    With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data.

  15. TabSQL: a MySQL tool to facilitate mapping user data to public databases

    PubMed Central

    2010-01-01

    Background With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251

  16. Nitrogen-responsive Regulation of GATA Protein Family Activators Gln3 and Gat1 Occurs by Two Distinct Pathways, One Inhibited by Rapamycin and the Other by Methionine Sulfoximine*

    PubMed Central

    Georis, Isabelle; Tate, Jennifer J.; Cooper, Terrance G.; Dubois, Evelyne

    2011-01-01

    Nitrogen availability regulates the transcription of genes required to degrade non-preferentially utilized nitrogen sources by governing the localization and function of transcription activators, Gln3 and Gat1. TorC1 inhibitor, rapamycin (Rap), and glutamine synthetase inhibitor, methionine sulfoximine (Msx), elicit responses grossly similar to those of limiting nitrogen, implicating both glutamine synthesis and TorC1 in the regulation of Gln3 and Gat1. To better understand this regulation, we compared Msx- versus Rap-elicited Gln3 and Gat1 localization, their DNA binding, nitrogen catabolite repression-sensitive gene expression, and the TorC1 pathway phosphatase requirements for these responses. Using this information we queried whether Rap and Msx inhibit sequential steps in a single, linear cascade connecting glutamine availability to Gln3 and Gat1 control as currently accepted or alternatively inhibit steps in two distinct parallel pathways. We find that Rap most strongly elicits nuclear Gat1 localization and expression of genes whose transcription is most Gat1-dependent. Msx, on the other hand, elicits nuclear Gln3 but not Gat1 localization and expression of genes that are most Gln3-dependent. Importantly, Rap-elicited nuclear Gln3 localization is absolutely Sit4-dependent, but that elicited by Msx is not. PP2A, although not always required for nuclear GATA factor localization, is highly required for GATA factor binding to nitrogen-responsive promoters and subsequent transcription irrespective of the gene GATA factor specificities. Collectively, our data support the existence of two different nitrogen-responsive regulatory pathways, one inhibited by Msx and the other by rapamycin. PMID:22039046

  17. Nitrogen-responsive regulation of GATA protein family activators Gln3 and Gat1 occurs by two distinct pathways, one inhibited by rapamycin and the other by methionine sulfoximine.

    PubMed

    Georis, Isabelle; Tate, Jennifer J; Cooper, Terrance G; Dubois, Evelyne

    2011-12-30

    Nitrogen availability regulates the transcription of genes required to degrade non-preferentially utilized nitrogen sources by governing the localization and function of transcription activators, Gln3 and Gat1. TorC1 inhibitor, rapamycin (Rap), and glutamine synthetase inhibitor, methionine sulfoximine (Msx), elicit responses grossly similar to those of limiting nitrogen, implicating both glutamine synthesis and TorC1 in the regulation of Gln3 and Gat1. To better understand this regulation, we compared Msx- versus Rap-elicited Gln3 and Gat1 localization, their DNA binding, nitrogen catabolite repression-sensitive gene expression, and the TorC1 pathway phosphatase requirements for these responses. Using this information we queried whether Rap and Msx inhibit sequential steps in a single, linear cascade connecting glutamine availability to Gln3 and Gat1 control as currently accepted or alternatively inhibit steps in two distinct parallel pathways. We find that Rap most strongly elicits nuclear Gat1 localization and expression of genes whose transcription is most Gat1-dependent. Msx, on the other hand, elicits nuclear Gln3 but not Gat1 localization and expression of genes that are most Gln3-dependent. Importantly, Rap-elicited nuclear Gln3 localization is absolutely Sit4-dependent, but that elicited by Msx is not. PP2A, although not always required for nuclear GATA factor localization, is highly required for GATA factor binding to nitrogen-responsive promoters and subsequent transcription irrespective of the gene GATA factor specificities. Collectively, our data support the existence of two different nitrogen-responsive regulatory pathways, one inhibited by Msx and the other by rapamycin.

  18. An emerging cyberinfrastructure for biodefense pathogen and pathogen–host data

    PubMed Central

    Zhang, C.; Crasta, O.; Cammer, S.; Will, R.; Kenyon, R.; Sullivan, D.; Yu, Q.; Sun, W.; Jha, R.; Liu, D.; Xue, T.; Zhang, Y.; Moore, M.; McGarvey, P.; Huang, H.; Chen, Y.; Zhang, J.; Mazumder, R.; Wu, C.; Sobral, B.

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and its publication is to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data includes transcriptional profiles, protein profiles, protein structural data and host–pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/. PMID:17984082

  19. Met-Activating Genetically Improved Chimeric Factor-1 Promotes Angiogenesis and Hypertrophy in Adult Myogenesis.

    PubMed

    Ronzoni, Flavio; Ceccarelli, Gabriele; Perini, Ilaria; Benedetti, Laura; Galli, Daniela; Mulas, Francesca; Balli, Martina; Magenes, Giovanni; Bellazzi, Riccardo; De Angelis, Gabriella C; Sampaolesi, Maurilio

    2017-01-01

    Myogenic progenitor cells (activated satellite cells) are able to express both HGF and its receptor cMet. After muscle injury, HGF-Met stimulation promotes activation and primary division of satellite cells. MAGIC-F1 (Met-Activating Genetically Improved Chimeric Factor-1) is an engineered protein that contains two human Met-binding domains that promotes muscle hypertrophy. MAGIC-F1 protects myogenic precursors against apoptosis and increases their fusion ability enhancing muscle differentiation. Hemizygous and homozygous Magic-F1 transgenic mice displayed constitutive muscle hypertrophy. Here we describe microarray analysis on Magic-F1 myogenic progenitor cells showing an altered gene signatures on muscular hypertrophy and angiogenesis compared to wild-type cells. In addition, we performed a functional analysis on Magic-F1+/+ transgenic mice versus controls using treadmill test. We demonstrated that Magic-F1+/+ mice display an increase in muscle mass and cross-sectional area leading to an improvement in running performance. Moreover, the presence of MAGIC-F1 affected positively the vascular network, increasing the vessel number in fast twitch fibers. Finally, the gene expression profile analysis of Magic-F1+/+ satellite cells evidenced transcriptomic changes in genes involved in the control of muscle growth, development and vascularisation. We showed that MAGIC-F1-induced muscle hypertrophy affects positively vascular network, increasing vessel number in fast twitch fibers. This was due to unique features of mammalian skeletal muscle and its remarkable ability to adapt promptly to different physiological demands by modulating the gene expression profile in myogenic progenitors. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  20. Interconnections Between RNA-Processing Pathways Revealed by a Sequencing-Based Genetic Screen for Pre-mRNA Splicing Mutants in Fission Yeast.

    PubMed

    Larson, Amy; Fair, Benjamin Jung; Pleiss, Jeffrey A

    2016-06-01

    Pre-mRNA splicing is an essential component of eukaryotic gene expression and is highly conserved from unicellular yeasts to humans. Here, we present the development and implementation of a sequencing-based reverse genetic screen designed to identify nonessential genes that impact pre-mRNA splicing in the fission yeast Schizosaccharomyces pombe, an organism that shares many of the complex features of splicing in higher eukaryotes. Using a custom-designed barcoding scheme, we simultaneously queried ∼3000 mutant strains for their impact on the splicing efficiency of two endogenous pre-mRNAs. A total of 61 nonessential genes were identified whose deletions resulted in defects in pre-mRNA splicing; enriched among these were factors encoding known or predicted components of the spliceosome. Included among the candidates identified here are genes with well-characterized roles in other RNA-processing pathways, including heterochromatic silencing and 3' end processing. Splicing-sensitive microarrays confirm broad splicing defects for many of these factors, revealing novel functional connections between these pathways. Copyright © 2016 Larson et al.

  1. Interconnections Between RNA-Processing Pathways Revealed by a Sequencing-Based Genetic Screen for Pre-mRNA Splicing Mutants in Fission Yeast

    PubMed Central

    Larson, Amy; Fair, Benjamin Jung; Pleiss, Jeffrey A.

    2016-01-01

    Pre-mRNA splicing is an essential component of eukaryotic gene expression and is highly conserved from unicellular yeasts to humans. Here, we present the development and implementation of a sequencing-based reverse genetic screen designed to identify nonessential genes that impact pre-mRNA splicing in the fission yeast Schizosaccharomyces pombe, an organism that shares many of the complex features of splicing in higher eukaryotes. Using a custom-designed barcoding scheme, we simultaneously queried ∼3000 mutant strains for their impact on the splicing efficiency of two endogenous pre-mRNAs. A total of 61 nonessential genes were identified whose deletions resulted in defects in pre-mRNA splicing; enriched among these were factors encoding known or predicted components of the spliceosome. Included among the candidates identified here are genes with well-characterized roles in other RNA-processing pathways, including heterochromatic silencing and 3ʹ end processing. Splicing-sensitive microarrays confirm broad splicing defects for many of these factors, revealing novel functional connections between these pathways. PMID:27172183

  2. Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis1[C][W][OA

    PubMed Central

    Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

    2012-01-01

    A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985

  3. A Semantic Basis for Proof Queries and Transformations

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen W.; Luth, Christoph

    2013-01-01

    We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations.We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that

  4. Distinct tissue-specific transcriptional regulation revealed by gene regulatory networks in maize.

    PubMed

    Huang, Ji; Zheng, Juefei; Yuan, Hui; McGinnis, Karen

    2018-06-07

    Transcription factors (TFs) are proteins that can bind to DNA sequences and regulate gene expression. Many TFs are master regulators in cells that contribute to tissue-specific and cell-type-specific gene expression patterns in eukaryotes. Maize has been a model organism for over one hundred years, but little is known about its tissue-specific gene regulation through TFs. In this study, we used a network approach to elucidate gene regulatory networks (GRNs) in four tissues (leaf, root, SAM and seed) in maize. We utilized GENIE3, a machine-learning algorithm combined with large quantity of RNA-Seq expression data to construct four tissue-specific GRNs. Unlike some other techniques, this approach is not limited by high-quality Position Weighed Matrix (PWM), and can therefore predict GRNs for over 2000 TFs in maize. Although many TFs were expressed across multiple tissues, a multi-tiered analysis predicted tissue-specific regulatory functions for many transcription factors. Some well-studied TFs emerged within the four tissue-specific GRNs, and the GRN predictions matched expectations based upon published results for many of these examples. Our GRNs were also validated by ChIP-Seq datasets (KN1, FEA4 and O2). Key TFs were identified for each tissue and matched expectations for key regulators in each tissue, including GO enrichment and identity with known regulatory factors for that tissue. We also found functional modules in each network by clustering analysis with the MCL algorithm. By combining publicly available genome-wide expression data and network analysis, we can uncover GRNs at tissue-level resolution in maize. Since ChIP-Seq and PWMs are still limited in several model organisms, our study provides a uniform platform that can be adapted to any species with genome-wide expression data to construct GRNs. We also present a publicly available database, maize tissue-specific GRN (mGRN, https://www.bio.fsu.edu/mcginnislab/mgrn/ ), for easy querying. All source code and data are available at Github ( https://github.com/timedreamer/maize_tissue-specific_GRN ).

  5. Upregulation of suppressor of cytokine signaling 3 in microglia by cinnamic acid.

    PubMed

    Chakrabarti, Sudipta; Jana, Malabendu; Roy, Avik; Pahan, Kalipada

    2018-05-06

    Neuroinflammation plays an important role in the pathogenesis of various neurodegenerative diseases including Alzheimer's disease (AD). Suppressor of cytokine signaling 3 (SOCS3) is an anti-inflammatory molecule that suppresses cytokine signaling and inflammatory gene expression in different cells including microglia. However, pathways through which SOCS3 could be upregulated are poorly described. Cinnamic acid is a metabolite of cinnamon, a natural compound that is being widely used all over the world as a spice or flavoring agent. This study delineates the importance of cinnamic acid for the upregulation of SOCS3 in microglia. Cinnamic acid upregulated the expression of SOCS3 mRNA and protein in mouse BV-2 microglial cells in dose- and time-dependent manner. Accordingly, cinnamic acid also increased the level of SOCS3 and suppressed the expression of inducible nitric oxide synthase and proinflammatory cytokines (TNFα, IL-1β and IL-6) in LPS-stimulated BV-2 microglial cells. Similar to BV-2 microglial cells, cinnamic acid also increased the expression of SOCS3 in primary mouse microglia and astrocytes. Presence of cAMP response element in the promoter of socs3 gene, activation of cAMP response element binding (CREB) by cinnamic acid, abrogation of cinnamic acid-mediated upregulation of SOCS3 by siRNA knockdown of CREB, and the recruitment of CREB to the socs3 gene promoter by cinnamic acid suggest that cinnamic acid increases the expression of SOCS3 by CREB. These studies suggest that cinnamic acid upregulates SOCS3 via CREB pathway, which may be of importance in neuroinflammatory and neurodegenerative disorders. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  6. BJUT at TREC 2015 Microblog Track: Real-Time Filtering Using Non-negative Matrix Factorization

    DTIC Science & Technology

    2015-11-20

    information to extend the query, al- leviates the problem of concept drift in query expansion. In User profiles Twitter Google Bing accurate ambiguity...index as the query expansion document set; second- ly,put the interest file in twitter search energy to get back the relevant twetts, the interest in...for clustering is demonstrated in Figure 2. We will be the result of the search energy Twitter as the original expression of interest, the initial

  7. Extra Large G-Protein Interactome Reveals Multiple Stress Response Function and Partner-Dependent XLG Subcellular Localization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liang, Ying; Gao, Yajun; Jones, Alan M.

    The three-member family of Arabidopsis extra-large G proteins (XLG1-3) defines the prototype of an atypical Ga subunit in the heterotrimeric G protein complex. Some recent evidence indicate that XLG subunits operate along with its Gbg dimer in root morphology, stress responsiveness, and cytokinin induced development, however downstream targets of activated XLG proteins in the stress pathways are rarely known. In order to assemble a set of candidate XLG-targeted proteins, a yeast two-hybrid complementation-based screen was performed using XLG protein baits to query interactions between XLG and partner protein found in glucose-treated seedlings, roots, and Arabidopsis cells in culture. Seventy twomore » interactors were identified and >60% of a test set displayed in vivo interaction with XLG proteins. Gene co-expression analysis shows that >70% of the interactors are positively correlated with the corresponding XLG partners. Gene Ontology enrichment for all the candidates indicates stress responses and posits a molecular mechanism involving a specific set of transcription factor partners to XLG. Genes encoding two of these transcription factors, SZF1 and 2, require XLG proteins for full NaCl-induced expression. Furthermore, the subcellular localization of the XLG proteins in the nucleus, endosome, and plasma membrane is dependent on the specific interacting partner.« less

  8. Extra Large G-Protein Interactome Reveals Multiple Stress Response Function and Partner-Dependent XLG Subcellular Localization

    DOE PAGES

    Liang, Ying; Gao, Yajun; Jones, Alan M.

    2017-06-13

    The three-member family of Arabidopsis extra-large G proteins (XLG1-3) defines the prototype of an atypical Ga subunit in the heterotrimeric G protein complex. Some recent evidence indicate that XLG subunits operate along with its Gbg dimer in root morphology, stress responsiveness, and cytokinin induced development, however downstream targets of activated XLG proteins in the stress pathways are rarely known. In order to assemble a set of candidate XLG-targeted proteins, a yeast two-hybrid complementation-based screen was performed using XLG protein baits to query interactions between XLG and partner protein found in glucose-treated seedlings, roots, and Arabidopsis cells in culture. Seventy twomore » interactors were identified and >60% of a test set displayed in vivo interaction with XLG proteins. Gene co-expression analysis shows that >70% of the interactors are positively correlated with the corresponding XLG partners. Gene Ontology enrichment for all the candidates indicates stress responses and posits a molecular mechanism involving a specific set of transcription factor partners to XLG. Genes encoding two of these transcription factors, SZF1 and 2, require XLG proteins for full NaCl-induced expression. Furthermore, the subcellular localization of the XLG proteins in the nucleus, endosome, and plasma membrane is dependent on the specific interacting partner.« less

  9. BloodChIP: a database of comparative genome-wide transcription factor binding profiles in human blood cells.

    PubMed

    Chacon, Diego; Beck, Dominik; Perera, Dilmi; Wong, Jason W H; Pimanda, John E

    2014-01-01

    The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, there is a need for these data to be placed in context and easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF-gene and protein-protein networks and to evaluate the association of genes with cellular processes and tissue expression.

  10. Possible Mitochondria-Associated Enzymatic Role in Non-Hodgkin Lymphoma Residual Disease

    PubMed Central

    Kusao, Ian; Troelstrup, David; Shiramizu, Bruce

    2009-01-01

    Background The mechanisms responsible for resistant or recurrent disease in childhood non-Hodgkin lymphoma (NHL) are not yet fully understood. A unique mechanism suggesting the role of the mitochondria as the key energy source responsible for residual cells has been assessed in the clinical setting on specimens from patients on therapy were found to have increased copies of mitochondrial DNA (mtDNA) associated with positive minimal residual disease and/or persistent disease (MRD/PD) status. The potential role of mtDNA in MRD/PD emphasizes queries into the contributions of relevant enzymatic pathways responsible for MRD/PD. This study hypothesized that in an in-vitro model, recovering or residual cells from chemotoxicity will exhibit an increase in both citrate synthase and isocitrate dehydrogenase expression and decrease in succinate dehydrogenase expression. Procedure Ramos cells (Burkitt lymphoma cell line) were exposed to varying concentrations of doxorubicin and vincristine for 1 hr; and allowing for recovery in culture over a 7-day period. cDNA was extracted on days 1 and 7 of the cell culture period to assess the relative expression of the aforementioned genes. Results Increase citrate synthase, increase isocitrate dehydrogenase and decrease succinate dehydrogenase expressions were found in recovering Ramos cells. Conclusion Recovering lymphoma cells appear to compensate by regulating enzymatic levels of appropriate genes in the Krebs Cycle suggesting an important role of the mitochondria in the presence of residual cells. PMID:19936279

  11. An integrative data mining approach to identifying adverse outcome pathway signatures.

    PubMed

    Oki, Noffisat O; Edwards, Stephen W

    2016-03-28

    The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from Comparative Toxicogenomics Database (CTD) by using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP network with the AHR gene, an interesting subnetwork including glaucoma was identified. While substantial literature exists to support the potential for AHR ligands to elicit glaucoma, it was not explicitly captured in the public annotation information in CTD. The subnetwork from this analysis suggests a cpAOP that includes changes in CYP1B1 expression, which has been previously established in the literature as a primary cause of glaucoma. These case studies highlight the value in integrating multiple data sources when defining cpAOPs for HTS data. Copyright © 2016. Published by Elsevier Ireland Ltd.

  12. L-Sulforaphane confers protection against oxidative stress in an in vitro model of age-related macular degeneration.

    PubMed

    Dulull, Nabeela Khadija; Dias, Daniel Anthony; Thrimawithana, Thilini Rasika; Kwa, Faith Ai Ai

    2018-01-25

    In age-related macular degeneration, oxidative damage and abnormal neovascularization in the retina are caused by the upregulation of vascular endothelium growth factor and reduced expression of Glutathione-S-transferase genes. Current treatments are only palliative. Compounds from cruciferous vegetables (e.g. L-Sulforaphane) have been found to restore normal gene expression levels in diseases including cancer via the activity of histone deacetylases and DNA methyltransferases, thus retarding disease progression. To examine L-Sulforaphane as a potential treatment to ameliorate aberrant levels of gene expression and metabolites observed in age-related macular degeneration. The in vitro oxidative stress model of AMD was based on the exposure of Adult Retinal Pigment Epithelium-19 cell line to 200µM hydrogen peroxide. The effects of L-Sulforaphane on cell proliferation were determined by MTS assay. The role of GSTM1, VEGFA, DNMT1 and HDAC6 genes in modulating these effects were investigated using quantitative real-time polymerase chain reaction. The metabolic profiling of L-Sulforaphane-treated cells via gas-chromatography mass-spectrometry was established. Significant differences between control and treatment groups were validated using one-way ANOVA, student t test and post-hoc Bonferroni statistical tests (p<0.05). L-Sulforaphane induced a dose-dependent increase in cell cell proliferation in the presence of hydrogen peroxide by upregulating Glutathione-S-Transferase µ1 gene expression. Metabolic profiling revealed that L-Sulforaphane increased levels of 2-monopalmitoglycerol, 9, 12, 15,-(Z-Z-Z)-Octodecatrienoic acid, 2-[Bis(trimethylsilyl)amino]ethyl bis(trimethylsilyl)-phosphate and nonanoic acid but decreased β-alanine levels in the absence or presence of hydrogen peroxide, respectively. This study supports the use of L-Sulforaphane to promote regeneration of retinal cells under oxidative stress conditions. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  13. Epigenetic Signals on Plant Adaptation: a Biotic Stress Perspective.

    PubMed

    Neto, Jose Ribamar Costa Ferreira; da Silva, Manasses Daniel; Pandolfi, Valesca; Crovella, Sergio; Benko-Iseppon, Ana Maria; Kido, Ederson Akio

    2017-01-01

    For sessile organisms such as plants, regulatory mechanisms of gene expression are vital, since they remain exposed to climatic and biological threats. Thus, they have to face hazards with instantaneous reorganization of their internal environment. For this purpose, besides the use of transcription factors, the participation of chromatin as an active factor in the regulation of transcription is crucial. Chemical changes in chromatin structure affect the accessibility of the transcriptional machinery and acting in signaling, engaging/inhibiting factors that participate in the transcription processes. Mechanisms in which gene expression undergoes changes without the occurrence of DNA gene mutations in the monomers that make up DNA, are understood as epigenetic phenomena. These include (1) post-translational modifications of histones, which results in stimulation or repression of gene activity and (2) cytosine methylation in the promoter region of individual genes, both preventing access of transcriptional activators as well as signaling the recruitment of repressors. There is evidence that such modifications can pass on to subsequent generations of daughter cells and even generations of individuals. However, reports indicate that they persist only in the presence of a stressor factor (or an inductor of the above-mentioned modifications). In its absence, these modifications weaken or lose heritability, being eliminated in the next few generations. In this review, it is argued how epigenetic signals influence gene regulation, the mechanisms involved and their participation in processes of resistance to biotic stresses, controlling processes of the plant immune system. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. BIG: a large-scale data integration tool for renal physiology

    PubMed Central

    Zhao, Yue; Yang, Chin-Rang; Raghuram, Viswanathan; Parulekar, Jaya

    2016-01-01

    Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: “How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?” This is the type of problem that has motivated the “Big-Data” revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/. PMID:27279488

  15. Transcriptional Analysis of Fracture Healing and the Induction of Embryonic Stem Cell–Related Genes

    PubMed Central

    Bais, Manish; McLean, Jody; Sebastiani, Paola; Young, Megan; Wigner, Nathan; Smith, Temple; Kotton, Darrell N.; Einhorn, Thomas A.; Gerstenfeld, Louis C.

    2009-01-01

    Fractures are among the most common human traumas. Fracture healing represents a unique temporarily definable post-natal process in which to study the complex interactions of multiple molecular events that regulate endochondral skeletal tissue formation. Because of the regenerative nature of fracture healing, it is hypothesized that large numbers of post-natal stem cells are recruited and contribute to formation of the multiple cell lineages that contribute to this process. Bayesian modeling was used to generate the temporal profiles of the transcriptome during fracture healing. The temporal relationships between ontologies that are associated with various biologic, metabolic, and regulatory pathways were identified and related to developmental processes associated with skeletogenesis, vasculogenesis, and neurogenesis. The complement of all the expressed BMPs, Wnts, FGFs, and their receptors were related to the subsets of transcription factors that were concurrently expressed during fracture healing. We further defined during fracture healing the temporal patterns of expression for 174 of the 193 genes known to be associated with human genetic skeletal disorders. In order to identify the common regulatory features that might be present in stem cells that are recruited during fracture healing to other types of stem cells, we queried the transcriptome of fracture healing against that seen in embryonic stem cells (ESCs) and mesenchymal stem cells (MSCs). Approximately 300 known genes that are preferentially expressed in ESCs and ∼350 of the known genes that are preferentially expressed in MSCs showed induction during fracture healing. Nanog, one of the central epigenetic regulators associated with ESC stem cell maintenance, was shown to be associated in multiple forms or bone repair as well as MSC differentiation. In summary, these data present the first temporal analysis of the transcriptome of an endochondral bone formation process that takes place during fracture healing. They show that neurogenesis as well as vasculogenesis are predominant components of skeletal tissue formation and suggest common pathways are shared between post-natal stem cells and those seen in ESCs. PMID:19415118

  16. OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster.

    PubMed

    Miles, Alistair; Zhao, Jun; Klyne, Graham; White-Cooper, Helen; Shotton, David

    2010-10-01

    Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavy weight, and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure. We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with light-weight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema; and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics. The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.

  17. miRTex: A Text Mining System for miRNA-Gene Relation Extraction

    PubMed Central

    Li, Gang; Ross, Karen E.; Arighi, Cecilia N.; Peng, Yifan; Wu, Cathy H.; Vijay-Shanker, K.

    2015-01-01

    MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes. PMID:26407127

  18. Large-scale gene function analysis with the PANTHER classification system.

    PubMed

    Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

    2013-08-01

    The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

  19. THE CONCEPT OF IDENTITY IN RACE RELATIONS--NOTES AND QUERIES.

    ERIC Educational Resources Information Center

    ERIKSON, ERIK H.

    DISCUSSED IN THIS SERIES OF "NOTES AND QUERIES" ARE VARIOUS ASPECTS OF IDENTITY, PARTICULARLY THE IDENTITY OF THE NEGRO. HELPFUL IN UNDERSTANDING NEGRO IDENTITY IS THE EXPRESSION BY NEGRO AUTHORS OF THEIR NEGATIVE IDENTITY (INVISIBILITY, NAMELESSNESS, AND FACELESSNESS), INTERPRETED HERE AS A DEMAND TO BE HEARD, SEEN, RECOGNIZED, AND FACED AS…

  20. QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks

    PubMed Central

    Thibodeau, Asa; Márquez, Eladio J.; Luo, Oscar; Ruan, Yijun; Shin, Dong-Guk; Stitzel, Michael L.; Ucar, Duygu

    2016-01-01

    Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. AVAILABILITY: QuIN’s web server is available at http://quin.jax.org QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database and the source code is available under the GPLV3 license available on GitHub: https://github.com/UcarLab/QuIN/. PMID:27336171

  1. Recently Patented Viral Nucleotide Sequences and Generation of Virus-Derived Vaccines.

    PubMed

    Venkataraman, Srividhya; Ahmad, Tauqeer; Haidar, Mounir A; Hefferon, Kathleen L

    2017-01-01

    With an increase in comprehension of the molecular biology of viruses, there has been a recent surge in the application of virus sequences and viral gene expression strategies towards the diagnosis and treatment of diseases. The scope of the patenting landscape has widened as a result and the current review discusses patents pertaining to live / attenuated viral vaccines. The vaccines addressed here have been developed by both conventional means as well as by the state-of-the-art genetic engineering techniques. This review also addresses the applications of these patents for clinical and biotechnological purposes. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  2. NCBI GEO: archive for functional genomics data sets—update

    PubMed Central

    Barrett, Tanya; Wilhite, Stephen E.; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F.; Tomashevsky, Maxim; Marshall, Kimberly A.; Phillippy, Katherine H.; Sherman, Patti M.; Holko, Michelle; Yefanov, Andrey; Lee, Hyeseung; Zhang, Naigong; Robertson, Cynthia L.; Serova, Nadezhda; Davis, Sean; Soboleva, Alexandra

    2013-01-01

    The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data. PMID:23193258

  3. HbVar: A relational database of human hemoglobin variants and thalassemia mutations at the globin gene server.

    PubMed

    Hardison, Ross C; Chui, David H K; Giardine, Belinda; Riemer, Cathy; Patrinos, George P; Anagnou, Nicholas; Miller, Webb; Wajcman, Henri

    2002-03-01

    We have constructed a relational database of hemoglobin variants and thalassemia mutations, called HbVar, which can be accessed on the web at http://globin.cse.psu.edu. Extensive information is recorded for each variant and mutation, including a description of the variant and associated pathology, hematology, electrophoretic mobility, methods of isolation, stability information, ethnic occurrence, structure studies, functional studies, and references. The initial information was derived from books by Dr. Titus Huisman and colleagues [Huisman et al., 1996, 1997, 1998]. The current database is updated regularly with the addition of new data and corrections to previous data. Queries can be formulated based on fields in the database. Tables of common categories of variants, such as all those involving the alpha1-globin gene (HBA1) or all those that result in high oxygen affinity, are maintained by automated queries on the database. Users can formulate more precise queries, such as identifying "all beta-globin variants associated with instability and found in Scottish populations." This new database should be useful for clinical diagnosis as well as in fundamental studies of hemoglobin biochemistry, globin gene regulation, and human sequence variation at these loci. Copyright 2002 Wiley-Liss, Inc.

  4. PlantTribes: a gene and gene family resource for comparative genomics in plants

    PubMed Central

    Wall, P. Kerr; Leebens-Mack, Jim; Müller, Kai F.; Field, Dawn; Altman, Naomi S.; dePamphilis, Claude W.

    2008-01-01

    The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study. PMID:18073194

  5. Genome-wide identification and analysis of biotic and abiotic stress regulation of small heat shock protein (HSP20) family genes in bread wheat.

    PubMed

    Muthusamy, Senthilkumar K; Dalal, Monika; Chinnusamy, Viswanathan; Bansal, Kailash C

    2017-04-01

    Small Heat Shock Proteins (sHSPs)/HSP20 are molecular chaperones that protect plants by preventing protein aggregation during abiotic stress conditions, especially heat stress. Due to global climate change, high temperature is emerging as a major threat to wheat productivity. Thus, the identification of HSP20 and analysis of HSP transcriptional regulation under different abiotic stresses in wheat would help in understanding the role of these proteins in abiotic stress tolerance. We used sequences of known rice and Arabidopsis HSP20 HMM profiles as queries against publicly available wheat genome and wheat full length cDNA databases (TriFLDB) to identify the respective orthologues from wheat. 163 TaHSP20 (including 109 sHSP and 54 ACD) genes were identified and classified according to the sub-cellular localization and phylogenetic relationship with sequenced grass genomes (Oryza sativa, Sorghum bicolor, Zea mays, Brachypodium distachyon and Setaria italica). Spatio-temporal, biotic and abiotic stress-specific expression patterns in normalized RNA seq and wheat array datasets revealed constitutive as well as inductive responses of HSP20 in different tissues and developmental stages of wheat. Promoter analysis of TaHSP20 genes showed the presence of tissue-specific, biotic, abiotic, light-responsive, circadian and cell cycle-responsive cis-regulatory elements. 14 TaHSP20 family genes were under the regulation of 8 TamiRNA genes. The expression levels of twelve HSP20 genes were studied under abiotic stress conditions in the drought- and heat-tolerant wheat genotype C306. Of the 13 TaHSP20 genes, TaHSP16.9H-CI showed high constitutive expression with upregulation only under salt stress. Both heat and salt stresses upregulated the expression of TaHSP17.4-CI, TaHSP17.7A-CI, TaHSP19.1-CIII, TaACD20.0B-CII and TaACD20.6C-CIV, while TaHSP23.7-MTI was specifically induced only under heat stress. Our results showed that the identified TaHSP20 genes play an important role under different abiotic stress conditions. Thus, the results illustrate the complexity of the TaHSP20 gene family and its stress regulation in wheat, and suggest that sHSPs as attractive breeding targets for improvement of the heat tolerance of wheat. Copyright © 2017 Elsevier GmbH. All rights reserved.

  6. Differential display of abundantly expressed genes of Trichoderma harzianum during colonization of tomato-germinating seeds and roots.

    PubMed

    Mehrabi-Koushki, Mehdi; Rouhani, Hamid; Mahdikhani-Moghaddam, Esmat

    2012-11-01

    The identification of Trichoderma genes whose expression is altered during early stages of interaction with developing roots of germinated seeds is an important step toward understanding the rhizosphere competency of Trichoderma spp. The potential of 13 Trichoderma strains to colonize tomato root and promote plant growth has been evaluated. All used strains successfully propagated in spermosphere and continued their growth in rhizoplane simultaneously root enlargement while the strains T6 and T7 were the most abundant in the apical segment of roots. Root colonization in most strains associated with promoting the roots and shoots growth while they significantly increased up to 43 and 40 % roots and shoots dry weights, respectively. Differential display reverse transcriptase-PCR (DDRT-PCR) has been developed to detect differentially expressed genes in the previously selected strain, Trichoderma harzianum T7, during colonization stages of tomato-germinating seeds and roots. Amplified DDRT-PCR products were analyzed on gel agarose and 62 differential bands excised, purified, cloned, and sequenced. Obtained ESTs were submit-queried to NCBI database by BLASTx search and gene ontology hierarchy. Most of transcripts (29 EST) corresponds to known and hypothetical proteins such as secretion-related small GTPase, 40S ribosomal protein S3a, 3-hydroxybutyryl-CoA dehydrogenase, DNA repair protein rad50, lipid phosphate phosphatase-related protein type 3, nuclear essential protein, phospholipase A2, fatty acid desaturase, nuclear pore complex subunit Nup133, ubiquitin-activating enzyme, and 60S ribosomal protein L40. Also, 13 of these sequences showed no homology (E > 0.05) with public databases and considered as novel genes. Some of these ESTs corresponded to genes encodes enzymes potentially involved in nutritional support of microorganisms which have obvious importance in the establishment of Trichoderma in spermosphere and rhizosphere, via potentially functioning in acquisition of nutrients from energy-rich carbon compounds leaked from the germinating seeds and roots.

  7. Impact of constitutional copy number variants on biological pathway evolution.

    PubMed

    Poptsova, Maria; Banerjee, Samprit; Gokcumen, Omer; Rubin, Mark A; Demichelis, Francesca

    2013-01-23

    Inherited Copy Number Variants (CNVs) can modulate the expression levels of individual genes. However, little is known about how CNVs alter biological pathways and how this varies across different populations. To trace potential evolutionary changes of well-described biological pathways, we jointly queried the genomes and the transcriptomes of a collection of individuals with Caucasian, Asian or Yoruban descent combining high-resolution array and sequencing data. We implemented an enrichment analysis of pathways accounting for CNVs and genes sizes and detected significant enrichment not only in signal transduction and extracellular biological processes, but also in metabolism pathways. Upon the estimation of CNV population differentiation (CNVs with different polymorphism frequencies across populations), we evaluated that 22% of the pathways contain at least one gene that is proximal to a CNV (CNV-gene pair) that shows significant population differentiation. The majority of these CNV-gene pairs belong to signal transduction pathways and 6% of the CNV-gene pairs show statistical association between the copy number states and the transcript levels. The analysis suggested possible examples of positive selection within individual populations including NF-kB, MAPK signaling pathways, and Alu/L1 retrotransposition factors. Altogether, our results suggest that constitutional CNVs may modulate subtle pathway changes through specific pathway enzymes, which may become fixed in some populations.

  8. Impact of constitutional copy number variants on biological pathway evolution

    PubMed Central

    2013-01-01

    Background Inherited Copy Number Variants (CNVs) can modulate the expression levels of individual genes. However, little is known about how CNVs alter biological pathways and how this varies across different populations. To trace potential evolutionary changes of well-described biological pathways, we jointly queried the genomes and the transcriptomes of a collection of individuals with Caucasian, Asian or Yoruban descent combining high-resolution array and sequencing data. Results We implemented an enrichment analysis of pathways accounting for CNVs and genes sizes and detected significant enrichment not only in signal transduction and extracellular biological processes, but also in metabolism pathways. Upon the estimation of CNV population differentiation (CNVs with different polymorphism frequencies across populations), we evaluated that 22% of the pathways contain at least one gene that is proximal to a CNV (CNV-gene pair) that shows significant population differentiation. The majority of these CNV-gene pairs belong to signal transduction pathways and 6% of the CNV-gene pairs show statistical association between the copy number states and the transcript levels. Conclusions The analysis suggested possible examples of positive selection within individual populations including NF-kB, MAPK signaling pathways, and Alu/L1 retrotransposition factors. Altogether, our results suggest that constitutional CNVs may modulate subtle pathway changes through specific pathway enzymes, which may become fixed in some populations. PMID:23342974

  9. Transcript Profile of Flowering Regulatory Genes in VcFT-Overexpressing Blueberry Plants

    PubMed Central

    Walworth, Aaron E.; Chai, Benli; Song, Guo-qing

    2016-01-01

    In order to identify genetic components in flowering pathways of highbush blueberry (Vaccinium corymbosum L.), a transcriptome reference composed of 254,396 transcripts and 179,853 gene contigs was developed by assembly of 72.7 million reads using Trinity. Using this transcriptome reference and a query of flowering pathway genes of herbaceous plants, we identified potential flowering pathway genes/transcripts of blueberry. Transcriptome analysis of flowering pathway genes was then conducted on leaf tissue samples of transgenic blueberry cv. Aurora (‘VcFT-Aurora’), which overexpresses a blueberry FLOWERING LOCUS T-like gene (VcFT). Sixty-one blueberry transcripts of 40 genes showed high similarities to 33 known flowering-related genes of herbaceous plants, of which 17 down-regulated and 16 up-regulated genes were identified in ‘VcFT-Aurora’. All down-regulated genes encoded transcription factors/enzymes upstream in the signaling pathway containing VcFT. A blueberry CONSTANS-LIKE 5-like (VcCOL5) gene was down-regulated and associated with five other differentially expressed (DE) genes in the photoperiod-mediated flowering pathway. Three down-regulated genes, i.e., a MADS-AFFECTING FLOWERING 2-like gene (VcMAF2), a MADS-AFFECTING FLOWERING 5-like gene (VcMAF5), and a VERNALIZATION1-like gene (VcVRN1), may function as integrators in place of FLOWERING LOCUS C (FLC) in the vernalization pathway. Because no CONSTAN1-like or FLOWERING LOCUS C-like genes were found in blueberry, VcCOL5 and VcMAF2/VcMAF5 or VRN1 might be the major integrator(s) in the photoperiod- and vernalization-mediated flowering pathway, respectively. The major down-stream genes of VcFT, i.e., SUPPRESSOR of Overexpression of Constans 1-like (VcSOC1), LEAFY-like (VcLFY), APETALA1-like (VcAP1), CAULIFLOWER 1-like (VcCAL1), and FRUITFULL-like (VcFUL) genes were present and showed high similarity to their orthologues in herbaceous plants. Moreover, overexpression of VcFT promoted expression of all of these VcFT downstream genes. These results suggest that VcFT’s down-stream genes appear conserved in blueberry. PMID:27271296

  10. Transcript Profile of Flowering Regulatory Genes in VcFT-Overexpressing Blueberry Plants.

    PubMed

    Walworth, Aaron E; Chai, Benli; Song, Guo-Qing

    2016-01-01

    In order to identify genetic components in flowering pathways of highbush blueberry (Vaccinium corymbosum L.), a transcriptome reference composed of 254,396 transcripts and 179,853 gene contigs was developed by assembly of 72.7 million reads using Trinity. Using this transcriptome reference and a query of flowering pathway genes of herbaceous plants, we identified potential flowering pathway genes/transcripts of blueberry. Transcriptome analysis of flowering pathway genes was then conducted on leaf tissue samples of transgenic blueberry cv. Aurora ('VcFT-Aurora'), which overexpresses a blueberry FLOWERING LOCUS T-like gene (VcFT). Sixty-one blueberry transcripts of 40 genes showed high similarities to 33 known flowering-related genes of herbaceous plants, of which 17 down-regulated and 16 up-regulated genes were identified in 'VcFT-Aurora'. All down-regulated genes encoded transcription factors/enzymes upstream in the signaling pathway containing VcFT. A blueberry CONSTANS-LIKE 5-like (VcCOL5) gene was down-regulated and associated with five other differentially expressed (DE) genes in the photoperiod-mediated flowering pathway. Three down-regulated genes, i.e., a MADS-AFFECTING FLOWERING 2-like gene (VcMAF2), a MADS-AFFECTING FLOWERING 5-like gene (VcMAF5), and a VERNALIZATION1-like gene (VcVRN1), may function as integrators in place of FLOWERING LOCUS C (FLC) in the vernalization pathway. Because no CONSTAN1-like or FLOWERING LOCUS C-like genes were found in blueberry, VcCOL5 and VcMAF2/VcMAF5 or VRN1 might be the major integrator(s) in the photoperiod- and vernalization-mediated flowering pathway, respectively. The major down-stream genes of VcFT, i.e., SUPPRESSOR of Overexpression of Constans 1-like (VcSOC1), LEAFY-like (VcLFY), APETALA1-like (VcAP1), CAULIFLOWER 1-like (VcCAL1), and FRUITFULL-like (VcFUL) genes were present and showed high similarity to their orthologues in herbaceous plants. Moreover, overexpression of VcFT promoted expression of all of these VcFT downstream genes. These results suggest that VcFT's down-stream genes appear conserved in blueberry.

  11. The Plant Structure Ontology, a Unified Vocabulary of Anatomy and Morphology of a Flowering Plant1[W][OA

    PubMed Central

    Ilic, Katica; Kellogg, Elizabeth A.; Jaiswal, Pankaj; Zapata, Felipe; Stevens, Peter F.; Vincent, Leszek P.; Avraham, Shulamit; Reiser, Leonore; Pujar, Anuradha; Sachs, Martin M.; Whitman, Noah T.; McCouch, Susan R.; Schaeffer, Mary L.; Ware, Doreen H.; Stein, Lincoln D.; Rhee, Seung Y.

    2007-01-01

    Formal description of plant phenotypes and standardized annotation of gene expression and protein localization data require uniform terminology that accurately describes plant anatomy and morphology. This facilitates cross species comparative studies and quantitative comparison of phenotypes and expression patterns. A major drawback is variable terminology that is used to describe plant anatomy and morphology in publications and genomic databases for different species. The same terms are sometimes applied to different plant structures in different taxonomic groups. Conversely, similar structures are named by their species-specific terms. To address this problem, we created the Plant Structure Ontology (PSO), the first generic ontological representation of anatomy and morphology of a flowering plant. The PSO is intended for a broad plant research community, including bench scientists, curators in genomic databases, and bioinformaticians. The initial releases of the PSO integrated existing ontologies for Arabidopsis (Arabidopsis thaliana), maize (Zea mays), and rice (Oryza sativa); more recent versions of the ontology encompass terms relevant to Fabaceae, Solanaceae, additional cereal crops, and poplar (Populus spp.). Databases such as The Arabidopsis Information Resource, Nottingham Arabidopsis Stock Centre, Gramene, MaizeGDB, and SOL Genomics Network are using the PSO to describe expression patterns of genes and phenotypes of mutants and natural variants and are regularly contributing new annotations to the Plant Ontology database. The PSO is also used in specialized public databases, such as BRENDA, GENEVESTIGATOR, NASCArrays, and others. Over 10,000 gene annotations and phenotype descriptions from participating databases can be queried and retrieved using the Plant Ontology browser. The PSO, as well as contributed gene associations, can be obtained at www.plantontology.org. PMID:17142475

  12. ESTminer: a Web interface for mining EST contig and cluster databases.

    PubMed

    Huang, Yecheng; Pumphrey, Janie; Gingle, Alan R

    2005-03-01

    ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp agingle@uga.edu.

  13. ImmunemiR - A Database of Prioritized Immune miRNA Disease Associations and its Interactome.

    PubMed

    Prabahar, Archana; Natarajan, Jeyakumar

    2017-01-01

    MicroRNAs are the key regulators of gene expression and their abnormal expression in the immune system may be associated with several human diseases such as inflammation, cancer and autoimmune diseases. Elucidation of miRNA disease association through the interactome will deepen the understanding of its disease mechanisms. A specialized database for immune miRNAs is highly desirable to demonstrate the immune miRNA disease associations in the interactome. miRNAs specific to immune related diseases were retrieved from curated databases such as HMDD, miR2disease and PubMed literature based on MeSH classification of immune system diseases. The additional data such as miRNA target genes, genes coding protein-protein interaction information were compiled from related resources. Further, miRNAs were prioritized to specific immune diseases using random walk ranking algorithm. In total 245 immune miRNAs associated with 92 OMIM disease categories were identified from external databases. The resultant data were compiled as ImmunemiR, a database of prioritized immune miRNA disease associations. This database provides both text based annotation information and network visualization of its interactome. To our knowledge, ImmunemiR is the first available database to provide a comprehensive repository of human immune disease associated miRNAs with network visualization options of its target genes, protein-protein interactions (PPI) and its disease associations. It is freely available at http://www.biominingbu.org/immunemir/. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. Reduced Mitochondrial Activity is Early and Steady in the Entorhinal Cortex but it is Mainly Unmodified in the Frontal Cortex in Alzheimer's Disease.

    PubMed

    Armand-Ugon, Mercedes; Ansoleaga, Belen; Berjaoui, Sara; Ferrer, Isidro

    2017-01-01

    It is well established that mitochondrial damage plays a role in the pathophysiology of Alzheimer's disease (AD). However, studies carried out in humans barely contemplate regional differences with disease progression. To study the expression of selected nuclear genes encoding subunits of the mitochondrial complexes and the activity of mitochondrial complexes in AD, in two regions: the entorhinal cortex (EC) and frontal cortex area 8 (FC). Frozen samples from 148 cases processed for gene expression by qRT-PCR and determination of individual activities of mitochondrial complexes I, II, IV and V using commercial kits and home-made assays. Decreased expression of NDUFA2, NDUFB3, UQCR11, COX7C, ATPD, ATP5L and ATP50, covering subunits of complex I, II, IV and V, occurs in total homogenates of the EC in AD stages V-VI when compared with stages I-II. However reduced activity of complexes I, II and V of isolated mitochondria occurs as early as stages I-II when compared with middle-aged individuals in the EC. In contrast, no alterations in the expression of the same genes and no alterations in the activity of mitochondrial complexes are found in the FC in the same series. Different mechanisms of impaired energy metabolism may occur in AD, one of them, represented by the EC, is the result of primary and early alteration of mitochondria; the other one is probably the result, at least in part, of decreased functional input and is represented by hypometabolism in the FC in AD patients aged 86 or younger. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  15. Probabilistic drug connectivity mapping

    PubMed Central

    2014-01-01

    Background The aim of connectivity mapping is to match drugs using drug-treatment gene expression profiles from multiple cell lines. This can be viewed as an information retrieval task, with the goal of finding the most relevant profiles for a given query drug. We infer the relevance for retrieval by data-driven probabilistic modeling of the drug responses, resulting in probabilistic connectivity mapping, and further consider the available cell lines as different data sources. We use a special type of probabilistic model to separate what is shared and specific between the sources, in contrast to earlier connectivity mapping methods that have intentionally aggregated all available data, neglecting information about the differences between the cell lines. Results We show that the probabilistic multi-source connectivity mapping method is superior to alternatives in finding functionally and chemically similar drugs from the Connectivity Map data set. We also demonstrate that an extension of the method is capable of retrieving combinations of drugs that match different relevant parts of the query drug response profile. Conclusions The probabilistic modeling-based connectivity mapping method provides a promising alternative to earlier methods. Principled integration of data from different cell lines helps to identify relevant responses for specific drug repositioning applications. PMID:24742351

  16. Struct2Net: a web service to predict protein–protein interactions using a structure-based approach

    PubMed Central

    Singh, Rohit; Park, Daniel; Xu, Jinbo; Hosur, Raghavendra; Berger, Bonnie

    2010-01-01

    Struct2Net is a web server for predicting interactions between arbitrary protein pairs using a structure-based approach. Prediction of protein–protein interactions (PPIs) is a central area of interest and successful prediction would provide leads for experiments and drug design; however, the experimental coverage of the PPI interactome remains inadequate. We believe that Struct2Net is the first community-wide resource to provide structure-based PPI predictions that go beyond homology modeling. Also, most web-resources for predicting PPIs currently rely on functional genomic data (e.g. GO annotation, gene expression, cellular localization, etc.). Our structure-based approach is independent of such methods and only requires the sequence information of the proteins being queried. The web service allows multiple querying options, aimed at maximizing flexibility. For the most commonly studied organisms (fly, human and yeast), predictions have been pre-computed and can be retrieved almost instantaneously. For proteins from other species, users have the option of getting a quick-but-approximate result (using orthology over pre-computed results) or having a full-blown computation performed. The web service is freely available at http://struct2net.csail.mit.edu. PMID:20513650

  17. miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST

    PubMed Central

    Kim, You Jung; Boyd, Andrew; Athey, Brian D.; Patel, Jignesh M.

    2005-01-01

    A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Exis-ting tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs a q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. PMID:16061938

  18. Spatial information semantic query based on SPARQL

    NASA Astrophysics Data System (ADS)

    Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

    2009-10-01

    How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantic by building an Web Ontology language(OWL) format ontology and introducing SPARQL Protocol and RDF Query Language(SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support for effective spatial reasoning for performing semantic query. Compared to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial queries system. Semantic approaches need to be developed by ontology, so we use OWL to describe spatial information extracted by the large-scale map of Wuhan. Spatial information expressed by ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by introducing a case study for using SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that making use of SPARQL to search OWL ontology instances can ensure the result's accuracy and applicability. The result also indicates constructing a geo-spatial semantic query system has positive efforts on forming spatial query and retrieval.

  19. CEBS object model for systems biology data, SysBio-OM.

    PubMed

    Xirasagar, Sandhya; Gustafson, Scott; Merrick, B Alex; Tomer, Kenneth B; Stasiewicz, Stanley; Chan, Denny D; Yost, Kenneth J; Yates, John R; Sumner, Susan; Xiao, Nianqing; Waters, Michael D

    2004-09-01

    To promote a systems biology approach to understanding the biological effects of environmental stressors, the Chemical Effects in Biological Systems (CEBS) knowledge base is being developed to house data from multiple complex data streams in a systems friendly manner that will accommodate extensive querying from users. Unified data representation via a single object model will greatly aid in integrating data storage and management, and facilitate reuse of software to analyze and display data resulting from diverse differential expression or differential profile technologies. Data streams include, but are not limited to, gene expression analysis (transcriptomics), protein expression and protein-protein interaction analysis (proteomics) and changes in low molecular weight metabolite levels (metabolomics). To enable the integration of microarray gene expression, proteomics and metabolomics data in the CEBS system, we designed an object model, Systems Biology Object Model (SysBio-OM). The model is comprehensive and leverages other open source efforts, namely the MicroArray Gene Expression Object Model (MAGE-OM) and the Proteomics Experiment Data Repository (PEDRo) object model. SysBio-OM is designed by extending MAGE-OM to represent protein expression data elements (including those from PEDRo), protein-protein interaction and metabolomics data. SysBio-OM promotes the standardization of data representation and data quality by facilitating the capture of the minimum annotation required for an experiment. Such standardization refines the accuracy of data mining and interpretation. The open source SysBio-OM model, which can be implemented on varied computing platforms is presented here. A universal modeling language depiction of the entire SysBio-OM is available at http://cebs.niehs.nih.gov/SysBioOM/. The Rational Rose object model package is distributed under an open source license that permits unrestricted academic and commercial use and is available at http://cebs.niehs.nih.gov/cebsdownloads. The database and interface are being built to implement the model and will be available for public use at http://cebs.niehs.nih.gov.

  20. A Querying Method over RDF-ized Health Level Seven v2.5 Messages Using Life Science Knowledge Resources.

    PubMed

    Kawazoe, Yoshimasa; Imai, Takeshi; Ohe, Kazuhiko

    2016-04-05

    Health level seven version 2.5 (HL7 v2.5) is a widespread messaging standard for information exchange between clinical information systems. By applying Semantic Web technologies for handling HL7 v2.5 messages, it is possible to integrate large-scale clinical data with life science knowledge resources. Showing feasibility of a querying method over large-scale resource description framework (RDF)-ized HL7 v2.5 messages using publicly available drug databases. We developed a method to convert HL7 v2.5 messages into the RDF. We also converted five kinds of drug databases into RDF and provided explicit links between the corresponding items among them. With those linked drug data, we then developed a method for query expansion to search the clinical data using semantic information on drug classes along with four types of temporal patterns. For evaluation purpose, medication orders and laboratory test results for a 3-year period at the University of Tokyo Hospital were used, and the query execution times were measured. Approximately 650 million RDF triples for medication orders and 790 million RDF triples for laboratory test results were converted. Taking three types of query in use cases for detecting adverse events of drugs as an example, we confirmed these queries were represented in SPARQL Protocol and RDF Query Language (SPARQL) using our methods and comparison with conventional query expressions were performed. The measurement results confirm that the query time is feasible and increases logarithmically or linearly with the amount of data and without diverging. The proposed methods enabled query expressions that separate knowledge resources and clinical data, thereby suggesting the feasibility for improving the usability of clinical data by enhancing the knowledge resources. We also demonstrate that when HL7 v2.5 messages are automatically converted into RDF, searches are still possible through SPARQL without modifying the structure. As such, the proposed method benefits not only our hospitals, but also numerous hospitals that handle HL7 v2.5 messages. Our approach highlights a potential of large-scale data federation techniques to retrieve clinical information, which could be applied as applications of clinical intelligence to improve clinical practices, such as adverse drug event monitoring and cohort selection for a clinical study as well as discovering new knowledge from clinical information.

  1. L1000CDS2: LINCS L1000 characteristic direction signatures search engine

    PubMed Central

    Duan, Qiaonan; Reid, St Patrick; Clark, Neil R; Wang, Zichen; Fernandez, Nicolas F; Rouillard, Andrew D; Readhead, Ben; Tritsch, Sarah R; Hodos, Rachel; Hafner, Marc; Niepel, Mario; Sorger, Peter K; Dudley, Joel T; Bavari, Sina; Panchal, Rekha G; Ma’ayan, Avi

    2016-01-01

    The library of integrated network-based cellular signatures (LINCS) L1000 data set currently comprises of over a million gene expression profiles of chemically perturbed human cell lines. Through unique several intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves signal to noise compared with the MODZ method currently used to compute L1000 signatures. The CD processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS2. The L1000CDS2 search engine provides prioritization of thousands of small-molecule signatures, and their pairwise combinations, predicted to either mimic or reverse an input gene expression signature using two methods. The L1000CDS2 search engine also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the gene expression omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS2 to prioritize small molecules that are predicted to reverse expression in 670 disease signatures also extracted from GEO, and prioritized small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study, to further demonstrate the utility of L1000CDS2, we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying these signatures with L1000CDS2 we identified kenpaullone, a GSK3B/CDK2 inhibitor that we show, in subsequent experiments, has a dose-dependent efficacy in inhibiting Ebola infection in vitro without causing cellular toxicity in human cell lines. In summary, the L1000CDS2 tool can be applied in many biological and biomedical settings, while improving the extraction of knowledge from the LINCS L1000 resource. PMID:28413689

  2. Army technology development. IBIS query. Software to support the Image Based Information System (IBIS) expansion for mapping, charting and geodesy

    NASA Technical Reports Server (NTRS)

    Friedman, S. Z.; Walker, R. E.; Aitken, R. B.

    1986-01-01

    The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analysts' tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose of this report is to compare this new query software to the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.

  3. NCBI2RDF: enabling full RDF-based access to NCBI databases.

    PubMed

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Wu, Kesheng

    The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scienti c data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern- nding queries on this implicit multigraph in a SQL- like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins e ciently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This papermore » makes three new contributions. (i) We present an e cient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to e ciently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We nd that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.« less

  5. Chlorhexidine Induces VanA-Type Vancomycin Resistance Genes in Enterococci

    PubMed Central

    Bhardwaj, Pooja; Ziegler, Elizabeth

    2016-01-01

    Chlorhexidine is a bisbiguanide antiseptic used for infection control. Vancomycin-resistant E. faecium (VREfm) is among the leading causes of hospital-acquired infections. VREfm may be exposed to chlorhexidine at supra- and subinhibitory concentrations as a result of chlorhexidine bathing and chlorhexidine-impregnated central venous catheter use. We used RNA sequencing to investigate how VREfm responds to chlorhexidine gluconate exposure. Among the 35 genes upregulated ≥10-fold after 15 min of exposure to the MIC of chlorhexidine gluconate were those encoding VanA-type vancomycin resistance (vanHAX) and those associated with reduced daptomycin susceptibility (liaXYZ). We confirmed that vanA upregulation was not strain or species specific by querying other VanA-type VRE. VanB-type genes were not induced. The vanH promoter was found to be responsive to subinhibitory chlorhexidine gluconate in VREfm, as was production of the VanX protein. Using vanH reporter experiments with Bacillus subtilis and deletion analysis in VREfm, we found that this phenomenon is VanR dependent. Deletion of vanR did not result in increased chlorhexidine susceptibility, demonstrating that vanHAX induction is not protective against chlorhexidine. As expected, VanA-type VRE is more susceptible to ceftriaxone in the presence of sub-MIC chlorhexidine. Unexpectedly, VREfm is also more susceptible to vancomycin in the presence of subinhibitory chlorhexidine, suggesting that chlorhexidine-induced gene expression changes lead to additional alterations in cell wall synthesis. We conclude that chlorhexidine induces expression of VanA-type vancomycin resistance genes and genes associated with daptomycin nonsusceptibility. Overall, our results indicate that the impacts of subinhibitory chlorhexidine exposure on hospital-associated pathogens should be further investigated in laboratory studies. PMID:26810654

  6. Identification and Characterization of 30 K Protein Genes Found in Bombyx mori (Lepidoptera: Bombycidae) Transcriptome

    PubMed Central

    Shi, Xiao-Feng; Li, Yi-Nü; Yi, Yong-Zhu; Xiao, Xing-Guo; Zhang, Zhi-Fang

    2015-01-01

    The 30 K proteins, the major group of hemolymph proteins in the silkworm, Bombyx mori (Lepidoptera: Bombycidae), are structurally related with molecular masses of ∼30 kDa and are involved in various physiological processes, e.g., energy storage, embryonic development, and immune responses. For this report, known 30 K protein gene sequences were used as Blastn queries against sequences in the B. mori transcriptome (SilkTransDB). Twenty-nine cDNAs (Bm30K-1–29) were retrieved, including four being previously unidentified in the Lipoprotein_11 family. The genomic structures of the 29 genes were analyzed and they were mapped to their corresponding chromosomes. Furthermore, phylogenetic analysis revealed that the 29 genes encode three types of 30 K proteins. The members increased in each type is mainly a result of gene duplication with the appearance of each type preceding the differentiation of each species included in the tree. Real-Time Quantitative Polymerase Chain Reaction (Q-PCR) confirmed that the genes could be expressed, and that the three types have different temporal expression patterns. Proteins from the hemolymph was separated by SDS-PAGE, and those with molecular mass of ∼30 kDa were isolated and identified by mass spectrometry sequencing in combination with searches of various databases containing B. mori 30K protein sequences. Of the 34 proteins identified, 13 are members of the 30 K protein family, with one that had not been found in the SilkTransDB, although it had been found in the B. mori genome. Taken together, our results indicate that the 30 K protein family contains many members with various functions. Other methods will be required to find more members of the family. PMID:26078299

  7. VAAPA: a web platform for visualization and analysis of alternative polyadenylation.

    PubMed

    Guan, Jinting; Fu, Jingyi; Wu, Mingcheng; Chen, Longteng; Ji, Guoli; Quinn Li, Qingshun; Wu, Xiaohui

    2015-02-01

    Polyadenylation [poly(A)] is an essential process during the maturation of most mRNAs in eukaryotes. Alternative polyadenylation (APA) as an important layer of gene expression regulation has been increasingly recognized in various species. Here, a web platform for visualization and analysis of alternative polyadenylation (VAAPA) was developed. This platform can visualize the distribution of poly(A) sites and poly(A) clusters of a gene or a section of a chromosome. It can also highlight genes with switched APA sites among different conditions. VAAPA is an easy-to-use web-based tool that provides functions of poly(A) site query, data uploading, downloading, and APA sites visualization. It was designed in a multi-tier architecture and developed based on Smart GWT (Google Web Toolkit) using Java as the development language. VAAPA will be a valuable addition to the community for the comprehensive study of APA, not only by making the high quality poly(A) site data more accessible, but also by providing users with numerous valuable functions for poly(A) site analysis and visualization. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. psygenet2r: a R/Bioconductor package for the analysis of psychiatric disease genes.

    PubMed

    Gutiérrez-Sacristán, Alba; Hernández-Ferrer, Carles; González, Juan R; Furlong, Laura I

    2017-12-15

    Psychiatric disorders have a great impact on morbidity and mortality. Genotype-phenotype resources for psychiatric diseases are key to enable the translation of research findings to a better care of patients. PsyGeNET is a knowledge resource on psychiatric diseases and their genes, developed by text mining and curated by domain experts. We present psygenet2r, an R package that contains a variety of functions for leveraging PsyGeNET database and facilitating its analysis and interpretation. The package offers different types of queries to the database along with variety of analysis and visualization tools, including the study of the anatomical structures in which the genes are expressed and gaining insight of gene's molecular function. Psygenet2r is especially suited for network medicine analysis of psychiatric disorders. The package is implemented in R and is available under MIT license from Bioconductor (http://bioconductor.org/packages/release/bioc/html/psygenet2r.html). juanr.gonzalez@isglobal.org or laura.furlong@upf.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  9. Building a biomedical semantic network in Wikipedia with Semantic Wiki Links

    PubMed Central

    Good, Benjamin M.; Clarke, Erik L.; Loguercio, Salvatore; Su, Andrew I.

    2012-01-01

    Wikipedia is increasingly used as a platform for collaborative data curation, but its current technical implementation has significant limitations that hinder its use in biocuration applications. Specifically, while editors can easily link between two articles in Wikipedia to indicate a relationship, there is no way to indicate the nature of that relationship in a way that is computationally accessible to the system or to external developers. For example, in addition to noting a relationship between a gene and a disease, it would be useful to differentiate the cases where genetic mutation or altered expression causes the disease. Here, we introduce a straightforward method that allows Wikipedia editors to embed computable semantic relations directly in the context of current Wikipedia articles. In addition, we demonstrate two novel applications enabled by the presence of these new relationships. The first is a dynamically generated information box that can be rendered on all semantically enhanced Wikipedia articles. The second is a prototype gene annotation system that draws its content from the gene-centric articles on Wikipedia and exposes the new semantic relationships to enable previously impossible, user-defined queries. Database URL: http://en.wikipedia.org/wiki/Portal:Gene_Wiki PMID:22434829

  10. Building a biomedical semantic network in Wikipedia with Semantic Wiki Links.

    PubMed

    Good, Benjamin M; Clarke, Erik L; Loguercio, Salvatore; Su, Andrew I

    2012-01-01

    Wikipedia is increasingly used as a platform for collaborative data curation, but its current technical implementation has significant limitations that hinder its use in biocuration applications. Specifically, while editors can easily link between two articles in Wikipedia to indicate a relationship, there is no way to indicate the nature of that relationship in a way that is computationally accessible to the system or to external developers. For example, in addition to noting a relationship between a gene and a disease, it would be useful to differentiate the cases where genetic mutation or altered expression causes the disease. Here, we introduce a straightforward method that allows Wikipedia editors to embed computable semantic relations directly in the context of current Wikipedia articles. In addition, we demonstrate two novel applications enabled by the presence of these new relationships. The first is a dynamically generated information box that can be rendered on all semantically enhanced Wikipedia articles. The second is a prototype gene annotation system that draws its content from the gene-centric articles on Wikipedia and exposes the new semantic relationships to enable previously impossible, user-defined queries. DATABASE URL: http://en.wikipedia.org/wiki/Portal:Gene_Wiki.

  11. Development of MTL-CEBPA: Small Activating RNA Drug for Hepatocellular Carcinoma.

    PubMed

    Setten, Ryan L; Lightfoot, Helen L; Habib, Nagy A; Rossi, John J

    2018-06-10

    Oligonucleotide drug development has revolutionised the drug discovery field allowing the notoriously "undruggable" genome to potentially become "druggable". Within this field, 'small' or 'short' activating RNAs (saRNA) are a more recently discovered category of short double stranded RNA with clinical potential. SaRNAs promote endogenous transcription from target loci, a phenomenon widely observed in mammals known as RNA activation (RNAa). The ability to target a particular gene is dependent on the sequence of the saRNA. Hence, the potential clinical application of saRNA is to increase target gene expression in a sequence specific manner. SaRNA based oligonucleotide therapeutics present great promise in expanding the "druggable" genome with particular areas of interest including transcription factor activation and haploinsufficency. Review and Conclusion: In this mini-review, we describe the pre-clinical development of the first saRNA drug to enter the clinic. This saRNA, referred to as MTL-CEBPA, induces transcription of the transcription factor CCAAT/enhancer-binding protein alpha (CEBPα), a tumour suppressor and critical regulator of hepatocyte function. MTL-CEBPA is presently in Phase I clinical trials for hepatocellular carcinoma (HCC). The clinical development of MTL-CEBPA will demonstrate "proof of concept", showing that saRNAs can provide the basis for drugs which enhance targeted gene expression and consequently improve disease outcome in patients. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  12. Differentiated transcriptional signatures in the maize landraces of Chiapas, Mexico.

    PubMed

    Kost, Matthew A; Perales, Hugo R; Wijeratne, Saranga; Wijeratne, Asela J; Stockinger, Eric; Mercer, Kristin L

    2017-09-08

    Landrace farmers are the keepers of crops locally adapted to the environments where they are cultivated. Patterns of diversity across the genome can provide signals of past evolution in the face of abiotic and biotic change. Understanding this rich genetic resource is imperative especially since diversity can provide agricultural security as climate continues to shift. Here we employ RNA sequencing (RNA-seq) to understand the role that conditions that vary across a landscape may have played in shaping genetic diversity in the maize landraces of Chiapas, Mexico. We collected landraces from three distinct elevational zones and planted them in a midland common garden. Early season leaf tissue was collected for RNA-seq and we performed weighted gene co-expression network analysis (WGCNA). We then used association analysis between landrace co-expression module expression values and environmental parameters of landrace origin to elucidate genes and gene networks potentially shaped by environmental factors along our study gradient. Elevation of landrace origin affected the transcriptome profiles. Two co-expression modules were highly correlated with temperature parameters of landrace origin and queries into their 'hub' genes suggested that temperature may have led to differentiation among landraces in hormone biosynthesis/signaling and abiotic and biotic stress responses. We identified several 'hub' transcription factors and kinases as candidates for the regulation of these responses. These findings indicate that natural selection may influence the transcriptomes of crop landraces along an elevational gradient in a major diversity center, and provide a foundation for exploring the genetic basis of local adaptation. While we cannot rule out the role of neutral evolutionary forces in the patterns we have identified, combining whole transcriptome sequencing technologies, established bioinformatics techniques, and common garden experimentation can powerfully elucidate structure of adaptive diversity across a varied landscape. Ultimately, gaining such understanding can facilitate the conservation and strategic utilization of crop genetic diversity in a time of climate change.

  13. KF-finder: identification of key factors from host-microbial networks in cervical cancer.

    PubMed

    Hu, Jialu; Gao, Yiqun; Zheng, Yan; Shang, Xuequn

    2018-04-24

    The human body is colonized by a vast number of microbes. Microbiota can benefit many normal life processes, but can also cause many diseases by interfering the regular metabolism and immune system. Recent studies have demonstrated that the microbial community is closely associated with various types of cell carcinoma. The search for key factors, which also refer to cancer causing agents, can provide an important clue in understanding the regulatory mechanism of microbiota in uterine cervix cancer. In this paper, we investigated microbiota composition and gene expression data for 58 squamous and adenosquamous cell carcinoma. A host-microbial covariance network was constructed based on the 16s rRNA and gene expression data of the samples, which consists of 259 abundant microbes and 738 differentially expressed genes (DEGs). To search for risk factors from host-microbial networks, the method of bi-partite betweenness centrality (BpBC) was used to measure the risk of a given node to a certain biological process in hosts. A web-based tool KF-finder was developed, which can efficiently query and visualize the knowledge of microbiota and differentially expressed genes (DEGs) in the network. Our results suggest that prevotellaceade, tissierellaceae and fusobacteriaceae are the most abundant microbes in cervical carcinoma, and the microbial community in cervical cancer is less diverse than that of any other boy sites in health. A set of key risk factors anaerococcus, hydrogenophilaceae, eubacterium, PSMB10, KCNIP1 and KRT13 have been identified, which are thought to be involved in the regulation of viral response, cell cycle and epithelial cell differentiation in cervical cancer. It can be concluded that permanent changes of microbiota composition could be a major force for chromosomal instability, which subsequently enables the effect of key risk factors in cancer. All our results described in this paper can be freely accessed from our website at http://www.nwpu-bioinformatics.com/KF-finder/ .

  14. An Examination of Natural Language as a Query Formation Tool for Retrieving Information on E-Health from Pub Med.

    ERIC Educational Resources Information Center

    Peterson, Gabriel M.; Su, Kuichun; Ries, James E.; Sievert, Mary Ellen C.

    2002-01-01

    Discussion of Internet use for information searches on health-related topics focuses on a study that examined complexity and variability of natural language in using search terms that express the concept of electronic health (e-health). Highlights include precision of retrieved information; shift in terminology; and queries using the Pub Med…

  15. RNA-seq analysis and de novo transcriptome assembly of Jerusalem artichoke (Helianthus tuberosus Linne).

    PubMed

    Jung, Won Yong; Lee, Sang Sook; Kim, Chul Wook; Kim, Hyun-Soon; Min, Sung Ran; Moon, Jae Sun; Kwon, Suk-Yoon; Jeon, Jae-Heung; Cho, Hye Sun

    2014-01-01

    Jerusalem artichoke (Helianthus tuberosus L.) has long been cultivated as a vegetable and as a source of fructans (inulin) for pharmaceutical applications in diabetes and obesity prevention. However, transcriptomic and genomic data for Jerusalem artichoke remain scarce. In this study, Illumina RNA sequencing (RNA-Seq) was performed on samples from Jerusalem artichoke leaves, roots, stems and two different tuber tissues (early and late tuber development). Data were used for de novo assembly and characterization of the transcriptome. In total 206,215,632 paired-end reads were generated. These were assembled into 66,322 loci with 272,548 transcripts. Loci were annotated by querying against the NCBI non-redundant, Phytozome and UniProt databases, and 40,215 loci were homologous to existing database sequences. Gene Ontology terms were assigned to 19,848 loci, 15,434 loci were matched to 25 Clusters of Eukaryotic Orthologous Groups classifications, and 11,844 loci were classified into 142 Kyoto Encyclopedia of Genes and Genomes pathways. The assembled loci also contained 10,778 potential simple sequence repeats. The newly assembled transcriptome was used to identify loci with tissue-specific differential expression patterns. In total, 670 loci exhibited tissue-specific expression, and a subset of these were confirmed using RT-PCR and qRT-PCR. Gene expression related to inulin biosynthesis in tuber tissue was also investigated. Exsiting genetic and genomic data for H. tuberosus are scarce. The sequence resources developed in this study will enable the analysis of thousands of transcripts and will thus accelerate marker-assisted breeding studies and studies of inulin biosynthesis in Jerusalem artichoke.

  16. Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal

    PubMed Central

    Jadhav, Ashutosh; Sheth, Amit; Pathak, Jyotishman

    2014-01-01

    Since the early 2000’s, Internet usage for health information searching has increased significantly. Studying search queries can help us to understand users “information need” and how do they formulate search queries (“expression of information need”). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search for CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are ‘Diseases/Conditions’, ‘Vital-Sings’, ‘Symptoms’ and ‘Living-with’. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites. PMID:25954380

  17. Characterization of the OmyY1 region on the rainbow trout Y chromosome

    USGS Publications Warehouse

    Phillips, Ruth B.; DeKoning, Jenefer J.; Brunelli, Joseph P.; Faber-Hammond, Joshua J.; Hansen, John D.; Christensen, Kris A.; Renn, Suzy C.P.; Thorgaard, Gary H.

    2013-01-01

    We characterized the male-specific region on the Y chromosome of rainbow trout, which contains both sdY (the sex-determining gene) and the male-specific genetic marker, OmyY1. Several clones containing the OmyY1 marker were screened from a BAC library from a YY clonal line and found to be part of an 800 kb BAC contig. Using fluorescence in situ hybridization (FISH), these clones were localized to the end of the short arm of the Y chromosome in rainbow trout, with an additional signal on the end of the X chromosome in many cells. We sequenced a minimum tiling path of these clones using Illumina and 454 pyrosequencing. The region is rich in transposons and rDNA, but also appears to contain several single-copy protein-coding genes. Most of these genes are also found on the X chromosome; and in several cases sex-specific SNPs in these genes were identified between the male (YY) and female (XX) homozygous clonal lines. Additional genes were identified by hybridization of the BACs to the cGRASP salmonid 4x44K oligo microarray. By BLASTn evaluations using hypothetical transcripts of OmyY1-linked candidate genes as query against several EST databases, we conclude at least 12 of these candidate genes are likely functional, and expressed.

  18. Bladder inflammatory transcriptome in response to tachykinins: Neurokinin 1 receptor-dependent genes and transcription regulatory elements

    PubMed Central

    Saban, Ricardo; Simpson, Cindy; Vadigepalli, Rajanikanth; Memet, Sylvie; Dozmorov, Igor; Saban, Marcia R

    2007-01-01

    Background Tachykinins (TK), such as substance P, and their neurokinin receptors which are ubiquitously expressed in the human urinary tract, represent an endogenous system regulating bladder inflammatory, immune responses, and visceral hypersensitivity. Increasing evidence correlates alterations in the TK system with urinary tract diseases such as neurogenic bladders, outflow obstruction, idiopathic detrusor instability, and interstitial cystitis. However, despite promising effects in animal models, there seems to be no published clinical study showing that NK-receptor antagonists are an effective treatment of pain in general or urinary tract disorders, such as detrusor overactivity. In order to search for therapeutic targets that could block the tachykinin system, we set forth to determine the regulatory network downstream of NK1 receptor activation. First, NK1R-dependent transcripts were determined and used to query known databases for their respective transcription regulatory elements (TREs). Methods An expression analysis was performed using urinary bladders isolated from sensitized wild type (WT) and NK1R-/- mice that were stimulated with saline, LPS, or antigen to provoke inflammation. Based on cDNA array results, NK1R-dependent genes were selected. PAINT software was used to query TRANSFAC database and to retrieve upstream TREs that were confirmed by electrophoretic mobility shift assays. Results The regulatory network of TREs driving NK1R-dependent genes presented cRel in a central position driving 22% of all genes, followed by AP-1, NF-kappaB, v-Myb, CRE-BP1/c-Jun, USF, Pax-6, Efr-1, Egr-3, and AREB6. A comparison between NK1R-dependent and NK1R-independent genes revealed Nkx-2.5 as a unique discriminator. In the presence of NK1R, Nkx2-5 _01 was significantly correlated with 36 transcripts which included several candidates for mediating bladder development (FGF) and inflammation (PAR-3, IL-1R, IL-6, α-NGF, TSP2). In the absence of NK1R, the matrix Nkx2-5_02 had a predominant participation driving 8 transcripts, which includes those involved in cancer (EYA1, Trail, HSF1, and ELK-1), smooth-to-skeletal muscle trans-differentiation, and Z01, a tight-junction protein, expression. Electrophoretic mobility shift assays confirmed that, in the mouse urinary bladder, activation of NK1R by substance P (SP) induces both NKx-2.5 and NF-kappaB translocations. Conclusion This is the first report describing a role for Nkx2.5 in the urinary tract. As Nkx2.5 is the unique discriminator of NK1R-modulated inflammation, it can be imagined that in the near future, new based therapies selective for controlling Nkx2.5 activity in the urinary tract may be used in the treatment in a number of bladder disorders. PMID:17519035

  19. Analysis of Gene Regulatory Networks of Maize in Response to Nitrogen.

    PubMed

    Jiang, Lu; Ball, Graham; Hodgman, Charlie; Coules, Anne; Zhao, Han; Lu, Chungui

    2018-03-08

    Nitrogen (N) fertilizer has a major influence on the yield and quality. Understanding and optimising the response of crop plants to nitrogen fertilizer usage is of central importance in enhancing food security and agricultural sustainability. In this study, the analysis of gene regulatory networks reveals multiple genes and biological processes in response to N. Two microarray studies have been used to infer components of the nitrogen-response network. Since they used different array technologies, a map linking the two probe sets to the maize B73 reference genome has been generated to allow comparison. Putative Arabidopsis homologues of maize genes were used to query the Biological General Repository for Interaction Datasets (BioGRID) network, which yielded the potential involvement of three transcription factors (TFs) (GLK5, MADS64 and bZIP108) and a Calcium-dependent protein kinase. An Artificial Neural Network was used to identify influential genes and retrieved bZIP108 and WRKY36 as significant TFs in both microarray studies, along with genes for Asparagine Synthetase, a dual-specific protein kinase and a protein phosphatase. The output from one study also suggested roles for microRNA (miRNA) 399b and Nin-like Protein 15 (NLP15). Co-expression-network analysis of TFs with closely related profiles to known Nitrate-responsive genes identified GLK5, GLK8 and NLP15 as candidate regulators of genes repressed under low Nitrogen conditions, while bZIP108 might play a role in gene activation.

  20. Identifying QT prolongation from ECG impressions using a general-purpose Natural Language Processor

    PubMed Central

    Denny, Joshua C.; Miller, Randolph A.; Waitman, Lemuel Russell; Arrieta, Mark; Peterson, Joshua F.

    2009-01-01

    Objective Typically detected via electrocardiograms (ECGs), QT interval prolongation is a known risk factor for sudden cardiac death. Since medications can promote or exacerbate the condition, detection of QT interval prolongation is important for clinical decision support. We investigated the accuracy of natural language processing (NLP) for identifying QT prolongation from cardiologist-generated, free-text ECG impressions compared to corrected QT (QTc) thresholds reported by ECG machines. Methods After integrating negation detection to a locally-developed natural language processor, the KnowledgeMap concept identifier, we evaluated NLP-based detection of QT prolongation compared to the calculated QTc on a set of 44,318 ECGs obtained from hospitalized patients. We also created a string query using regular expressions to identify QT prolongation. We calculated sensitivity and specificity of the methods using manual physician review of the cardiologist-generated reports as the gold standard. To investigate causes of “false positive” calculated QTc, we manually reviewed randomly selected ECGs with a long calculated QTc but no mention of QT prolongation. Separately, we validated the performance of the negation detection algorithm on 5,000 manually-categorized ECG phrases for any medical concept (not limited to QT prolongation) prior to developing the NLP query for QT prolongation. Results The NLP query for QT prolongation correctly identified 2,364 of 2,373 ECGs with QT prolongation with a sensitivity of 0.996 and a positive predictive value of 1.000. There were no false positives. The regular expression query had a sensitivity of 0.999 and positive predictive value of 0.982. In contrast, the positive predictive value of common QTc thresholds derived from ECG machines was 0.07–0.25 with corresponding sensitivities of 0.994–0.046. The negation detection algorithm had a recall of 0.973 and precision of 0.982 for 10,490 concepts found within ECG impressions. Conclusions NLP and regular expression queries of cardiologists’ ECG interpretations can more effectively identify QT prolongation than the automated QTc intervals reported by ECG machines. Future clinical decision support could employ NLP queries to detect QTc prolongation and other reported ECG abnormalities. PMID:18938105

  1. Foundations of RDF Databases

    NASA Astrophysics Data System (ADS)

    Arenas, Marcelo; Gutierrez, Claudio; Pérez, Jorge

    The goal of this paper is to give an overview of the basics of the theory of RDF databases. We provide a formal definition of RDF that includes the features that distinguish this model from other graph data models. We then move into the fundamental issue of querying RDF data. We start by considering the RDF query language SPARQL, which is a W3C Recommendation since January 2008. We provide an algebraic syntax and a compositional semantics for this language, study the complexity of the evaluation problem for different fragments of SPARQL, and consider the problem of optimizing the evaluation of SPARQL queries, showing that a natural fragment of this language has some good properties in this respect. We furthermore study the expressive power of SPARQL, by comparing it with some well-known query languages such as relational algebra. We conclude by considering the issue of querying RDF data in the presence of RDFS vocabulary. In particular, we present a recently proposed extension of SPARQL with navigational capabilities.

  2. RNA-Seq reveals complex genetic response to Deepwater Horizon oil release in Fundulus grandis.

    PubMed

    Garcia, Tzintzuni I; Shen, Yingjia; Crawford, Douglas; Oleksiak, Marjorie F; Whitehead, Andrew; Walter, Ronald B

    2012-09-12

    The release of oil resulting from the blowout of the Deepwater Horizon (DH) drilling platform was one of the largest in history discharging more than 189 million gallons of oil and subject to widespread application of oil dispersants. This event impacted a wide range of ecological habitats with a complex mix of pollutants whose biological impact is still not yet fully understood. To better understand the effects on a vertebrate genome, we studied gene expression in the salt marsh minnow Fundulus grandis, which is local to the northern coast of the Gulf of Mexico and is a sister species of the ecotoxicological model Fundulus heteroclitus. To assess genomic changes, we quantified mRNA expression using high throughput sequencing technologies (RNA-Seq) in F. grandis populations in the marshes and estuaries impacted by DH oil release. This application of RNA-Seq to a non-model, wild, and ecologically significant organism is an important evaluation of the technology to quickly assess similar events in the future. Our de novo assembly of RNA-Seq data produced a large set of sequences which included many duplicates and fragments. In many cases several of these could be associated with a common reference sequence using blast to query a reference database. This reduced the set of significant genes to 1,070 down-regulated and 1,251 up-regulated genes. These genes indicate a broad and complex genomic response to DH oil exposure including the expected AHR-mediated response and CYP genes. In addition a response to hypoxic conditions and an immune response are also indicated. Several genes in the choriogenin family were down-regulated in the exposed group; a response that is consistent with AH exposure. These analyses are in agreement with oligonucleotide-based microarray analyses, and describe only a subset of significant genes with aberrant regulation in the exposed set. RNA-Seq may be successfully applied to feral and extremely polymorphic organisms that do not have an underlying genome sequence assembly to address timely environmental problems. Additionally, the observed changes in a large set of transcript expression levels are indicative of a complex response to the varied petroleum components to which the fish were exposed.

  3. HTSeq--a Python framework to work with high-throughput sequencing data.

    PubMed

    Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang

    2015-01-15

    A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. © The Author 2014. Published by Oxford University Press.

  4. User centered and ontology based information retrieval system for life sciences.

    PubMed

    Sy, Mohameth-François; Ranwez, Sylvie; Montmain, Jacky; Regnault, Armelle; Crampes, Michel; Ranwez, Vincent

    2012-01-25

    Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help.

  5. User centered and ontology based information retrieval system for life sciences

    PubMed Central

    2012-01-01

    Background Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. Results This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. Conclusions The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help. PMID:22373375

  6. NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases

    PubMed Central

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425

  7. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database

    PubMed Central

    Drabkin, Harold J.; Blake, Judith A.

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as ‘GO’ or ‘homology’ or ‘phenotype’. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as ‘papers selected for GO that refer to genes with NO GO annotation’. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported with statements of evidence as well as access to source publications. PMID:23110975

  8. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.

    PubMed

    Drabkin, Harold J; Blake, Judith A

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported with statements of evidence as well as access to source publications.

  9. CoryneRegNet 4.0 – A reference database for corynebacterial gene regulatory networks

    PubMed Central

    Baumbach, Jan

    2007-01-01

    Background Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Results Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. As the previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows the inter-species comparisons of reconstructed gene regulatory networks and the projection of gene expression levels onto that networks. Therefore, we added stimulon data directly into the database, but also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known transcriptional regulatory networks to predict putative contradictions or further gene regulatory interactions. Furthermore, it integrates protein clusters by means of heuristically solving the weighted graph cluster editing problem. In addition, it provides Web Service based access to up to date gene annotation data from GenDB. Conclusion The release 4.0 of CoryneRegNet is a comprehensive system for the integrated analysis of procaryotic gene regulatory networks. It is a versatile systems biology platform to support the efficient and large-scale analysis of transcriptional regulation of gene expression in microorganisms. It is publicly available at . PMID:17986320

  10. The contribution of morphological knowledge to French MeSH mapping for information retrieval.

    PubMed Central

    Zweigenbaum, P.; Darmoni, S. J.; Grabar, N.

    2001-01-01

    MeSH-indexed Internet health directories must provide a mapping from natural language queries to MeSH terms so that both health professionals and the general public can query their contents. We describe here the design of lexical knowledge bases for mapping French expressions to MeSH terms, and the initial evaluation of their contribution to Doc'CISMeF, the search tool of a MeSH-indexed directory of French-language medical Internet resources. The observed trend is in favor of the use of morphological knowledge as a moderate (approximately 5%) but effective factor for improving query to term mapping capabilities. PMID:11825295

  11. CoReCG: a comprehensive database of genes associated with colon-rectal cancer

    PubMed Central

    Agarwal, Rahul; Kumar, Binayak; Jayadev, Msk; Raghav, Dhwani; Singh, Ashutosh

    2016-01-01

    Cancer of large intestine is commonly referred as colorectal cancer, which is also the third most frequently prevailing neoplasm across the globe. Though, much of work is being carried out to understand the mechanism of carcinogenesis and advancement of this disease but, fewer studies has been performed to collate the scattered information of alterations in tumorigenic cells like genes, mutations, expression changes, epigenetic alteration or post translation modification, genetic heterogeneity. Earlier findings were mostly focused on understanding etiology of colorectal carcinogenesis but less emphasis were given for the comprehensive review of the existing findings of individual studies which can provide better diagnostics based on the suggested markers in discrete studies. Colon Rectal Cancer Gene Database (CoReCG), contains 2056 colon-rectal cancer genes information involved in distinct colorectal cancer stages sourced from published literature with an effective knowledge based information retrieval system. Additionally, interactive web interface enriched with various browsing sections, augmented with advance search facility for querying the database is provided for user friendly browsing, online tools for sequence similarity searches and knowledge based schema ensures a researcher friendly information retrieval mechanism. Colorectal cancer gene database (CoReCG) is expected to be a single point source for identification of colorectal cancer-related genes, thereby helping with the improvement of classification, diagnosis and treatment of human cancers. Database URL: lms.snu.edu.in/corecg PMID:27114494

  12. MIRNA-DISTILLER: A Stand-Alone Application to Compile microRNA Data from Databases.

    PubMed

    Rieger, Jessica K; Bodan, Denis A; Zanger, Ulrich M

    2011-01-01

    MicroRNAs (miRNA) are small non-coding RNA molecules of ∼22 nucleotides which regulate large numbers of genes by binding to seed sequences at the 3'-untranslated region of target gene transcripts. The target mRNA is then usually degraded or translation is inhibited, although thus resulting in posttranscriptional down regulation of gene expression at the mRNA and/or protein level. Due to the bioinformatic difficulties in predicting functional miRNA binding sites, several publically available databases have been developed that predict miRNA binding sites based on different algorithms. The parallel use of different databases is currently indispensable, but highly uncomfortable and time consuming, especially when working with numerous genes of interest. We have therefore developed a new stand-alone program, termed MIRNA-DISTILLER, which allows to compile miRNA data for given target genes from public databases. Currently implemented are TargetScan, microCosm, and miRDB, which may be queried independently, pairwise, or together to calculate the respective intersections. Data are stored locally for application of further analysis tools including freely definable biological parameter filters, customized output-lists for both miRNAs and target genes, and various graphical facilities. The software, a data example file and a tutorial are freely available at http://www.ikp-stuttgart.de/content/language1/html/10415.asp.

  13. MIRNA-DISTILLER: A Stand-Alone Application to Compile microRNA Data from Databases

    PubMed Central

    Rieger, Jessica K.; Bodan, Denis A.; Zanger, Ulrich M.

    2011-01-01

    MicroRNAs (miRNA) are small non-coding RNA molecules of ∼22 nucleotides which regulate large numbers of genes by binding to seed sequences at the 3′-untranslated region of target gene transcripts. The target mRNA is then usually degraded or translation is inhibited, although thus resulting in posttranscriptional down regulation of gene expression at the mRNA and/or protein level. Due to the bioinformatic difficulties in predicting functional miRNA binding sites, several publically available databases have been developed that predict miRNA binding sites based on different algorithms. The parallel use of different databases is currently indispensable, but highly uncomfortable and time consuming, especially when working with numerous genes of interest. We have therefore developed a new stand-alone program, termed MIRNA-DISTILLER, which allows to compile miRNA data for given target genes from public databases. Currently implemented are TargetScan, microCosm, and miRDB, which may be queried independently, pairwise, or together to calculate the respective intersections. Data are stored locally for application of further analysis tools including freely definable biological parameter filters, customized output-lists for both miRNAs and target genes, and various graphical facilities. The software, a data example file and a tutorial are freely available at http://www.ikp-stuttgart.de/content/language1/html/10415.asp PMID:22303335

  14. Gonadal Identity in the Absence of Pro-Testis Factor SOX9 and Pro-Ovary Factor Beta-Catenin in Mice1

    PubMed Central

    Nicol, Barbara; Yao, Humphrey H.-C.

    2015-01-01

    Sex-reversal cases in humans and genetic models in mice have revealed that the fate of the bipotential gonad hinges upon the balance between pro-testis SOX9 and pro-ovary beta-catenin pathways. Our central query was: if SOX9 and beta-catenin define the gonad's identity, then what do the gonads become when both factors are absent? To answer this question, we developed mouse models that lack either Sox9, beta-catenin, or both in the somatic cells of the fetal gonads and examined the morphological outcomes and transcriptome profiles. In the absence of Sox9 and beta-catenin, both XX and XY gonads progressively lean toward the testis fate, indicating that expression of certain pro-testis genes requires the repression of the beta-catenin pathway, rather than a direct activation by SOX9. We also observed that XY double knockout gonads were more masculinized than their XX counterpart. To identify the genes responsible for the initial events of masculinization and to determine how the genetic context (XX vs. XY) affects this process, we compared the transcriptomes of Sox9/beta-catenin mutant gonads and found that early molecular changes underlying the XY-specific masculinization involve the expression of Sry and 21 SRY direct target genes, such as Sox8 and Cyp26b1. These results imply that when both Sox9 and beta-catenin are absent, Sry is capable of activating other pro-testis genes and drive testis differentiation. Our findings not only provide insight into the mechanism of sex determination, but also identify candidate genes that are potentially involved in disorders of sex development. PMID:26108792

  15. PepPat, a pattern-based oligopeptide homology search method and the identification of a novel tachykinin-like peptide.

    PubMed

    Jiang, Ying; Gao, Ge; Fang, Gang; Gustafson, Eric L; Laverty, Maureen; Yin, Yanbin; Zhang, Yong; Luo, Jingchu; Greene, Jonathan R; Bayne, Marvin L; Hedrick, Joseph A; Murgolo, Nicholas J

    2003-05-01

    PepPat, a hybrid method that combines pattern matching with similarity scoring, is described. We also report PepPat's application in the identification of a novel tachykinin-like peptide. PepPat takes as input a query peptide and a user-specified regular expression pattern within the peptide. It first performs a database pattern match and then ranks candidates on the basis of their similarity to the query peptide. PepPat calculates similarity over the pattern spanning region, enhancing PepPat's sensitivity for short query peptides. PepPat can also search for a user-specified number of occurrences of a repeated pattern within the target sequence. We illustrate PepPat's application in short peptide ligand mining. As a validation example, we report the identification of a novel tachykinin-like peptide, C14TKL-1, and show it is an NK1 (neuokinin receptor 1) agonist whose message is widely expressed in human periphery. PepPat is offered online at: http://peppat.cbi.pku.edu.cn.

  16. Entity Bases: Large-Scale Knowledgebases for Intelligence Data

    DTIC Science & Technology

    2009-02-01

    declaratively expressed as Datalog rules . The EntityBase supports two query scenarios: • Free-Form Querying: A human analyst or a client program can pose...integration, Prometheus follows the Inverse Rules algo- rithm (Duschka 1997) with additional optimizations (Thakkar et al. 2005). We use the mediator...Discovery and Data Mining (PAKDD󈧈), Sydney, Australia. Crammer , K., Dekel, O., Keshet, J., Shalev-Shwartz, S., and Singer, Y. (2006). Online passive

  17. An Improvement to a Multi-Client Searchable Encryption Scheme for Boolean Queries.

    PubMed

    Jiang, Han; Li, Xue; Xu, Qiuliang

    2016-12-01

    The migration of e-health systems to the cloud computing brings huge benefits, as same as some security risks. Searchable Encryption(SE) is a cryptography encryption scheme that can protect the confidentiality of data and utilize the encrypted data at the same time. The SE scheme proposed by Cash et al. in Crypto2013 and its follow-up work in CCS2013 are most practical SE Scheme that support Boolean queries at present. In their scheme, the data user has to generate the search tokens by the counter number one by one and interact with server repeatedly, until he meets the correct one, or goes through plenty of tokens to illustrate that there is no search result. In this paper, we make an improvement to their scheme. We allow server to send back some information and help the user to generate exact search token in the search phase. In our scheme, there are only two round interaction between server and user, and the search token has [Formula: see text] elements, where n is the keywords number in query expression, and [Formula: see text] is the minimum documents number that contains one of keyword in query expression, and the computation cost of server is [Formula: see text] modular exponentiation operation.

  18. BμG@Sbase—a microbial gene expression and comparative genomic database

    PubMed Central

    Witney, Adam A.; Waldron, Denise E.; Brooks, Lucy A.; Tyler, Richard H.; Withers, Michael; Stoker, Neil G.; Wren, Brendan W.; Butcher, Philip D.; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future. PMID:21948792

  19. BμG@Sbase--a microbial gene expression and comparative genomic database.

    PubMed

    Witney, Adam A; Waldron, Denise E; Brooks, Lucy A; Tyler, Richard H; Withers, Michael; Stoker, Neil G; Wren, Brendan W; Butcher, Philip D; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future.

  20. Diabetic Neuropathy: Update on Pathophysiological Mechanism and the Possible Involvement of Glutamate Pathways.

    PubMed

    Hussain, Nadia; Adrian, Thomas E

    2017-01-01

    Diabetic neuropathy is a common complication of diabetes. It adversely affects the lives of most diabetics. It is the leading cause of non-traumatic limb amputation. Diabetic autonomic neuropathy can target any system and increases morbidity and mortality. Treatment begins with adequate glycemic control but despite this, many patients go on to develop neuropathy which suggests there are additional and unidentified, as yet, pathological mechanisms in place. Although several theories exist, the exact mechanisms are not yet established. Disease modifying treatment requires a more complete understanding of the mechanisms of disease. Pathways Involved: This review discusses the potential pathological mechanisms of diabetic neuropathy, including the polyol pathway, hexosamine pathway, protein kinase C, advanced glycation end product formation, polyADP ribose polymerase, and the role of oxidative stress, inflammation, growth factors and lipid abnormalities. Finally it focuses on how possible changes in glutamate signaling pathways fit into the current theories. Insights into the mechanisms involving gene expression in diabetic neuropathy can help pinpoint genes with altered expression. This will help in the development of novel alternative therapeutic strategies to significantly slow the progression of neuropathy in susceptible individuals and perhaps even prevention. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  1. TET1-mediated hypomethylation activates oncogenic signaling in triple-negative breast cancer.

    PubMed

    Good, Charly Ryan; Panjarian, Shoghag; Kelly, Andrew D; Madzo, Jozef; Patel, Bela; Jelinek, Jaroslav; Issa, Jean-Pierre J

    2018-06-11

    Both gains and losses of DNA methylation are common in cancer, but the factors controlling this balance of methylation remain unclear. Triple-negative breast cancer (TNBC), a subtype that does not overexpress hormone receptors or HER2/NEU, is one of the most hypomethylated cancers observed. Here we discovered that the TET1 DNA demethylase is specifically overexpressed in about 40% of patients with TNBC, where it is associated with hypomethylation of up to 10% of queried CpG sites and a worse overall survival. Through bioinformatic analyses in both breast and ovarian cancer cell line panels, we uncovered an intricate network connecting TET1 to hypomethylation and activation of cancer-specific oncogenic pathways including PI3K, EGFR, and PDGF. TET1 expression correlated with sensitivity to drugs targeting the PI3K-mTOR pathway, and CRISPR-mediated deletion of TET1 in two independent TNBC cell lines resulted in reduced expression of PI3K pathway genes, upregulation of immune response genes, and substantially reduced cellular proliferation, suggesting dependence of oncogenic pathways on TET1 overexpression. Our work establishes TET1 as a potential oncogene that contributes to aberrant hypomethylation in cancer and suggests that TET1 could serve as a druggable target for therapeutic intervention. Copyright ©2018, American Association for Cancer Research.

  2. Transcriptomic analysis of grain amaranth (Amaranthus hypochondriacus) using 454 pyrosequencing: comparison with A. tuberculatus, expression profiling in stems and in response to biotic and abiotic stress

    PubMed Central

    2011-01-01

    Background Amaranthus hypochondriacus, a grain amaranth, is a C4 plant noted by its ability to tolerate stressful conditions and produce highly nutritious seeds. These possess an optimal amino acid balance and constitute a rich source of health-promoting peptides. Although several recent studies, mostly involving subtractive hybridization strategies, have contributed to increase the relatively low number of grain amaranth expressed sequence tags (ESTs), transcriptomic information of this species remains limited, particularly regarding tissue-specific and biotic stress-related genes. Thus, a large scale transcriptome analysis was performed to generate stem- and (a)biotic stress-responsive gene expression profiles in grain amaranth. Results A total of 2,700,168 raw reads were obtained from six 454 pyrosequencing runs, which were assembled into 21,207 high quality sequences (20,408 isotigs + 799 contigs). The average sequence length was 1,064 bp and 930 bp for isotigs and contigs, respectively. Only 5,113 singletons were recovered after quality control. Contigs/isotigs were further incorporated into 15,667 isogroups. All unique sequences were queried against the nr, TAIR, UniRef100, UniRef50 and Amaranthaceae EST databases for annotation. Functional GO annotation was performed with all contigs/isotigs that produced significant hits with the TAIR database. Only 8,260 sequences were found to be homologous when the transcriptomes of A. tuberculatus and A. hypochondriacus were compared, most of which were associated with basic house-keeping processes. Digital expression analysis identified 1,971 differentially expressed genes in response to at least one of four stress treatments tested. These included several multiple-stress-inducible genes that could represent potential candidates for use in the engineering of stress-resistant plants. The transcriptomic data generated from pigmented stems shared similarity with findings reported in developing stems of Arabidopsis and black cottonwood (Populus trichocarpa). Conclusions This study represents the first large-scale transcriptomic analysis of A. hypochondriacus, considered to be a highly nutritious and stress-tolerant crop. Numerous genes were found to be induced in response to (a)biotic stress, many of which could further the understanding of the mechanisms that contribute to multiple stress-resistance in plants, a trait that has potential biotechnological applications in agriculture. PMID:21752295

  3. Antiproliferative and Apoptotic Effect of Dendrosomal Curcumin Nanoformulation in P53 Mutant and Wide-Type Cancer Cell Lines.

    PubMed

    Montazeri, Maryam; Pilehvar-Soltanahmadi, Younes; Mohaghegh, Mina; Panahi, Alireza; Khodi, Samaneh; Zarghami, Nosratollah; Sadeghizadeh, Majid

    2017-01-01

    The aim of this paper is to investigate the effect of dendrosomal curcumin (DNC) on the expression of p53 in both p53 mutant cell lines SKBR3/SW480 and p53 wild-type MCF7/HCT116 in both RNA and protein levels. Curcumin, derived from Curcumin longa, is recently considered in cancer related researches for its cell growth inhibition properties. p53 is a common tumor-suppressor gene involved in cancers and its mutation not only inhibits tumor suppressor activity but also promotes oncogenic activity. Here, p53 mutant/Wild-type cells were employed to study the toxicity of DNC using MTT assay, Flow cytometry and Annexin-V, Real-time PCR and Western blot were used to analyze p53, BAX, Bcl-2, p21 and Noxa changes after treatment. During the time, DNC increased the SubG1 cells and decreased G1, S and G2/M cells, early apoptosis also indicated the inhibition of cell growth in early phase. Real-Time PCR assay showed an increased mRNA of BAX, Noxa and p21 during the time with decreased Bcl-2. The expression of p53 mutant decreased in SKBR3/SW480, and the expression of p53 wild-type increased in MCF7/HCT116. Consequently, p53 plays an important role in mediating the survival by DNC, which can prevent tumor cell growth by modulating the expression of genes involved in apoptosis and proliferation. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  4. Towards computational improvement of DNA database indexing and short DNA query searching.

    PubMed

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-09-03

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.

  5. Connection Map for Compounds (CMC): A Server for Combinatorial Drug Toxicity and Efficacy Analysis.

    PubMed

    Liu, Lei; Tsompana, Maria; Wang, Yong; Wu, Dingfeng; Zhu, Lixin; Zhu, Ruixin

    2016-09-26

    Drug discovery and development is a costly and time-consuming process with a high risk for failure resulting primarily from a drug's associated clinical safety and efficacy potential. Identifying and eliminating inapt candidate drugs as early as possible is an effective way for reducing unnecessary costs, but limited analytical tools are currently available for this purpose. Recent growth in the area of toxicogenomics and pharmacogenomics has provided with a vast amount of drug expression microarray data. Web servers such as CMap and LTMap have used this information to evaluate drug toxicity and mechanisms of action independently; however, their wider applicability has been limited by the lack of a combinatorial drug-safety type of analysis. Using available genome-wide drug transcriptional expression profiles, we developed the first web server for combinatorial evaluation of toxicity and efficacy of candidate drugs named "Connection Map for Compounds" (CMC). Using CMC, researchers can initially compare their query drug gene signatures with prebuilt gene profiles generated from two large-scale toxicogenomics databases, and subsequently perform a drug efficacy analysis for identification of known mechanisms of drug action or generation of new predictions. CMC provides a novel approach for drug repositioning and early evaluation in drug discovery with its unique combination of toxicity and efficacy analyses, expansibility of data and algorithms, and customization of reference gene profiles. CMC can be freely accessed at http://cadd.tongji.edu.cn/webserver/CMCbp.jsp .

  6. Code query by example

    NASA Astrophysics Data System (ADS)

    Vaucouleur, Sebastien

    2011-02-01

    We introduce code query by example for customisation of evolvable software products in general and of enterprise resource planning systems (ERPs) in particular. The concept is based on an initial empirical study on practices around ERP systems. We motivate our design choices based on those empirical results, and we show how the proposed solution helps with respect to the infamous upgrade problem: the conflict between the need for customisation and the need for upgrade of ERP systems. We further show how code query by example can be used as a form of lightweight static analysis, to detect automatically potential defects in large software products. Code query by example as a form of lightweight static analysis is particularly interesting in the context of ERP systems: it is often the case that programmers working in this field are not computer science specialists but more of domain experts. Hence, they require a simple language to express custom rules.

  7. The Innate Immune Database (IIDB)

    PubMed Central

    Korb, Martin; Rust, Aistair G; Thorsson, Vesteinn; Battail, Christophe; Li, Bin; Hwang, Daehee; Kennedy, Kathleen A; Roach, Jared C; Rosenberger, Carrie M; Gilchrist, Mark; Zak, Daniel; Johnson, Carrie; Marzolf, Bruz; Aderem, Alan; Shmulevich, Ilya; Bolouri, Hamid

    2008-01-01

    Background As part of a National Institute of Allergy and Infectious Diseases funded collaborative project, we have performed over 150 microarray experiments measuring the response of C57/BL6 mouse bone marrow macrophages to toll-like receptor stimuli. These microarray expression profiles are available freely from our project web site . Here, we report the development of a database of computationally predicted transcription factor binding sites and related genomic features for a set of over 2000 murine immune genes of interest. Our database, which includes microarray co-expression clusters and a host of web-based query, analysis and visualization facilities, is available freely via the internet. It provides a broad resource to the research community, and a stepping stone towards the delineation of the network of transcriptional regulatory interactions underlying the integrated response of macrophages to pathogens. Description We constructed a database indexed on genes and annotations of the immediate surrounding genomic regions. To facilitate both gene-specific and systems biology oriented research, our database provides the means to analyze individual genes or an entire genomic locus. Although our focus to-date has been on mammalian toll-like receptor signaling pathways, our database structure is not limited to this subject, and is intended to be broadly applicable to immunology. By focusing on selected immune-active genes, we were able to perform computationally intensive expression and sequence analyses that would currently be prohibitive if applied to the entire genome. Using six complementary computational algorithms and methodologies, we identified transcription factor binding sites based on the Position Weight Matrices available in TRANSFAC. For one example transcription factor (ATF3) for which experimental data is available, over 50% of our predicted binding sites coincide with genome-wide chromatin immnuopreciptation (ChIP-chip) results. Our database can be interrogated via a web interface. Genomic annotations and binding site predictions can be automatically viewed with a customized version of the Argo genome browser. Conclusion We present the Innate Immune Database (IIDB) as a community resource for immunologists interested in gene regulatory systems underlying innate responses to pathogens. The database website can be freely accessed at . PMID:18321385

  8. Microarray Analyses Reveal Marked Differences in Growth Factor and Receptor Expression Between 8-Cell Human Embryos and Pluripotent Stem Cells

    PubMed Central

    Vlismas, Antonis; Bletsa, Ritsa; Mavrogianni, Despina; Mamali, Georgina; Pergamali, Maria; Dinopoulou, Vasiliki; Partsinevelos, George; Drakakis, Peter; Loutradis, Dimitris

    2016-01-01

    Previous microarray analyses of RNAs from 8-cell (8C) human embryos revealed a lack of cell cycle checkpoints and overexpression of core circadian oscillators and cell cycle drivers relative to pluripotent human stem cells [human embryonic stem cells/induced pluripotent stem (hES/iPS)] and fibroblasts, suggesting growth factor independence during early cleavage stages. To explore this possibility, we queried our combined microarray database for expression of 487 growth factors and receptors. Fifty-one gene elements were overdetected on the 8C arrays relative to hES/iPS cells, including 14 detected at least 80-fold higher, which annotated to multiple pathways: six cytokine family (CSF1R, IL2RG, IL3RA, IL4, IL17B, IL23R), four transforming growth factor beta (TGFB) family (BMP6, BMP15, GDF9, ENG), one fibroblast growth factor (FGF) family [FGF14(FH4)], one epidermal growth factor member (GAB1), plus CD36, and CLEC10A. 8C-specific gene elements were enriched (73%) for reported circadian-controlled genes in mouse tissues. High-level detection of CSF1R, ENG, IL23R, and IL3RA specifically on the 8C arrays suggests the embryo plays an active role in blocking immune rejection and is poised for trophectoderm development; robust detection of NRG1, GAB1, -2, GRB7, and FGF14(FHF4) indicates novel roles in early development in addition to their known roles in later development. Forty-four gene elements were underdetected on the 8C arrays, including 11 at least 80-fold under the pluripotent cells: two cytokines (IFITM1, TNFRSF8), five TGFBs (BMP7, LEFTY1, LEFTY2, TDGF1, TDGF3), two FGFs (FGF2, FGF receptor 1), plus ING5, and WNT6. The microarray detection patterns suggest that hES/iPS cells exhibit suppressed circadian competence, underexpression of early differentiation markers, and more robust expression of generic pluripotency genes, in keeping with an artificial state of continual uncommitted cell division. In contrast, gene expression patterns of the 8C embryo suggest that it is an independent circadian rhythm-competent equivalence group poised to signal its environment, defend against maternal immune rejection, and begin the rapid commitment events of early embryogenesis. PMID:26493868

  9. CisMiner: Genome-Wide In-Silico Cis-Regulatory Module Prediction by Fuzzy Itemset Mining

    PubMed Central

    Navarro, Carmen; Lopez, Francisco J.; Cano, Carlos; Garcia-Alcalde, Fernando; Blanco, Armando

    2014-01-01

    Eukaryotic gene control regions are known to be spread throughout non-coding DNA sequences which may appear distant from the gene promoter. Transcription factors are proteins that coordinately bind to these regions at transcription factor binding sites to regulate gene expression. Several tools allow to detect significant co-occurrences of closely located binding sites (cis-regulatory modules, CRMs). However, these tools present at least one of the following limitations: 1) scope limited to promoter or conserved regions of the genome; 2) do not allow to identify combinations involving more than two motifs; 3) require prior information about target motifs. In this work we present CisMiner, a novel methodology to detect putative CRMs by means of a fuzzy itemset mining approach able to operate at genome-wide scale. CisMiner allows to perform a blind search of CRMs without any prior information about target CRMs nor limitation in the number of motifs. CisMiner tackles the combinatorial complexity of genome-wide cis-regulatory module extraction using a natural representation of motif combinations as itemsets and applying the Top-Down Fuzzy Frequent- Pattern Tree algorithm to identify significant itemsets. Fuzzy technology allows CisMiner to better handle the imprecision and noise inherent to regulatory processes. Results obtained for a set of well-known binding sites in the S. cerevisiae genome show that our method yields highly reliable predictions. Furthermore, CisMiner was also applied to putative in-silico predicted transcription factor binding sites to identify significant combinations in S. cerevisiae and D. melanogaster, proving that our approach can be further applied genome-wide to more complex genomes. CisMiner is freely accesible at: http://genome2.ugr.es/cisminer. CisMiner can be queried for the results presented in this work and can also perform a customized cis-regulatory module prediction on a query set of transcription factor binding sites provided by the user. PMID:25268582

  10. Identification of single gene deletions at 15q13.3: further evidence that CHRNA7 causes the 15q13.3 microdeletion syndrome phenotype.

    PubMed

    Hoppman-Chaney, N; Wain, K; Seger, P R; Superneau, D W; Hodge, J C

    2013-04-01

    The 15q13.3 microdeletion syndrome (OMIM #612001) is characterized by a wide range of phenotypic features, including intellectual disability, seizures, autism, and psychiatric conditions. This deletion is inherited in approximately 75% of cases and has been found in mildly affected and normal parents, consistent with variable expressivity and incomplete penetrance. The common deletion is approximately 2 Mb and contains several genes; however, the gene(s) responsible for the resulting clinical features have not been clearly defined. Recently, four probands were reported with small deletions including only the CHRNA7 gene. These patients showed a wide range of phenotypic features similar to those associated with the larger 15q13.3 microdeletion. To further correlate genotype and phenotype, we queried our database of >15,000 patients tested in the Mayo Clinic Cytogenetics Laboratory from 2008 to 2011 and identified 19 individuals (10 probands and 9 family members) with isolated heterozygous CHRNA7 gene deletions. All but two infants displayed multiple features consistent with 15q13.3 microdeletion syndrome. We also identified the first de novo deletion confined to CHRNA7 as well as the second known case with homozygous deletion of CHRNA7 only. These results provide further evidence implicating CHRNA7 as the gene responsible for the clinical findings associated with 15q13.3 microdeletion. © 2012 John Wiley & Sons A/S. Published by Blackwell Publishing Ltd.

  11. A natural language query system for Hubble Space Telescope proposal selection

    NASA Technical Reports Server (NTRS)

    Hornick, Thomas; Cohen, William; Miller, Glenn

    1987-01-01

    The proposal selection process for the Hubble Space Telescope is assisted by a robust and easy to use query program (TACOS). The system parses an English subset language sentence regardless of the order of the keyword phases, allowing the user a greater flexibility than a standard command query language. Capabilities for macro and procedure definition are also integrated. The system was designed for flexibility in both use and maintenance. In addition, TACOS can be applied to any knowledge domain that can be expressed in terms of a single reaction. The system was implemented mostly in Common LISP. The TACOS design is described in detail, with particular attention given to the implementation methods of sentence processing.

  12. Exploring Genetic Attributions Underlying Radiotherapy-Induced Fatigue in Prostate Cancer Patients.

    PubMed

    Hashemi, Sepehr; Fernandez Martinez, Juan Luis; Saligan, Leorey; Sonis, Stephen

    2017-09-01

    Despite numerous proposed mechanisms, no definitive pathophysiology underlying radiotherapy-induced fatigue (RIF) has been established. However, the dysregulation of a set of 35 genes was recently validated to predict development of fatigue in prostate cancer patients receiving radiotherapy. To hypothesize novel pathways, and provide genetic targets for currently proposed pathways implicated in RIF development through analysis of the previously validated gene set. The gene set was analyzed for all phenotypic attributions implicated in the phenotype of fatigue. Initially, a "directed" approach was used by querying specific fatigue-related sub-phenotypes against all known phenotypic attributions of the gene set. Then, an "undirected" approach, reviewing the entirety of the literature referencing the 35 genes, was used to increase analysis sensitivity. The dysregulated genes attribute to neural, immunological, mitochondrial, muscular, and metabolic pathways. In addition, certain genes suggest phenotypes not previously emphasized in the context of RIF, such as ionizing radiation sensitivity, DNA damage, and altered DNA repair frequency. Several genes also associated with prostate cancer depression, possibly emphasizing variable radiosensitivity by RIF-prone patients, which may have palliative care implications. Despite the relevant findings, many of the 35 RIF-predictive genes are poorly characterized, warranting their investigation. The implications of herein presented RIF pathways are purely theoretical until specific end-point driven experiments are conducted in more congruent contexts. Nevertheless, the presented attributions are informative, directing future investigation to definitively elucidate RIF's pathoetiology. This study demonstrates an arguably comprehensive method of approaching known differential expression underlying a complex phenotype, to correlate feasible pathophysiology. Copyright © 2017 American Academy of Hospice and Palliative Medicine. All rights reserved.

  13. Gene signature associated with benign neurofibroma transformation to malignant peripheral nerve sheath tumors

    PubMed Central

    Sorzano, Carlos O. S.; Pascual-Montano, Alberto; Carazo, Jose M.

    2017-01-01

    Benign neurofibromas, the main phenotypic manifestations of the rare neurological disorder neurofibromatosis type 1, degenerate to malignant tumors associated to poor prognosis in about 10% of patients. Despite efforts in the field of (epi)genomics, the lack of prognostic biomarkers with which to predict disease evolution frustrates the adoption of appropriate early therapeutic measures. To identify potential biomarkers of malignant neurofibroma transformation, we integrated four human experimental studies and one for mouse, using a gene score-based meta-analysis method, from which we obtained a score-ranked signature of 579 genes. Genes with the highest absolute scores were classified as promising disease biomarkers. By grouping genes with similar neurofibromatosis-related profiles, we derived panels of potential biomarkers. The addition of promoter methylation data to gene profiles indicated a panel of genes probably silenced by hypermethylation. To identify possible therapeutic treatments, we used the gene signature to query drug expression databases. Trichostatin A and other histone deacetylase inhibitors, as well as cantharidin and tamoxifen, were retrieved as putative therapeutic means to reverse the aberrant regulation that drives to malignant cell proliferation and metastasis. This in silico prediction corroborated reported experimental results that suggested the inclusion of these compounds in clinical trials. This experimental validation supported the suitability of the meta-analysis method used to integrate several sources of public genomic information, and the reliability of the gene signature associated to the malignant evolution of neurofibromas to generate working hypotheses for prognostic and drug-responsive biomarkers or therapeutic measures, thus showing the potential of this in silico approach for biomarker discovery. PMID:28542306

  14. Bringing the fathead minnow (Pimephales promelas) into the ...

    EPA Pesticide Factsheets

    The fathead minnow (Pimephales promelas) is a well-established ecotoxicological model organism that has been widely used for regulatory ecotoxicity testing and research for over a half century. Throughout this time, a lot of knowledge has been gained about the fathead minnow’s biological responses to various xenobiotics. However, despite its importance as a model organism, the fathead minnow still has few publicly available gene sequences. Recently, Burns et al. (2015; Environ. Toxicol. Chem. 35:212) described the sequencing and de-novo assembly of the fathead minnow genome. Two draft genome assemblies are now publicly available on the GenBank database. However, on their own the draft assemblies remain of limited use to researchers who are primarily interested in the functional units of the genome, i.e. the genes. In the present study, an annotation pipeline, consisting of gene prediction, evidence alignment, and data synthesis, was applied to the fathead minnow SOAPdenovo assembly. Ab initio gene prediction was performed using AUGUSTUS, which provided a starting point of 43,345 gene predictions. Fathead minnow Expressed Sequence Tags (ESTs) and zebrafish protein-coding sequences (CDSs) were then aligned to the assembly using the corresponding spliced alignment methods of the program Exonerate. Of the over 240,000 EST alignments, 73% were successfully aligned with 90% or greater sequence identity and query coverage. Similarly, 39% of nearly 45,000 zebrafish co

  15. A distributed query execution engine of big attributed graphs.

    PubMed

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such type of graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation in addition to the performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform the performance of Apache Giraph, a popular distributed graph processing system, by orders of magnitudes.

  16. De novo assembly and characterization of fruit transcriptome in Litchi chinensis Sonn and analysis of differentially regulated genes in fruit in response to shading

    PubMed Central

    2013-01-01

    Background Litchi (Litchi chinensis Sonn.) is one of the most important fruit trees cultivated in tropical and subtropical areas. However, a lack of transcriptomic and genomic information hinders our understanding of the molecular mechanisms underlying fruit set and fruit development in litchi. Shading during early fruit development decreases fruit growth and induces fruit abscission. Here, high-throughput RNA sequencing (RNA-Seq) was employed for the de novo assembly and characterization of the fruit transcriptome in litchi, and differentially regulated genes, which are responsive to shading, were also investigated using digital transcript abundance(DTA)profiling. Results More than 53 million paired-end reads were generated and assembled into 57,050 unigenes with an average length of 601 bp. These unigenes were annotated by querying against various public databases, with 34,029 unigenes found to be homologous to genes in the NCBI GenBank database and 22,945 unigenes annotated based on known proteins in the Swiss-Prot database. In further orthologous analyses, 5,885 unigenes were assigned with one or more Gene Ontology terms, 10,234 hits were aligned to the 24 Clusters of Orthologous Groups classifications and 15,330 unigenes were classified into 266 Kyoto Encyclopedia of Genes and Genomes pathways. Based on the newly assembled transcriptome, the DTA profiling approach was applied to investigate the differentially expressed genes related to shading stress. A total of 3.6 million and 3.5 million high-quality tags were generated from shaded and non-shaded libraries, respectively. As many as 1,039 unigenes were shown to be significantly differentially regulated. Eleven of the 14 differentially regulated unigenes, which were randomly selected for more detailed expression comparison during the course of shading treatment, were identified as being likely to be involved in the process of fruitlet abscission in litchi. Conclusions The assembled transcriptome of litchi fruit provides a global description of expressed genes in litchi fruit development, and could serve as an ideal repository for future functional characterization of specific genes. The DTA analysis revealed that more than 1000 differentially regulated unigenes respond to the shading signal, some of which might be involved in the fruitlet abscission process in litchi, shedding new light on the molecular mechanisms underlying organ abscission. PMID:23941440

  17. A novel combination of ω-3 fatty acids and nano-curcumin modulates interleukin-6 gene expression and high sensitivity C-reactive protein serum levels in patients with migraine: a randomized clinical trial study.

    PubMed

    Abdolahi, Mina; Sarraf, Payam; Javanbakht, Mohammad Hassan; Honarvar, Niyaz Mohammadzadeh; Hatami, Mahsa; Soveyd, Neda; Tafakhori, Abbas; Sedighiyan, Mohsen; Djalali, Mona; Jafarieh, Arash; Masoudian, Yousef; Djalali, Mahmoud

    2018-06-24

    Migraine is a disabling neuroinflammatory condition characterized by increasing the levels of interleukin (IL)-6, a proinflammatory cytokine and C-reactive protein (CRP) which considered as a vascular inflammatory mediator, disrupting the integrity of blood-brain barrier and contributing to neurogenic inflammation, and disease progression. Curcumin and ω-3 fatty acids can exert neuroprotective effects through modulation of IL-6 gene expression and CRP levels. The aim of present study is the evaluation of combined effects of ω-3 fatty acids and nano-curcumin supplementation on IL-6 gene expression and serum level and hs-CRP levels in migraine patients. Eighty episodic migraine patients enrolled in the trial and were divided into four groups as 1) combination of ω-3 fatty acids (2500 mg) plus nano-curcumin (80 mg), 2) ω-3 (2500 mg), 3) nano-curcumin (80 mg), and 4) the control (ω-3 and nano-cucumin placebo included oral paraffin oil) over a two-month period. At the beginning and the end of the study, the expression of IL-6 from peripheral blood mononuclear cells and IL-6 and hs-CRP serum levels were measured, using a real-time PCR and ELISA methods, respectively. The results showed that both of ω-3 and nano-curcumin down-regulated IL-6 mRAN and significantly decreased the serum concentration. hs-CRP serum levels significantly decrease in combination and nano-curcumin within groups (P<0.05). An additive greater reduction of IL-6 and hs-CRP was observed in the combination group suggested a possible synergetic relation. It seems that, ω-3 fatty acids and curcumin supplementation can be considered a new promising target in migraine prevention. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. Computational annotation of genes differentially expressed along olive fruit development

    PubMed Central

    Galla, Giulio; Barcaccia, Gianni; Ramina, Angelo; Collani, Silvio; Alagna, Fiammetta; Baldoni, Luciana; Cultrera, Nicolò GM; Martinelli, Federico; Sebastiani, Luca; Tonutti, Pietro

    2009-01-01

    Background Olea europaea L. is a traditional tree crop of the Mediterranean basin with a worldwide economical high impact. Differently from other fruit tree species, little is known about the physiological and molecular basis of the olive fruit development and a few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and the subsequent computational annotation by means of different software. Results mRNA from fruits of the cv. Leccino sampled at three different stages [i.e., initial fruit set (stage 1), completed pit hardening (stage 2) and veraison (stage 3)] was used for the identification of differentially expressed genes putatively involved in main processes along fruit development. Four subtractive hybridization libraries were constructed: forward and reverse between stage 1 and 2 (libraries A and B), and 2 and 3 (libraries C and D). All sequenced clones (1,132 in total) were analyzed through BlastX against non-redundant NCBI databases and about 60% of them showed similarity to known proteins. A total of 89 out of 642 differentially expressed unique sequences was further investigated by Real-Time PCR, showing a validation of the SSH results as high as 69%. Library-specific cDNA repertories were annotated according to the three main vocabularies of the gene ontology (GO): cellular component, biological process and molecular function. BlastX analysis, GO terms mapping and annotation analysis were performed using the Blast2GO software, a research tool designed with the main purpose of enabling GO based data mining on sequence sets for which no GO annotation is yet available. Bioinformatic analysis pointed out a significantly different distribution of the annotated sequences for each GO category, when comparing the three fruit developmental stages. The olive fruit-specific transcriptome dataset was used to query all known KEGG (Kyoto Encyclopaedia of Genes and Genomes) metabolic pathways for characterizing and positioning retrieved EST records. The integration of the olive sequence datasets within the MapMan platform for microarray analysis allowed the identification of specific biosynthetic pathways useful for the definition of key functional categories in time course analyses for gene groups. Conclusion The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and transcriptional aspects related to carbohydrates, fatty acids, secondary metabolites, transcription factors and hormones as well as response to biotic and abiotic stresses throughout olive drupe development. These results represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth and ripening. PMID:19852839

  19. Cross-species multiple environmental stress responses: An integrated approach to identify candidate genes for multiple stress tolerance in sorghum (Sorghum bicolor (L.) Moench) and related model species

    PubMed Central

    Modise, David M.; Gemeildien, Junaid; Ndimba, Bongani K.; Christoffels, Alan

    2018-01-01

    Background Crop response to the changing climate and unpredictable effects of global warming with adverse conditions such as drought stress has brought concerns about food security to the fore; crop yield loss is a major cause of concern in this regard. Identification of genes with multiple responses across environmental stresses is the genetic foundation that leads to crop adaptation to environmental perturbations. Methods In this paper, we introduce an integrated approach to assess candidate genes for multiple stress responses across-species. The approach combines ontology based semantic data integration with expression profiling, comparative genomics, phylogenomics, functional gene enrichment and gene enrichment network analysis to identify genes associated with plant stress phenotypes. Five different ontologies, viz., Gene Ontology (GO), Trait Ontology (TO), Plant Ontology (PO), Growth Ontology (GRO) and Environment Ontology (EO) were used to semantically integrate drought related information. Results Target genes linked to Quantitative Trait Loci (QTLs) controlling yield and stress tolerance in sorghum (Sorghum bicolor (L.) Moench) and closely related species were identified. Based on the enriched GO terms of the biological processes, 1116 sorghum genes with potential responses to 5 different stresses, such as drought (18%), salt (32%), cold (20%), heat (8%) and oxidative stress (25%) were identified to be over-expressed. Out of 169 sorghum drought responsive QTLs associated genes that were identified based on expression datasets, 56% were shown to have multiple stress responses. On the other hand, out of 168 additional genes that have been evaluated for orthologous pairs, 90% were conserved across species for drought tolerance. Over 50% of identified maize and rice genes were responsive to drought and salt stresses and were co-located within multifunctional QTLs. Among the total identified multi-stress responsive genes, 272 targets were shown to be co-localized within QTLs associated with different traits that are responsive to multiple stresses. Ontology mapping was used to validate the identified genes, while reconstruction of the phylogenetic tree was instrumental to infer the evolutionary relationship of the sorghum orthologs. The results also show specific genes responsible for various interrelated components of drought response mechanism such as drought tolerance, drought avoidance and drought escape. Conclusions We submit that this approach is novel and to our knowledge, has not been used previously in any other research; it enables us to perform cross-species queries for genes that are likely to be associated with multiple stress tolerance, as a means to identify novel targets for engineering stress resistance in sorghum and possibly, in other crop species. PMID:29590108

  20. ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery.

    PubMed

    Krishnakumar, Vivek; Contrino, Sergio; Cheng, Chia-Yi; Belyaeva, Irina; Ferlanti, Erik S; Miller, Jason R; Vaughn, Matthew W; Micklem, Gos; Town, Christopher D; Chan, Agnes P

    2017-01-01

    ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled. © The Author 2016. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  1. Gene and process level modulation to overcome the bottlenecks of recombinant proteins expression in Pichia pastoris.

    PubMed

    Prabhu, Ashish A; Boro, Bibari; Bharali, Biju; Chakraborty, Shuchishloka; Dasu, Veeranki V

    2017-01-01

    Process development involving system metabolic engineering and bioprocess engineering has become one of the major thrust for the development of therapeutic proteins or enzymes. Pichia pastoris has emerged as a prominent host for the production of therapeutic protein or enzymes. Regardless of producing high protein titers, various cellular and process level bottlenecks restrict the expression of recombinant proteins in P. pastoris. In the present review, we have summarized the recent developments in the expression of foreign proteins in P. pastoris. Further, we have discussed various cellular engineering strategies which include codon optimization, pathway engineering, signal peptide processing, development of protease deficient strain and glyco-engineered strains for the high yield protein secretion of recombinant protein. Bioprocess development of recombinant proteins in large-scale bioreactor including medium optimization, optimum feeding strategy and co-substrate feeding in fed-batch as well as continuous cultivation have been described. The recent advances in system and synthetic biology studies including metabolic flux analysis in understanding the phenotypic characteristics of recombinant Pichia and genome editing with CRISPR-CAS system have also been summarized. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  2. Insights into significant pathways and gene interaction networks underlying breast cancer cell line MCF-7 treated with 17β-estradiol (E2).

    PubMed

    Huan, Jinliang; Wang, Lishan; Xing, Li; Qin, Xianju; Feng, Lingbin; Pan, Xiaofeng; Zhu, Ling

    2014-01-01

    Estrogens are known to regulate the proliferation of breast cancer cells and to alter their cytoarchitectural and phenotypic properties, but the gene networks and pathways by which estrogenic hormones regulate these events are only partially understood. We used global gene expression profiling by Affymetrix GeneChip microarray analysis, with KEGG pathway enrichment, PPI network construction, module analysis and text mining methods to identify patterns and time courses of genes that are either stimulated or inhibited by estradiol (E2) in estrogen receptor (ER)-positive MCF-7 human breast cancer cells. Of the genes queried on the Affymetrix Human Genome U133 plus 2.0 microarray, we identified 628 (12h), 852 (24h) and 880 (48 h) differentially expressed genes (DEGs) that showed a robust pattern of regulation by E2. From pathway enrichment analysis, we found out the changes of metabolic pathways of E2 treated samples at each time point. At 12h time point, the changes of metabolic pathways were mainly focused on pathways in cancer, focal adhesion, and chemokine signaling pathway. At 24h time point, the changes were mainly enriched in neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction and calcium signaling pathway. At 48 h time point, the significant pathways were pathways in cancer, regulation of actin cytoskeleton, cell adhesion molecules (CAMs), axon guidance and ErbB signaling pathway. Of interest, our PPI network analysis and module analysis found that E2 treatment induced enhancement of PRSS23 at the three time points and PRSS23 was in the central position of each module. Text mining results showed that the important genes of DEGs have relationship with signal pathways, such as ERbB pathway (AREG), Wnt pathway (NDP), MAPK pathway (NTRK3, TH), IP3 pathway (TRA@) and some transcript factors (TCF4, MAF). Our studies highlight the diverse gene networks and metabolic and cell regulatory pathways through which E2 operates to achieve its widespread effects on breast cancer cells. © 2013 Elsevier B.V. All rights reserved.

  3. SpliceCenter: A suite of web-based bioinformatic applications for evaluating the impact of alternative splicing on RT-PCR, RNAi, microarray, and peptide-based studies

    PubMed Central

    Ryan, Michael C; Zeeberg, Barry R; Caplen, Natasha J; Cleland, James A; Kahn, Ari B; Liu, Hongfang; Weinstein, John N

    2008-01-01

    Background Over 60% of protein-coding genes in vertebrates express mRNAs that undergo alternative splicing. The resulting collection of transcript isoforms poses significant challenges for contemporary biological assays. For example, RT-PCR validation of gene expression microarray results may be unsuccessful if the two technologies target different splice variants. Effective use of sequence-based technologies requires knowledge of the specific splice variant(s) that are targeted. In addition, the critical roles of alternative splice forms in biological function and in disease suggest that assay results may be more informative if analyzed in the context of the targeted splice variant. Results A number of contemporary technologies are used for analyzing transcripts or proteins. To enable investigation of the impact of splice variation on the interpretation of data derived from those technologies, we have developed SpliceCenter. SpliceCenter is a suite of user-friendly, web-based applications that includes programs for analysis of RT-PCR primer/probe sets, effectors of RNAi, microarrays, and protein-targeting technologies. Both interactive and high-throughput implementations of the tools are provided. The interactive versions of SpliceCenter tools provide visualizations of a gene's alternative transcripts and probe target positions, enabling the user to identify which splice variants are or are not targeted. The high-throughput batch versions accept user query files and provide results in tabular form. When, for example, we used SpliceCenter's batch siRNA-Check to process the Cancer Genome Anatomy Project's large-scale shRNA library, we found that only 59% of the 50,766 shRNAs in the library target all known splice variants of the target gene, 32% target some but not all, and 9% do not target any currently annotated transcript. Conclusion SpliceCenter provides unique, user-friendly applications for assessing the impact of transcript variation on the design and interpretation of RT-PCR, RNAi, gene expression microarrays, antibody-based detection, and mass spectrometry proteomics. The tools are intended for use by bench biologists as well as bioinformaticists. PMID:18638396

  4. Reflective Database Access Control

    ERIC Educational Resources Information Center

    Olson, Lars E.

    2009-01-01

    "Reflective Database Access Control" (RDBAC) is a model in which a database privilege is expressed as a database query itself, rather than as a static privilege contained in an access control list. RDBAC aids the management of database access controls by improving the expressiveness of policies. However, such policies introduce new interactions…

  5. An intuitive graphical webserver for multiple-choice protein sequence search.

    PubMed

    Banky, Daniel; Szalkai, Balazs; Grolmusz, Vince

    2014-04-10

    Every day tens of thousands of sequence searches and sequence alignment queries are submitted to webservers. The capitalized word "BLAST" becomes a verb, describing the act of performing sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, the query formation on the most frequently used webservers will be difficult. Some servers support the formation of queries with regular expressions, but most of the users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows the formation of multiple choice queries by simply drawing the residues to their positions; if more than one residue are drawn to the same position, then they will be nicely stacked on the user interface, indicating the multiple choice at the given position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes possible to form queries requiring not just certain amino acids in the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. Wide-Open: Accelerating public data release by automating detection of overdue datasets

    PubMed Central

    Poon, Hoifung; Howe, Bill

    2017-01-01

    Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week. PMID:28594819

  7. Wide-Open: Accelerating public data release by automating detection of overdue datasets.

    PubMed

    Grechkin, Maxim; Poon, Hoifung; Howe, Bill

    2017-06-01

    Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.

  8. Self-focusing therapeutic gene delivery with intelligent gene vector swarms: intra-swarm signalling through receptor transgene expression in targeted cells.

    PubMed

    Tolmachov, Oleg E

    2015-01-01

    Gene delivery in vivo that is tightly focused on the intended target cells is essential to maximize the benefits of gene therapy and to reduce unwanted side-effects. Cell surface markers are immediately available for probing by therapeutic gene vectors and are often used to direct gene transfer with these vectors to specific target cell populations. However, it is not unusual for the choice of available extra-cellular markers to be too scarce to provide a reliable definition of the desired therapeutically relevant set of target cells. Therefore, interrogation of intra-cellular determinants of cell-specificity, such as tissue-specific transcription factors, can be vital in order to provide detailed cell-guiding information to gene vector particles. An important improvement in cell-specific gene delivery can be achieved through auto-buildup in vector homing efficiency using intelligent 'self-focusing' of swarms of vector particles on target cells. Vector self-focusing was previously suggested to rely on the release of diffusible chemo-attractants after a successful target-specific hit by 'scout' vector particles. I hypothesize that intelligent self-focusing behaviour of swarms of cell-targeted therapeutic gene vectors can be accomplished without the employment of difficult-to-use diffusible chemo-attractants, instead relying on the intra-swarm signalling through cells expressing a non-diffusible extra-cellular receptor for the gene vectors. In the proposed model, cell-guiding information is gathered by the 'scout' gene vector particles, which: (1) attach to a variety of cells via a weakly binding (low affinity) receptor; (2) successfully facilitate gene transfer into these cells; (3) query intra-cellular determinants of cell-specificity with their transgene expression control elements and (4) direct the cell-specific biosynthesis of a vector-encoded strongly binding (high affinity) cell-surface receptor. Free members of the vector swarm loaded with therapeutic cargo are then attracted to and internalized into the intended target cells via the expressed cognate strongly binding extra-cellular receptor, causing escalation of gene transfer into these cells and increasing the copy number of the therapeutic gene expression modules. Such self-focusing swarms of gene vectors can be either homogeneous, with 'scout' and 'therapeutic' members of the swarm being structurally identical, or, alternatively, heterogeneous (split), with 'scout' and 'therapeutic' members of the swarm being structurally specialized. It is hoped that the proposed self-focusing cell-targeted gene vector swarms with receptor-mediated intra-swarm signalling could be particularly effective in 'top-up' gene delivery scenarios, achieving high-level and sustained expression of therapeutic transgenes that are prone to shut-down through degradation and silencing. Crucially, in contrast to low-precision 'general location' vector guidance by diffusible chemo-attractants, ear-marking non-diffusible receptors can provide high-accuracy targeting of therapeutic vector particles to the specific cell, which has undergone a 'successful cell-specific hit' by a 'scout' vector particle. Opportunities for cell targeting could be expanded, since in the proposed model of self-focusing it could be possible to probe a broad selection of intra-cellular determinants of cell-specificity and not just to rely exclusively on extra-cellular markers of cell-specificity. By employing such self-focusing gene vectors for the improvement of cell-targeted delivery of therapeutic genes, e.g., in cancer therapy or gene addition therapy of recessive genetic diseases, it could be possible to broaden a leeway for the reduction of the vector load and, consequently, to minimize undesired vector cytotoxicity, immune reactions, and the risk of inadvertent genetic modification of germline cells in genetic treatment in vivo. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Alpha-synuclein mRNA isoform formation and translation affected by polymorphism in the human SNCA 3'UTR.

    PubMed

    Barrie, Elizabeth S; Lee, Sung-Ha; Frater, John T; Kataki, Maria; Scharre, Douglas W; Sadee, Wolfgang

    2018-05-06

    Multiple variants in SNCA, encoding alpha-synuclein, a main component of Lewy bodies, are implicated in Parkinson's disease. We searched for cis-acting SNCA variants using allelic mRNA ratios in human brain tissues. In a SNCA 3'UTR (2,520 bp) luciferase reporter gene assay, translation in SH-SY5Y cells in the presence of the rs17016074 G/A alleles was measured. To assess clinical impact, we queried neurocognitive genome-wide association studies. Allelic ratios deviated up to twofold, measured at a marker SNP in the middle of a long 3' untranslated region (3'UTR), but not at a marker at its start, suggesting regulation of 3'UTR processing. 3'UTR SNP rs17016074 G/A, minor allele frequency (MAF) <1% in Caucasians, 13% in Africans, strongly associates with large allelic mRNA expression imbalance (AEI), resulting in reduced expression of long 3'UTR isoforms. A second 3'UTR SNP (rs356165) associates with moderate AEI and enhances SNCA mRNA expression. The rs17016074 A allele reduces overall 3'UTR expression in luciferase reporter gene assays but supports more efficient translation, resolving previous contradictory results. We failed to detect significant genome-wide associations for rs17016074, possibly a result of low MAF in Caucasians or its absence from most genotyping panels. In the "Genome Wide Association Study of Yoruba in Nigeria," rs356165 was associated with reduced memory performance. Here, we identify two cis-acting regulatory variants affecting SNCA mRNA expression, measured by allelic ratios in the 3'UTR. The rs17016074 minor A allele is associated with higher expression of luciferase protein activity. Resolving the genetic influence of SNCA polymorphisms requires study of the interactions between multiple regulatory variants with distinct frequencies among populations. © 2018 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals, Inc.

  10. Mining of the Uncharacterized Cytochrome P450 Genes Involved in Alkaloid Biosynthesis in California Poppy Using a Draft Genome Sequence

    PubMed Central

    Hori, Kentaro; Yamada, Yasuyuki; Purwanto, Ratmoyo; Minakuchi, Yohei; Toyoda, Atsushi; Hirakawa, Hideki

    2018-01-01

    Abstract Land plants produce specialized low molecular weight metabolites to adapt to various environmental stressors, such as UV radiation, pathogen infection, wounding and animal feeding damage. Due to the large variety of stresses, plants produce various chemicals, particularly plant species-specific alkaloids, through specialized biosynthetic pathways. In this study, using a draft genome sequence and querying known biosynthetic cytochrome P450 (P450) enzyme-encoding genes, we characterized the P450 genes involved in benzylisoquinoline alkaloid (BIA) biosynthesis in California poppy (Eschscholzia californica), as P450s are key enzymes involved in the diversification of specialized metabolism. Our in silico studies showed that all identified enzyme-encoding genes involved in BIA biosynthesis were found in the draft genome sequence of approximately 489 Mb, which covered approximately 97% of the whole genome (502 Mb). Further analyses showed that some P450 families involved in BIA biosynthesis, i.e. the CYP80, CYP82 and CYP719 families, were more enriched in the genome of E. californica than in the genome of Arabidopsis thaliana, a plant that does not produce BIAs. CYP82 family genes were highly abundant, so we measured the expression of CYP82 genes with respect to alkaloid accumulation in different plant tissues and two cell lines whose BIA production differs to estimate the functions of the genes. Further characterization revealed two highly homologous P450s (CYP82P2 and CYP82P3) that exhibited 10-hydroxylase activities with different substrate specificities. Here, we discuss the evolution of the P450 genes and the potential for further genome mining of the genes encoding the enzymes involved in BIA biosynthesis. PMID:29301019

  11. Normalization of TAM post-receptor signaling reveals a cell invasive signature for Axl tyrosine kinase.

    PubMed

    Kimani, Stanley G; Kumar, Sushil; Davra, Viralkumar; Chang, Yun-Juan; Kasikara, Canan; Geng, Ke; Tsou, Wen-I; Wang, Shenyan; Hoque, Mainul; Boháč, Andrej; Lewis-Antes, Anita; De Lorenzo, Mariana S; Kotenko, Sergei V; Birge, Raymond B

    2016-09-06

    Tyro3, Axl, and Mertk (TAMs) are a family of three conserved receptor tyrosine kinases that have pleiotropic roles in innate immunity and homeostasis and when overexpressed in cancer cells can drive tumorigenesis. In the present study, we engineered EGFR/TAM chimeric receptors (EGFR/Tyro3, EGFR/Axl, and EGF/Mertk) with the goals to interrogate post-receptor functions of TAMs, and query whether TAMs have unique or overlapping post-receptor activation profiles. Stable expression of EGFR/TAMs in EGFR-deficient CHO cells afforded robust EGF inducible TAM receptor phosphorylation and activation of downstream signaling. Using a series of unbiased screening approaches, that include kinome-view analysis, phosphor-arrays, RNAseq/GSEA analysis, as well as cell biological and in vivo readouts, we provide evidence that each TAM has unique post-receptor signaling platforms and identify an intrinsic role for Axl that impinges on cell motility and invasion compared to Tyro3 and Mertk. These studies demonstrate that TAM show unique post-receptor signatures that impinge on distinct gene expression profiles and tumorigenic outcomes.

  12. Plant Genome Resources at the National Center for Biotechnology Information

    PubMed Central

    Wheeler, David L.; Smith-White, Brian; Chetvernin, Vyacheslav; Resenchuk, Sergei; Dombrowski, Susan M.; Pechous, Steven W.; Tatusova, Tatiana; Ostell, James

    2005-01-01

    The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes including those of many plants. A specialized plant query page allow maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI's genomic resources as well as links to organism-specific Web pages beyond NCBI. PMID:16010002

  13. Genetic Architecture and Molecular Networks Underlying Leaf Thickness in Desert-Adapted Tomato Solanum pennellii1[OPEN

    PubMed Central

    Frank, Margaret H.; Balaguer, Maria A. de Luis; Li, Mao

    2017-01-01

    Thicker leaves allow plants to grow in water-limited conditions. However, our understanding of the genetic underpinnings of this highly functional leaf shape trait is poor. We used a custom-built confocal profilometer to directly measure leaf thickness in a set of introgression lines (ILs) derived from the desert tomato Solanum pennellii and identified quantitative trait loci. We report evidence of a complex genetic architecture of this trait and roles for both genetic and environmental factors. Several ILs with thick leaves have dramatically elongated palisade mesophyll cells and, in some cases, increased leaf ploidy. We characterized the thick IL2-5 and IL4-3 in detail and found increased mesophyll cell size and leaf ploidy levels, suggesting that endoreduplication underpins leaf thickness in tomato. Next, we queried the transcriptomes and inferred dynamic Bayesian networks of gene expression across early leaf ontogeny in these lines to compare the molecular networks that pattern leaf thickness. We show that thick ILs share S. pennellii-like expression profiles for putative regulators of cell shape and meristem determinacy as well as a general signature of cell cycle-related gene expression. However, our network data suggest that leaf thickness in these two lines is patterned at least partially by distinct mechanisms. Consistent with this hypothesis, double homozygote lines combining introgression segments from these two ILs show additive phenotypes, including thick leaves, higher ploidy levels, and larger palisade mesophyll cells. Collectively, these data establish a framework of genetic, anatomical, and molecular mechanisms that pattern leaf thickness in desert-adapted tomato. PMID:28794258

  14. Using the TIGR gene index databases for biological discovery.

    PubMed

    Lee, Yuandan; Quackenbush, John

    2003-11-01

    The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.

  15. Transcriptome-wide identification of Rauvolfia serpentina microRNAs and prediction of their potential targets.

    PubMed

    Prakash, Pravin; Rajakani, Raja; Gupta, Vikrant

    2016-04-01

    MicroRNAs (miRNAs) are small non-coding RNAs of ∼ 19-24 nucleotides (nt) in length and considered as potent regulators of gene expression at transcriptional and post-transcriptional levels. Here we report the identification and characterization of 15 conserved miRNAs belonging to 13 families from Rauvolfia serpentina through in silico analysis of available nucleotide dataset. The identified mature R. serpentina miRNAs (rse-miRNAs) ranged between 20 and 22nt in length, and the average minimal folding free energy index (MFEI) value of rse-miRNA precursor sequences was found to be -0.815 kcal/mol. Using the identified rse-miRNAs as query, their potential targets were predicted in R. serpentina and other plant species. Gene Ontology (GO) annotation showed that predicted targets of rse-miRNAs include transcription factors as well as genes involved in diverse biological processes such as primary and secondary metabolism, stress response, disease resistance, growth, and development. Few rse-miRNAs were predicted to target genes of pharmaceutically important secondary metabolic pathways such as alkaloids and anthocyanin biosynthesis. Phylogenetic analysis showed the evolutionary relationship of rse-miRNAs and their precursor sequences to homologous pre-miRNA sequences from other plant species. The findings under present study besides giving first hand information about R. serpentina miRNAs and their targets, also contributes towards the better understanding of miRNA-mediated gene regulatory processes in plants. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.

    PubMed

    Cestarelli, Valerio; Fiscon, Giulia; Felici, Giovanni; Bertolazzi, Paola; Weitschek, Emanuel

    2016-03-01

    Nowadays, knowledge extraction methods from Next Generation Sequencing data are highly requested. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class. We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and performs again the classification procedure until a stopping criterion is verified. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool.We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA) and we validate CAMUR and its models also on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and the relation with a particular cancer are deduced. dmb.iasi.cnr.it/camur.php emanuel@iasi.cnr.it Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  17. Searching for cancer information on the internet: analyzing natural language search queries.

    PubMed

    Bader, Judith L; Theofanos, Mary Frances

    2003-12-11

    Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared >or= 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.

  18. Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

    PubMed Central

    Theofanos, Mary Frances

    2003-01-01

    Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Conclusions Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience. PMID:14713659

  19. Towards barcode markers in Fungi: an intron map of Ascomycota mitochondria.

    PubMed

    Santamaria, Monica; Vicario, Saverio; Pappadà, Graziano; Scioscia, Gaetano; Scazzocchio, Claudio; Saccone, Cecilia

    2009-06-16

    A standardized and cost-effective molecular identification system is now an urgent need for Fungi owing to their wide involvement in human life quality. In particular the potential use of mitochondrial DNA species markers has been taken in account. Unfortunately, a serious difficulty in the PCR and bioinformatic surveys is due to the presence of mobile introns in almost all the fungal mitochondrial genes. The aim of this work is to verify the incidence of this phenomenon in Ascomycota, testing, at the same time, a new bioinformatic tool for extracting and managing sequence databases annotations, in order to identify the mitochondrial gene regions where introns are missing so as to propose them as species markers. The general trend towards a large occurrence of introns in the mitochondrial genome of Fungi has been confirmed in Ascomycota by an extensive bioinformatic analysis, performed on all the entries concerning 11 mitochondrial protein coding genes and 2 mitochondrial rRNA (ribosomal RNA) specifying genes, belonging to this phylum, available in public nucleotide sequence databases. A new query approach has been developed to retrieve effectively introns information included in these entries. After comparing the new query-based approach with a blast-based procedure, with the aim of designing a faithful Ascomycota mitochondrial intron map, the first method appeared clearly the most accurate. Within this map, despite the large pervasiveness of introns, it is possible to distinguish specific regions comprised in several genes, including the full NADH dehydrogenase subunit 6 (ND6) gene, which could be considered as barcode candidates for Ascomycota due to their paucity of introns and to their length, above 400 bp, comparable to the lower end size of the length range of barcodes successfully used in animals. The development of the new query system described here would answer the pressing requirement to improve drastically the bioinformatics support to the DNA Barcode Initiative. The large scale investigation of Ascomycota mitochondrial introns performed through this tool, allowing to exclude the introns-rich sequences from the barcode candidates exploration, could be the first step towards a mitochondrial barcoding strategy for these organisms, similar to the standard approach employed in metazoans.

  20. Specialized microbial databases for inductive exploration of microbial genome sequences

    PubMed Central

    Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine

    2005-01-01

    Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore , a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis) associated to related organisms for comparison. PMID:15698474

  1. WormBase ParaSite - a comprehensive resource for helminth genomics.

    PubMed

    Howe, Kevin L; Bolt, Bruce J; Shafie, Myriam; Kersey, Paul; Berriman, Matthew

    2017-07-01

    The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  2. Characterization and chromosomal mapping of the human TFG gene involved in thyroid carcinoma

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mencinger, M.; Panagopoulos, I.; Andreasson, P.

    1997-05-01

    Homology searches in the Expressed Sequence Tag Database were performed using SPYGQ-rich regions as query sequences to find genes encoding protein regions similar to the N-terminal parts of the sarcoma-associated EWS and FUS proteins. Clone 22911 (T74973), encoding a SPYGQ-rich region in its 5{prime} end, and several other clones that overlapped 22911 were selected. The combined data made it possible to assemble a full-length cDNA sequence. This cDNA sequence is 1677 bp, containing an initiation codon ATG, an open reading frame of 400 amino acids, a poly(A) signal, and a poly(A) tail. We found 100% identity between the 5{prime} partmore » of the consensus sequence and the 598-bp-long sequence named TFG. The TFG sequence is fused to the 3{prime} end of NTRK1, generating the TRK-T3 fusion transcript found in papillary thyroid carcinoma. The cDNA therefore represents the full-length transcript of the TFG gene. TFG was localized to 3q11-q12 by fluorescence in situ hybridization. The 3{prime} and the 5{prime} ends of the TFG cDNA probe hybridized to a 2.2-kb band on Northern blot filters in all tissues examined. 28 refs., 5 figs., 1 tab.« less

  3. GeneMachine: gene prediction and sequence annotation.

    PubMed

    Makalowska, I; Ryan, J F; Baxevanis, A D

    2001-09-01

    A number of free-standing programs have been developed in order to help researchers find potential coding regions and deduce gene structure for long stretches of what is essentially 'anonymous DNA'. As these programs apply inherently different criteria to the question of what is and is not a coding region, multiple algorithms should be used in the course of positional cloning and positional candidate projects to assure that all potential coding regions within a previously-identified critical region are identified. We have developed a gene identification tool called GeneMachine which allows users to query multiple exon and gene prediction programs in an automated fashion. BLAST searches are also performed in order to see whether a previously-characterized coding region corresponds to a region in the query sequence. A suite of Perl programs and modules are used to run MZEF, GENSCAN, GRAIL 2, FGENES, RepeatMasker, Sputnik, and BLAST. The results of these runs are then parsed and written into ASN.1 format. Output files can be opened using NCBI Sequin, in essence using Sequin as both a workbench and as a graphical viewer. The main feature of GeneMachine is that the process is fully automated; the user is only required to launch GeneMachine and then open the resulting file with Sequin. Annotations can then be made to these results prior to submission to GenBank, thereby increasing the intrinsic value of these data. GeneMachine is freely-available for download at http://genome.nhgri.nih.gov/genemachine. A public Web interface to the GeneMachine server for academic and not-for-profit users is available at http://genemachine.nhgri.nih.gov. The Web supplement to this paper may be found at http://genome.nhgri.nih.gov/genemachine/supplement/.

  4. Federated ontology-based queries over cancer data

    PubMed Central

    2012-01-01

    Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included. Conclusions To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures. PMID:22373043

  5. Knowledge and Theme Discovery across Very Large Biological Data Sets Using Distributed Queries: A Prototype Combining Unstructured and Structured Data

    PubMed Central

    Repetski, Stephen; Venkataraman, Girish; Che, Anney; Luke, Brian T.; Girard, F. Pascal; Stephens, Robert M.

    2013-01-01

    As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework. PMID:24312478

  6. Knowledge and theme discovery across very large biological data sets using distributed queries: a prototype combining unstructured and structured data.

    PubMed

    Mudunuri, Uma S; Khouja, Mohamad; Repetski, Stephen; Venkataraman, Girish; Che, Anney; Luke, Brian T; Girard, F Pascal; Stephens, Robert M

    2013-01-01

    As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework.

  7. Inductive matrix completion for predicting gene-disease associations.

    PubMed

    Natarajan, Nagarajan; Dhillon, Inderjit S

    2014-06-15

    Most existing methods for predicting causal disease genes rely on specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies-for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies-for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better-it has close to one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best) that has <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes that are previously not linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature. Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease. © The Author 2014. Published by Oxford University Press.

  8. The Therapeutic Role of Xenobiotic Nuclear Receptors against Metabolic Syndrome.

    PubMed

    Pu, Shuqi; Wu, Xiaojie; Yang, Xiaoying; Zhang, Yunzhan; Dai, Yunkai; Zhang, Yueling; Wu, Xiaoting; Liu, Yan; Cui, Xiaona; Jin, Haiyong; Cao, Jianhong; Li, Ruliu; Cai, Jiazhong; Cao, Qizhi; Hu, Ling; Gao, Yong

    2018-06-10

    Xenobiotic nuclear receptors (XNRs) are nuclear receptors that characterized by coordinately regulating the expression of genes encoding drug-metabolizing enzymes and transporters to essentially eliminate and detoxify xenobiotics and endobiotics from the body, including the peroxisome proliferator-activated receptor (PPAR), the farnesoid X receptor (FXR), the liver X receptor (LXR), the pregnane X receptor (PXR) and the constitutive androstane receptor (CAR). Heretofore, increasing evidences have suggested that these five XNRs are not only involved in the regulation of xeno-/endo-biotics detoxication but also the development of human diseases, such as cancer, obesity and diabetes. PPAR, FXR, LXR, PXR and CAR, as the receptors for numerous natural or synthetic compounds may be the most effective therapeutic targets in the treatment of metabolic diseases. In this review, we will focus on these five XNRs and their recently discovered functions in diabetes and its complications. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  9. xQTL workbench: a scalable web environment for multi-level QTL analysis.

    PubMed

    Arends, Danny; van der Velde, K Joeri; Prins, Pjotr; Broman, Karl W; Möller, Steffen; Jansen, Ritsert C; Swertz, Morris A

    2012-04-01

    xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types come available, xQTL workbench is quickly customized using the Molgenis software generator. xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. m.a.swertz@rug.nl.

  10. xQTL workbench: a scalable web environment for multi-level QTL analysis

    PubMed Central

    Arends, Danny; van der Velde, K. Joeri; Prins, Pjotr; Broman, Karl W.; Möller, Steffen; Jansen, Ritsert C.; Swertz, Morris A.

    2012-01-01

    Summary: xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types come available, xQTL workbench is quickly customized using the Molgenis software generator. Availability: xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. Contact: m.a.swertz@rug.nl PMID:22308096

  11. Global Mapping of the Yeast Genetic Interaction Network

    NASA Astrophysics Data System (ADS)

    Tong, Amy Hin Yan; Lesage, Guillaume; Bader, Gary D.; Ding, Huiming; Xu, Hong; Xin, Xiaofeng; Young, James; Berriz, Gabriel F.; Brost, Renee L.; Chang, Michael; Chen, YiQun; Cheng, Xin; Chua, Gordon; Friesen, Helena; Goldberg, Debra S.; Haynes, Jennifer; Humphries, Christine; He, Grace; Hussein, Shamiza; Ke, Lizhu; Krogan, Nevan; Li, Zhijian; Levinson, Joshua N.; Lu, Hong; Ménard, Patrice; Munyana, Christella; Parsons, Ainslie B.; Ryan, Owen; Tonikian, Raffi; Roberts, Tania; Sdicu, Anne-Marie; Shapiro, Jesse; Sheikh, Bilal; Suter, Bernhard; Wong, Sharyl L.; Zhang, Lan V.; Zhu, Hongwei; Burd, Christopher G.; Munro, Sean; Sander, Chris; Rine, Jasper; Greenblatt, Jack; Peter, Matthias; Bretscher, Anthony; Bell, Graham; Roth, Frederick P.; Brown, Grant W.; Andrews, Brenda; Bussey, Howard; Boone, Charles

    2004-02-01

    A genetic interaction network containing ~1000 genes and ~4000 interactions was mapped by crossing mutations in 132 different query genes into a set of ~4700 viable gene yeast deletion mutants and scoring the double mutant progeny for fitness defects. Network connectivity was predictive of function because interactions often occurred among functionally related genes, and similar patterns of interactions tended to identify components of the same pathway. The genetic network exhibited dense local neighborhoods; therefore, the position of a gene on a partially mapped network is predictive of other genetic interactions. Because digenic interactions are common in yeast, similar networks may underlie the complex genetics associated with inherited phenotypes in other organisms.

  12. Designing a Syntax-Based Retrieval System for Supporting Language Learning

    ERIC Educational Resources Information Center

    Tsao, Nai-Lung; Kuo, Chin-Hwa; Wible, David; Hung, Tsung-Fu

    2009-01-01

    In this paper, we propose a syntax-based text retrieval system for on-line language learning and use a fast regular expression search engine as its main component. Regular expression searches provide more scalable querying and search results than keyword-based searches. However, without a well-designed index scheme, the execution time of regular…

  13. RDFBuilder: a tool to automatically build RDF-based interfaces for MAGE-OM microarray data sources.

    PubMed

    Anguita, Alberto; Martin, Luis; Garcia-Remesal, Miguel; Maojo, Victor

    2013-07-01

    This paper presents RDFBuilder, a tool that enables RDF-based access to MAGE-ML-compliant microarray databases. We have developed a system that automatically transforms the MAGE-OM model and microarray data stored in the ArrayExpress database into RDF format. Additionally, the system automatically enables a SPARQL endpoint. This allows users to execute SPARQL queries for retrieving microarray data, either from specific experiments or from more than one experiment at a time. Our system optimizes response times by caching and reusing information from previous queries. In this paper, we describe our methods for achieving this transformation. We show that our approach is complementary to other existing initiatives, such as Bio2RDF, for accessing and retrieving data from the ArrayExpress database. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  14. Integrated databanks access and sequence/structure analysis services at the PBIL.

    PubMed

    Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

    2003-07-01

    The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools of nucleic acid and protein sequence analyses. This server allows to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It allows the possibility to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. An originality of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.

  15. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data

    PubMed Central

    Kaas, Quentin; Ruiz, Manuel; Lefranc, Marie-Paule

    2004-01-01

    IMGT/3Dstructure-DB and IMGT/Structural-Query are a novel 3D structure database and a new tool for immunological proteins. They are part of IMGT, the international ImMunoGenetics information system®, a high-quality integrated knowledge resource specializing in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC) and related proteins of the immune system (RPI) of human and other vertebrate species, which consists of databases, Web resources and interactive on-line tools. IMGT/3Dstructure-DB data are described according to the IMGT Scientific chart rules based on the IMGT-ONTOLOGY concepts. IMGT/3Dstructure-DB provides IMGT gene and allele identification of IG, TR and MHC proteins with known 3D structures, domain delimitations, amino acid positions according to the IMGT unique numbering and renumbered coordinate flat files. Moreover IMGT/3Dstructure-DB provides 2D graphical representations (or Collier de Perles) and results of contact analysis. The IMGT/StructuralQuery tool allows search of this database based on specific structural characteristics. IMGT/3Dstructure-DB and IMGT/StructuralQuery are freely available at http://imgt.cines.fr. PMID:14681396

  16. The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical-genomic driver associations.

    PubMed

    Lee, HoJoon; Palm, Jennifer; Grimes, Susan M; Ji, Hanlee P

    2015-10-27

    The Cancer Genome Atlas (TCGA) project has generated genomic data sets covering over 20 malignancies. These data provide valuable insights into the underlying genetic and genomic basis of cancer. However, exploring the relationship among TCGA genomic results and clinical phenotype remains a challenge, particularly for individuals lacking formal bioinformatics training. Overcoming this hurdle is an important step toward the wider clinical translation of cancer genomic/proteomic data and implementation of precision cancer medicine. Several websites such as the cBio portal or University of California Santa Cruz genome browser make TCGA data accessible but lack interactive features for querying clinically relevant phenotypic associations with cancer drivers. To enable exploration of the clinical-genomic driver associations from TCGA data, we developed the Cancer Genome Atlas Clinical Explorer. The Cancer Genome Atlas Clinical Explorer interface provides a straightforward platform to query TCGA data using one of the following methods: (1) searching for clinically relevant genes, micro RNAs, and proteins by name, cancer types, or clinical parameters; (2) searching for genomic/proteomic profile changes by clinical parameters in a cancer type; or (3) testing two-hit hypotheses. SQL queries run in the background and results are displayed on our portal in an easy-to-navigate interface according to user's input. To derive these associations, we relied on elastic-net estimates of optimal multiple linear regularized regression and clinical parameters in the space of multiple genomic/proteomic features provided by TCGA data. Moreover, we identified and ranked gene/micro RNA/protein predictors of each clinical parameter for each cancer. The robustness of the results was estimated by bootstrapping. Overall, we identify associations of potential clinical relevance among genes/micro RNAs/proteins using our statistical analysis from 25 cancer types and 18 clinical parameters that include clinical stage or smoking history. The Cancer Genome Atlas Clinical Explorer enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible web and mobile interface, users can examine queries and test hypothesis regarding genomic/proteomic alterations across a broad spectrum of malignancies.

  17. StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data.

    PubMed

    Mamouras, Konstantinos; Raghothaman, Mukund; Alur, Rajeev; Ives, Zachary G; Khanna, Sanjeev

    2017-06-01

    Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings.

  18. StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data*

    PubMed Central

    Mamouras, Konstantinos; Raghothaman, Mukund; Alur, Rajeev; Ives, Zachary G.; Khanna, Sanjeev

    2017-01-01

    Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings. PMID:29151821

  19. Network-Based Identification and Prioritization of Key Regulators of Coronary Artery Disease Loci

    PubMed Central

    Zhao, Yuqi; Chen, Jing; Freudenberg, Johannes M.; Meng, Qingying; Rajpal, Deepak K.; Yang, Xia

    2017-01-01

    Objective Recent genome-wide association studies of coronary artery disease (CAD) have revealed 58 genome-wide significant and 148 suggestive genetic loci. However, the molecular mechanisms through which they contribute to CAD and the clinical implications of these findings remain largely unknown. We aim to retrieve gene subnetworks of the 206 CAD loci and identify and prioritize candidate regulators to better understand the biological mechanisms underlying the genetic associations. Approach and Results We devised a new integrative genomics approach that incorporated (1) candidate genes from the top CAD loci, (2) the complete genetic association results from the 1000 genomes-based CAD genome-wide association studies from the Coronary Artery Disease Genome Wide Replication and Meta-Analysis Plus the Coronary Artery Disease consortium, (3) tissue-specific gene regulatory networks that depict the potential relationship and interactions between genes, and (4) tissue-specific gene expression patterns between CAD patients and controls. The networks and top-ranked regulators according to these data-driven criteria were further queried against literature, experimental evidence, and drug information to evaluate their disease relevance and potential as drug targets. Our analysis uncovered several potential novel regulators of CAD such as LUM and STAT3, which possess properties suitable as drug targets. We also revealed molecular relations and potential mechanisms through which the top CAD loci operate. Furthermore, we found that multiple CAD-relevant biological processes such as extracellular matrix, inflammatory and immune pathways, complement and coagulation cascades, and lipid metabolism interact in the CAD networks. Conclusions Our data-driven integrative genomics framework unraveled tissue-specific relations among the candidate genes of the CAD genome-wide association studies loci and prioritized novel network regulatory genes orchestrating biological processes relevant to CAD. PMID:26966275

  20. ReVeaLD: a user-driven domain-specific interactive search platform for biomedical research.

    PubMed

    Kamdar, Maulik R; Zeginis, Dimitris; Hasnain, Ali; Decker, Stefan; Deus, Helena F

    2014-02-01

    Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biomedical discovery. Despite the popularity of semantic web technologies in tackling the integrative bioinformatics challenge, there are many obstacles towards its usage by non-technical research audiences. In particular, the ability to fully exploit integrated information needs using improved interactive methods intuitive to the biomedical experts. In this report we present ReVeaLD (a Real-time Visual Explorer and Aggregator of Linked Data), a user-centered visual analytics platform devised to increase intuitive interaction with data from distributed sources. ReVeaLD facilitates query formulation using a domain-specific language (DSL) identified by biomedical experts and mapped to a self-updated catalogue of elements from external sources. ReVeaLD was implemented in a cancer research setting; queries included retrieving data from in silico experiments, protein modeling and gene expression. ReVeaLD was developed using Scalable Vector Graphics and JavaScript and a demo with explanatory video is available at http://www.srvgal78.deri.ie:8080/explorer. A set of user-defined graphic rules controls the display of information through media-rich user interfaces. Evaluation of ReVeaLD was carried out as a game: biomedical researchers were asked to assemble a set of 5 challenge questions and time and interactions with the platform were recorded. Preliminary results indicate that complex queries could be formulated under less than two minutes by unskilled researchers. The results also indicate that supporting the identification of the elements of a DSL significantly increased intuitiveness of the platform and usability of semantic web technologies by domain users. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  1. [Transciptome among Mexicans: a large scale methodology to analyze the genetics expression profile of simultaneous samples in muscle, adipose tissue and lymphocytes obtained from the same individual].

    PubMed

    Bastarrachea, Raúl A; López-Alvarenga, Juan Carlos; Kent, Jack W; Laviada-Molina, Hugo A; Cerda-Flores, Ricardo M; Calderón-Garcidueñas, Ana Laura; Torres-Salazar, Amada; Torres-Salazar, Amanda; Nava-González, Edna J; Solis-Pérez, Elizabeth; Gallegos-Cabrales, Esther C; Cole, Shelley A; Comuzzie, Anthony G

    2008-01-01

    We describe the methodology used to analyze multiple transcripts using microarray techniques in simultaneous biopsies of muscle, adipose tissue and lymphocytes obtained from the same individual as part of the standard protocol of the Genetics of Metabolic Diseases in Mexico: GEMM Family Study. We recruited 4 healthy male subjects with BM1 20-41, who signed an informed consent letter. Subjects participated in a clinical examination that included anthropometric and body composition measurements, muscle biopsies (vastus lateralis) subcutaneous fat biopsies anda blood draw. All samples provided sufficient amplified RNA for microarray analysis. Total RNA was extracted from the biopsy samples and amplified for analysis. Of the 48,687 transcript targets queried, 39.4% were detectable in a least one of the studied tissues. Leptin was not detectable in lymphocytes, weakly expressed in muscle, but overexpressed and highly correlated with BMI in subcutaneous fat. Another example was GLUT4, which was detectable only in muscle and not correlated with BMI. Expression level concordance was 0.7 (p< 0.001) for the three tissues studied. We demonstrated the feasibility of carrying out simultaneous analysis of gene expression in multiple tissues, concordance of genetic expression in different tissues, and obtained confidence that this method corroborates the expected biological relationships among LEPand GLUT4. TheGEMM study will provide a broad and valuable overview on metabolic diseases, including obesity and type 2 diabetes.

  2. Investigation of Dracocephalum kostchyi plant extract on the effective inflammatory transcription factors and mediators in activated macrophages.

    PubMed

    Kalantar, Kurosh; Gholijani, Nasser; Mousaei, Nashmin; Yazdani, Malihe; Amirghofran, Zahra

    2018-06-07

    Dracocephalum kotschyi is traditionally used for its anti-inflammatory effects. We aimed to investigate the effects of ethyl acetate extract of D. kotschyi on the expression of key inflammatory mediators and main signaling molecules involved in regulation of inflammation. Lipopolysaccharide (LPS)-stimulated J774.1 mouse macrophages were cultured in the presence of the plant extract and examined by the real time-PCR for gene expressions of interleukin (IL)-1β, tumor necrosis factor (TNF)-α, inducible nitric oxide synthase (iNOS) and cyclooxygenase (COX)-2. Cytokine levels and phosphorylated forms of stress-activated protein kinases/c-Jun N-terminal kinase (SAPK/JNK), signal transducer and activator of transcription (STAT)-3, p38, IκB-α and nuclear factor (NF)-κB p65 were determined using ELISA. The extract significantly reduced the expression of key mediators of inflammation. iNOS expression level decreased from 138±8.5 fold in LPS-only treated cells to 6.5±2.6 fold after treatment with 25 ug/ml of the extract (p<0.001). Similarly, COX-2 expression decreased from 632 ±98.8 fold in control to 124 ±24.6 fold (p<0.01). Treatment of cells with the extract significantly reduced IL-1β and TNF-α cytokines at both gene and protein expression levels. The extract at 25 µg/ml caused significant decreases in phospho-SAPK/JNK and phospho-STAT3 levels in macrophages (p<0.01). Proteins of phospho-p38, NFκB-p65 and phospho-NF-κB p65 had a reduced levels in treated cells (p<0.05). No significant change in phospho-IĸB level was observed. These findings suggested that D. kotschyi with inhibition of NF-κB, SAPK/JNK, STAT-3 and p-38 might have reduced the expression levels of key inflammatory mediators and thus possibly have potential beneficial impact on inflammatory diseases. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  3. The barley EST DNA Replication and Repair Database (bEST-DRRD) as a tool for the identification of the genes involved in DNA replication and repair.

    PubMed

    Gruszka, Damian; Marzec, Marek; Szarejko, Iwona

    2012-06-14

    The high level of conservation of genes that regulate DNA replication and repair indicates that they may serve as a source of information on the origin and evolution of the species and makes them a reliable system for the identification of cross-species homologs. Studies that had been conducted to date shed light on the processes of DNA replication and repair in bacteria, yeast and mammals. However, there is still much to be learned about the process of DNA damage repair in plants. These studies, which were conducted mainly using bioinformatics tools, enabled the list of genes that participate in various pathways of DNA repair in Arabidopsis thaliana (L.) Heynh to be outlined; however, information regarding these mechanisms in crop plants is still very limited. A similar, functional approach is particularly difficult for a species whose complete genomic sequences are still unavailable. One of the solutions is to apply ESTs (Expressed Sequence Tags) as the basis for gene identification. For the construction of the barley EST DNA Replication and Repair Database (bEST-DRRD), presented here, the Arabidopsis nucleotide and protein sequences involved in DNA replication and repair were used to browse for and retrieve the deposited sequences, derived from four barley (Hordeum vulgare L.) sequence databases, including the "Barley Genome version 0.05" database (encompassing ca. 90% of barley coding sequences) and from two databases covering the complete genomes of two monocot models: Oryza sativa L. and Brachypodium distachyon L. in order to identify homologous genes. Sequences of the categorised Arabidopsis queries are used for browsing the repositories, which are located on the ViroBLAST platform. The bEST-DRRD is currently used in our project during the identification and validation of the barley genes involved in DNA repair. The presented database provides information about the Arabidopsis genes involved in DNA replication and repair, their expression patterns and models of protein interactions. It was designed and established to provide an open-access tool for the identification of monocot homologs of known Arabidopsis genes that are responsible for DNA-related processes. The barley genes identified in the project are currently being analysed to validate their function.

  4. STBase: one million species trees for comparative biology.

    PubMed

    McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J

    2015-01-01

    Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.

  5. Do not hesitate to use Tversky-and other hints for successful active analogue searches with feature count descriptors.

    PubMed

    Horvath, Dragos; Marcou, Gilles; Varnek, Alexandre

    2013-07-22

    This study is an exhaustive analysis of the neighborhood behavior over a large coherent data set (ChEMBL target/ligand pairs of known Ki, for 165 targets with >50 associated ligands each). It focuses on similarity-based virtual screening (SVS) success defined by the ascertained optimality index. This is a weighted compromise between purity and retrieval rate of active hits in the neighborhood of an active query. One key issue addressed here is the impact of Tversky asymmetric weighing of query vs candidate features (represented as integer-value ISIDA colored fragment/pharmacophore triplet count descriptor vectors). The nearly a 3/4 million independent SVS runs showed that Tversky scores with a strong bias in favor of query-specific features are, by far, the most successful and the least failure-prone out of a set of nine other dissimilarity scores. These include classical Tanimoto, which failed to defend its privileged status in practical SVS applications. Tversky performance is not significantly conditioned by tuning of its bias parameter α. Both initial "guesses" of α = 0.9 and 0.7 were more successful than Tanimoto (at its turn, better than Euclid). Tversky was eventually tested in exhaustive similarity searching within the library of 1.6 M commercial + bioactive molecules at http://infochim.u-strasbg.fr/webserv/VSEngine.html , comparing favorably to Tanimoto in terms of "scaffold hopping" propensity. Therefore, it should be used at least as often as, perhaps in parallel to Tanimoto in SVS. Analysis with respect to query subclasses highlighted relationships of query complexity (simply expressed in terms of pharmacophore pattern counts) and/or target nature vs SVS success likelihood. SVS using more complex queries are more robust with respect to the choice of their operational premises (descriptors, metric). Yet, they are best handled by "pro-query" Tversky scores at α > 0.5. Among simpler queries, one may distinguish between "growable" (allowing for active analogs with additional features), and a few "conservative" queries not allowing any growth. These (typically bioactive amine transporter ligands) form the specific application domain of "pro-candidate" biased Tversky scores at α < 0.5.

  6. A Focus on the Beneficial Effects of Alpha Synuclein and a Re-Appraisal of Synucleinopathies.

    PubMed

    Ryskalin, Larisa; Busceti, Carla L; Limanaqi, Fiona; Biagioni, Francesca; Gambardella, Stefano; Fornai, Francesco

    2018-01-01

    Alpha synuclein (α-syn) belongs to a class of proteins which are commonly considered to play a detrimental role in neuronal survival. This assumption is based on the occurrence of a severe neuronal degeneration in patients carrying a multiplication of the α-syn gene (SNCA) and in a variety of experimental models, where overexpression of α-syn leads to cell death and neurological impairment. In these conditions, a higher amount of normally structured α-syn produces a damage, which is even worse compared with that produced by α-syn owning an abnormal structure (as occurring following point gene mutations). In line with this, knocking out the expression of α-syn is reported to protect from specific neurotoxins such as 1-methyl, 4-phenyl 1,2,3,6-tetrahydropyridine (MPTP). In the present review we briefly discuss these well-known detrimental effects but we focus on findings showing that, in specific conditions α-syn is beneficial for cell survival. This occurs during methamphetamine intoxication which is counteracted by endogenous α-syn. Similarly, the dysfunction of the chaperone cysteine-string protein- alpha leads to cell pathology which is counteracted by over-expressing α-syn. In line with this, an increased expression of α-syn protects against oxidative damage produced by dopamine. Remarkably, when the lack of α-syn is combined with a depletion of β- and γ- synucleins, alterations in brain structure and function occur. This review tries to balance the evidence showing a beneficial effect with the bulk of data reporting a detrimental effect of endogenous α-syn. The specific role of α-syn as a chaperone protein is discussed to explain such a dual effect. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  7. vSPARQL: A View Definition Language for the Semantic Web

    PubMed Central

    Shaw, Marianne; Detwiler, Landon T.; Noy, Natalya; Brinkley, James; Suciu, Dan

    2010-01-01

    Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages. PMID:20800106

  8. Solving the problem of Trans-Genomic Query with alignment tables.

    PubMed

    Parker, Douglass Stott; Hsiao, Ruey-Lung; Xing, Yi; Resch, Alissa M; Lee, Christopher J

    2008-01-01

    The trans-genomic query (TGQ) problem--enabling the free query of biological information, even across genomes--is a central challenge facing bioinformatics. Solutions to this problem can alter the nature of the field, moving it beyond the jungle of data integration and expanding the number and scope of questions that can be answered. An alignment table is a binary relationship on locations (sequence segments). An important special case of alignment tables are hit tables ? tables of pairs of highly similar segments produced by alignment tools like BLAST. However, alignment tables also include general binary relationships, and can represent any useful connection between sequence locations. They can be curated, and provide a high-quality queryable backbone of connections between biological information. Alignment tables thus can be a natural foundation for TGQ, as they permit a central part of the TGQ problem to be reduced to purely technical problems involving tables of locations.Key challenges in implementing alignment tables include efficient representation and indexing of sequence locations. We define a location datatype that can be incorporated naturally into common off-the-shelf database systems. We also describe an implementation of alignment tables in BLASTGRES, an extension of the open-source POSTGRESQL database system that provides indexing and operators on locations required for querying alignment tables. This paper also reviews several successful large-scale applications of alignment tables for Trans-Genomic Query. Tables with millions of alignments have been used in queries about alternative splicing, an area of genomic analysis concerning the way in which a single gene can yield multiple transcripts. Comparative genomics is a large potential application area for TGQ and alignment tables.

  9. A Role Calculus for ORM

    NASA Astrophysics Data System (ADS)

    Curland, Matthew; Halpin, Terry; Stirewalt, Kurt

    A conceptual schema of an information system specifies the fact structures of interest as well as related business rules that are either constraints or derivation rules. Constraints restrict the possible or permitted states or state transitions, while derivation rules enable some facts to be derived from others. Graphical languages are commonly used to specify conceptual schemas, but often need to be supplemented by more expressive textual languages to capture additional business rules, as well as conceptual queries that enable conceptual models to be queried directly. This paper describes research to provide a role calculus to underpin textual languages for Object-Role Modeling (ORM), to enable business rules and queries to be formulated in a language intelligible to business users. The role-based nature of this calculus, which exploits the attribute-free nature of ORM, appears to offer significant advantages over other proposed approaches, especially in the area of semantic stability.

  10. Transcription Factor Activities Enhance Markers of Drug Sensitivity in Cancer.

    PubMed

    Garcia-Alonso, Luz; Iorio, Francesco; Matchan, Angela; Fonseca, Nuno; Jaaks, Patricia; Peat, Gareth; Pignatelli, Miguel; Falcone, Fiammetta; Benes, Cyril H; Dunham, Ian; Bignell, Graham; McDade, Simon S; Garnett, Mathew J; Saez-Rodriguez, Julio

    2018-02-01

    Transcriptional dysregulation induced by aberrant transcription factors (TF) is a key feature of cancer, but its global influence on drug sensitivity has not been examined. Here, we infer the transcriptional activity of 127 TFs through analysis of RNA-seq gene expression data newly generated for 448 cancer cell lines, combined with publicly available datasets to survey a total of 1,056 cancer cell lines and 9,250 primary tumors. Predicted TF activities are supported by their agreement with independent shRNA essentiality profiles and homozygous gene deletions, and recapitulate mutant-specific mechanisms of transcriptional dysregulation in cancer. By analyzing cell line responses to 265 compounds, we uncovered numerous TFs whose activity interacts with anticancer drugs. Importantly, combining existing pharmacogenomic markers with TF activities often improves the stratification of cell lines in response to drug treatment. Our results, which can be queried freely at dorothea.opentargets.io, offer a broad foundation for discovering opportunities to refine personalized cancer therapies. Significance: Systematic analysis of transcriptional dysregulation in cancer cell lines and patient tumor specimens offers a publicly searchable foundation to discover new opportunities to refine personalized cancer therapies. Cancer Res; 78(3); 769-80. ©2017 AACR . ©2017 American Association for Cancer Research.

  11. Xenbase: Core features, data acquisition, and data processing.

    PubMed

    James-Zorn, Christina; Ponferrada, Virgillio G; Burns, Kevin A; Fortriede, Joshua D; Lotay, Vaneet S; Liu, Yu; Brad Karpinka, J; Karimi, Kamran; Zorn, Aaron M; Vize, Peter D

    2015-08-01

    Xenbase, the Xenopus model organism database (www.xenbase.org), is a cloud-based, web-accessible resource that integrates the diverse genomic and biological data from Xenopus research. Xenopus frogs are one of the major vertebrate animal models used for biomedical research, and Xenbase is the central repository for the enormous amount of data generated using this model tetrapod. The goal of Xenbase is to accelerate discovery by enabling investigators to make novel connections between molecular pathways in Xenopus and human disease. Our relational database and user-friendly interface make these data easy to query and allows investigators to quickly interrogate and link different data types in ways that would otherwise be difficult, time consuming, or impossible. Xenbase also enhances the value of these data through high-quality gene expression curation and data integration, by providing bioinformatics tools optimized for Xenopus experiments, and by linking Xenopus data to other model organisms and to human data. Xenbase draws in data via pipelines that download data, parse the content, and save them into appropriate files and database tables. Furthermore, Xenbase makes these data accessible to the broader biomedical community by continually providing annotated data updates to organizations such as NCBI, UniProtKB, and Ensembl. Here, we describe our bioinformatics, genome-browsing tools, data acquisition and sharing, our community submitted and literature curation pipelines, text-mining support, gene page features, and the curation of gene nomenclature and gene models. © 2015 Wiley Periodicals, Inc.

  12. Role of Arabidopsis ABF1/3/4 during det1 germination in salt and osmotic stress conditions.

    PubMed

    Fernando, V C Dilukshi; Al Khateeb, Wesam; Belmonte, Mark F; Schroeder, Dana F

    2018-05-01

    Arabidopsis det1 mutants exhibit salt and osmotic stress resistant germination. This phenotype requires HY5, ABF1, ABF3, and ABF4. While DE-ETIOLATED 1 (DET1) is well known as a negative regulator of light development, here we describe how det1 mutants also exhibit altered responses to salt and osmotic stress, specifically salt and mannitol resistant germination. LONG HYPOCOTYL 5 (HY5) positively regulates both light and abscisic acid (ABA) signalling. We found that hy5 suppressed the det1 salt and mannitol resistant germination phenotype, thus, det1 stress resistant germination requires HY5. We then queried publically available microarray datasets to identify genes downstream of HY5 that were differentially expressed in det1 mutants. Our analysis revealed that ABA regulated genes, including ABA RESPONSIVE ELEMENT BINDING FACTOR 3 (ABF3), are downregulated in det1 seedlings. We found that ABF3 is induced by salt in wildtype seeds, while homologues ABF4 and ABF1 are repressed, and all three genes are underexpressed in det1 seeds. We then investigated the role of ABF3, ABF4, and ABF1 in det1 phenotypes. Double mutant analysis showed that abf3, abf4, and abf1 all suppress the det1 salt/osmotic stress resistant germination phenotype. In addition, abf1 suppressed det1 rapid water loss and open stomata phenotypes. Thus interactions between ABF genes contribute to det1 salt/osmotic stress response phenotypes.

  13. High-Performance Computational Analysis of Glioblastoma Pathology Images with Database Support Identifies Molecular and Survival Correlates.

    PubMed

    Kong, Jun; Wang, Fusheng; Teodoro, George; Cooper, Lee; Moreno, Carlos S; Kurc, Tahsin; Pan, Tony; Saltz, Joel; Brat, Daniel

    2013-12-01

    In this paper, we present a novel framework for microscopic image analysis of nuclei, data management, and high performance computation to support translational research involving nuclear morphometry features, molecular data, and clinical outcomes. Our image analysis pipeline consists of nuclei segmentation and feature computation facilitated by high performance computing with coordinated execution in multi-core CPUs and Graphical Processor Units (GPUs). All data derived from image analysis are managed in a spatial relational database supporting highly efficient scientific queries. We applied our image analysis workflow to 159 glioblastomas (GBM) from The Cancer Genome Atlas dataset. With integrative studies, we found statistics of four specific nuclear features were significantly associated with patient survival. Additionally, we correlated nuclear features with molecular data and found interesting results that support pathologic domain knowledge. We found that Proneural subtype GBMs had the smallest mean of nuclear Eccentricity and the largest mean of nuclear Extent, and MinorAxisLength. We also found gene expressions of stem cell marker MYC and cell proliferation maker MKI67 were correlated with nuclear features. To complement and inform pathologists of relevant diagnostic features, we queried the most representative nuclear instances from each patient population based on genetic and transcriptional classes. Our results demonstrate that specific nuclear features carry prognostic significance and associations with transcriptional and genetic classes, highlighting the potential of high throughput pathology image analysis as a complementary approach to human-based review and translational research.

  14. A pro-invasive role for the Ca2+-activated K+ channel KCa3.1 in malignant glioma

    PubMed Central

    Turner, Kathryn L.; Honasoge, Avinash; Robert, Stephanie M.; McFerrin, Michael M.; Sontheimer, Harald

    2014-01-01

    Glioblastoma multiforme (GBM) are highly motile primary brain tumors. Diffuse tissue invasion hampers surgical resection leading to poor patient prognosis. Recent studies suggest that intracellular Ca2+ acts as a master regulator for cell motility and engages a number of downstream signals including Ca2+-activated ion channels. Querying the REepository of Molecular BRAin Neoplasia DaTa (REMBRANDT), an annotated patient gene database maintained by the National Cancer Institute, we identified the intermediate conductance Ca2+-activated K+ channels, KCa3.1, being overexpressed in 32% of glioma patients where protein expression significantly correlated with poor patient survival. To mechanistically link KCa3.1 expression to glioma invasion, we selected patient gliomas that, when propagated as xenolines in vivo, present with either high or low KCa3.1 expression. In addition we generated U251 glioma cells that stably express an inducible knockdown shRNA to experimentally eliminate KCa3.1 expression. Subjecting these cells to a combination of in vitro and in situ invasion assays, we demonstrate that KCa3.1 expression significantly enhances glioma invasion and that either specific pharmacological inhibition with TRAM-34 or elimination of the channel impairs invasion. Importantly, after intracranial implantation into SCID mice, ablation of KCa3.1 with inducible shRNA resulted in a significant reduction in tumor invasion into surrounding brain in vivo. These results show that KCa3.1 confers an invasive phenotype that significantly worsens a patient’s outlook, and suggests that KCa3.1 represents a viable therapeutic target to reduce glioma invasion. PMID:24585442

  15. Rembrandt: Helping Personalized Medicine Become a Reality Through Integrative Translational Research

    PubMed Central

    Madhavan, Subha; Zenklusen, Jean-Claude; Kotliarov, Yuri; Sahni, Himanso; Fine, Howard A.; Buetow, Kenneth

    2009-01-01

    Finding better therapies for the treatment of brain tumors is hampered by the lack of consistently obtained molecular data in a large sample set, and ability to integrate biomedical data from disparate sources enabling translation of therapies from bench to bedside. Hence, a critical factor in the advancement of biomedical research and clinical translation is the ease with which data can be integrated, redistributed and analyzed both within and across functional domains. Novel biomedical informatics infrastructure and tools are essential for developing individualized patient treatment based on the specific genomic signatures in each patient’s tumor. Here we present Rembrandt, Repository of Molecular BRAin Neoplasia DaTa, a cancer clinical genomics database and a web-based data mining and analysis platform aimed at facilitating discovery by connecting the dots between clinical information and genomic characterization data. To date, Rembrandt contains data generated through the Glioma Molecular Diagnostic Initiative from 874 glioma specimens comprising nearly 566 gene expression arrays, 834 copy number arrays and 13,472 clinical phenotype data points. Data can be queried and visualized for a selected gene across all data platforms or for multiple genes in a selected platform. Additionally, gene sets can be limited to clinically important annotations including secreted, kinase, membrane, and known gene-anomaly pairs to facilitate the discovery of novel biomarkers and therapeutic targets. We believe that REMBRANDT represents a prototype of how high throughput genomic and clinical data can be integrated in a way that will allow expeditious and efficient translation of laboratory discoveries to the clinic. PMID:19208739

  16. Aggregating Queries Against Large Inventories of Remotely Accessible Data

    NASA Astrophysics Data System (ADS)

    Gallagher, J. H. R.; Fulker, D. W.

    2016-12-01

    Those seeking to discover data for a specific purpose often encounter search results that are so large as to be useless without computing assistance. This situation arises, with increasing frequency, in part because repositories contain ever greater numbers of granules, and their granularities may well be poorly aligned or even orthogonal to the data-selection needs of the user. This presentation describes a recently developed service for simultaneously querying large lists of OPeNDAP-accessible granules to extract specified data. The specifications include a richly expressive set of data-selection criteria—applicable to content as well as metadata—and the service has been tested successfully against lists naming hundreds of thousands of granules. Querying such numbers of local files (i.e., granules) on a desktop or laptop computer is practical (by using a scripting language, e.g.), but this practicality is diminished when the data are remote and thus best accessed through a Web-services interface. In these cases, which are increasingly common, scripted queries can take many hours because of inherent network latencies. Furthermore, communication dropouts can add fragility to such scripts, yielding gaps in the acquired results. In contrast, OPeNDAP's new aggregated-query services enable data discovery in the context of very large inventory sizes. These capabilities have been developed for use with OPeNDAP's Hyrax server, which is an open-source realization of DAP (for "Data Access Protocol," a specification widely used in NASA, NOAA and other data-intensive contexts). These aggregated-query services exhibit good response times (on the order of seconds, not hours) even for inventories that list hundreds of thousands of source granules.

  17. Improving integrative searching of systems chemical biology data using semantic annotation.

    PubMed

    Chen, Bin; Ding, Ying; Wild, David J

    2012-03-08

    Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.

  18. gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances.

    PubMed

    Domazet-Lošo, Mirjana; Domazet-Lošo, Tomislav

    2016-01-01

    Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos).

  19. gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances

    PubMed Central

    Domazet-Lošo, Mirjana; Domazet-Lošo, Tomislav

    2016-01-01

    Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos). PMID:27846272

  20. The Firegoose: two-way integration of diverse data from different bioinformatics web resources with desktop applications

    PubMed Central

    Bare, J Christopher; Shannon, Paul T; Schmid, Amy K; Baliga, Nitin S

    2007-01-01

    Background Information resources on the World Wide Web play an indispensable role in modern biology. But integrating data from multiple sources is often encumbered by the need to reformat data files, convert between naming systems, or perform ongoing maintenance of local copies of public databases. Opportunities for new ways of combining and re-using data are arising as a result of the increasing use of web protocols to transmit structured data. Results The Firegoose, an extension to the Mozilla Firefox web browser, enables data transfer between web sites and desktop tools. As a component of the Gaggle integration framework, Firegoose can also exchange data with Cytoscape, the R statistical package, Multiexperiment Viewer (MeV), and several other popular desktop software tools. Firegoose adds the capability to easily use local data to query KEGG, EMBL STRING, DAVID, and other widely-used bioinformatics web sites. Query results from these web sites can be transferred to desktop tools for further analysis with a few clicks. Firegoose acquires data from the web by screen scraping, microformats, embedded XML, or web services. We define a microformat, which allows structured information compatible with the Gaggle to be embedded in HTML documents. We demonstrate the capabilities of this software by performing an analysis of the genes activated in the microbe Halobacterium salinarum NRC-1 in response to anaerobic environments. Starting with microarray data, we explore functions of differentially expressed genes by combining data from several public web resources and construct an integrated view of the cellular processes involved. Conclusion The Firegoose incorporates Mozilla Firefox into the Gaggle environment and enables interactive sharing of data between diverse web resources and desktop software tools without maintaining local copies. Additional web sites can be incorporated easily into the framework using the scripting platform of the Firefox browser. Performing data integration in the browser allows the excellent search and navigation capabilities of the browser to be used in combination with powerful desktop tools. PMID:18021453

  1. The Firegoose: two-way integration of diverse data from different bioinformatics web resources with desktop applications.

    PubMed

    Bare, J Christopher; Shannon, Paul T; Schmid, Amy K; Baliga, Nitin S

    2007-11-19

    Information resources on the World Wide Web play an indispensable role in modern biology. But integrating data from multiple sources is often encumbered by the need to reformat data files, convert between naming systems, or perform ongoing maintenance of local copies of public databases. Opportunities for new ways of combining and re-using data are arising as a result of the increasing use of web protocols to transmit structured data. The Firegoose, an extension to the Mozilla Firefox web browser, enables data transfer between web sites and desktop tools. As a component of the Gaggle integration framework, Firegoose can also exchange data with Cytoscape, the R statistical package, Multiexperiment Viewer (MeV), and several other popular desktop software tools. Firegoose adds the capability to easily use local data to query KEGG, EMBL STRING, DAVID, and other widely-used bioinformatics web sites. Query results from these web sites can be transferred to desktop tools for further analysis with a few clicks. Firegoose acquires data from the web by screen scraping, microformats, embedded XML, or web services. We define a microformat, which allows structured information compatible with the Gaggle to be embedded in HTML documents. We demonstrate the capabilities of this software by performing an analysis of the genes activated in the microbe Halobacterium salinarum NRC-1 in response to anaerobic environments. Starting with microarray data, we explore functions of differentially expressed genes by combining data from several public web resources and construct an integrated view of the cellular processes involved. The Firegoose incorporates Mozilla Firefox into the Gaggle environment and enables interactive sharing of data between diverse web resources and desktop software tools without maintaining local copies. Additional web sites can be incorporated easily into the framework using the scripting platform of the Firefox browser. Performing data integration in the browser allows the excellent search and navigation capabilities of the browser to be used in combination with powerful desktop tools.

  2. SigReannot-mart: a query environment for expression microarray probe re-annotations.

    PubMed

    Moreews, François; Rauffet, Gaelle; Dehais, Patrice; Klopp, Christophe

    2011-01-01

    Expression microarrays are commonly used to study transcriptomes. Most of the arrays are now based on oligo-nucleotide probes. Probe design being a tedious task, it often takes place once at the beginning of the project. The oligo set is then used for several years. During this time period, the knowledge gathered by the community on the genome and the transcriptome increases and gets more precise. Therefore re-annotating the set is essential to supply the biologists with up-to-date annotations. SigReannot-mart is a query environment populated with regularly updated annotations for different oligo sets. It stores the results of the SigReannot pipeline that has mainly been used on farm and aquaculture species. It permits easy extraction in different formats using filters. It is used to compare probe sets on different criteria, to choose the set for a given experiment to mix probe sets in order to create a new one.

  3. Integrating unified medical language system and association mining techniques into relevance feedback for biomedical literature search.

    PubMed

    Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael

    2016-07-19

    Finding highly relevant articles from biomedical databases is challenging not only because it is often difficult to accurately express a user's underlying intention through keywords but also because a keyword-based query normally returns a long list of hits with many citations being unwanted by the user. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employed association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) selected by the user as feedback. The top concepts were utilized to form a k-profile used for the next-round search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries with any levels of complexity. A prototype of BiomedSearch software was made and it was preliminarily evaluated using the Genomics data from TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experiment results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively utilize users' relevance feedback to improve the performance of biomedical literature search.

  4. GeneLab Analysis Working Group Kick-Off Meeting

    NASA Technical Reports Server (NTRS)

    Costes, Sylvain V.

    2018-01-01

    Goals to achieve for GeneLab AWG - GL vision - Review of GeneLab AWG charter Timeline and milestones for 2018 Logistics - Monthly Meeting - Workshop - Internship - ASGSR Introduction of team leads and goals of each group Introduction of all members Q/A Three-tier Client Strategy to Democratize Data Physiological changes, pathway enrichment, differential expression, normalization, processing metadata, reproducibility, Data federation/integration with heterogeneous bioinformatics external databases The GLDS currently serves over 100 omics investigations to the biomedical community via open access. In order to expand the scope of metadata record searches via the GLDS, we designed a metadata warehouse that collects and updates metadata records from external systems housing similar data. To demonstrate the capabilities of federated search and retrieval of these data, we imported metadata records from three open-access data systems into the GLDS metadata warehouse: NCBI's Gene Expression Omnibus (GEO), EBI's PRoteomics IDEntifications (PRIDE) repository, and the Metagenomics Analysis server (MG-RAST). Each of these systems defines metadata for omics data sets differently. One solution to bridge such differences is to employ a common object model (COM) to which each systems' representation of metadata can be mapped. Warehoused metadata records are then transformed at ETL to this single, common representation. Queries generated via the GLDS are then executed against the warehouse, and matching records are shown in the COM representation (Fig. 1). While this approach is relatively straightforward to implement, the volume of the data in the omics domain presents challenges in dealing with latency and currency of records. Furthermore, the lack of a coordinated has been federated data search for and retrieval of these kinds of data across other open-access systems, so that users are able to conduct biological meta-investigations using data from a variety of sources. Such meta-investigations are key to corroborating findings from many kinds of assays and translating them into systems biology knowledge and, eventually, therapeutics.

  5. Augmenting Microarray Data with Literature-Based Knowledge to Enhance Gene Regulatory Network Inference

    PubMed Central

    Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C.

    2014-01-01

    Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductile carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology. PMID:24921649

  6. Augmenting microarray data with literature-based knowledge to enhance gene regulatory network inference.

    PubMed

    Chen, Guocai; Cairelli, Michael J; Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C

    2014-06-01

    Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductile carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.

  7. LAILAPS: the plant science search engine.

    PubMed

    Esch, Maria; Chen, Jinbo; Colmsee, Christian; Klapperstück, Matthias; Grafahrend-Belau, Eva; Scholz, Uwe; Lange, Matthias

    2015-01-01

    With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous and distributed databases, and therefore can be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for a detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use-case focusing on candidate genes in barley. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  8. Identification of Thiotetronic Acid Antibiotic Biosynthetic Pathways by Target-directed Genome Mining.

    PubMed

    Tang, Xiaoyu; Li, Jie; Millán-Aguiñaga, Natalie; Zhang, Jia Jia; O'Neill, Ellis C; Ugalde, Juan A; Jensen, Paul R; Mantovani, Simone M; Moore, Bradley S

    2015-12-18

    Recent genome sequencing efforts have led to the rapid accumulation of uncharacterized or "orphaned" secondary metabolic biosynthesis gene clusters (BGCs) in public databases. This increase in DNA-sequenced big data has given rise to significant challenges in the applied field of natural product genome mining, including (i) how to prioritize the characterization of orphan BGCs and (ii) how to rapidly connect genes to biosynthesized small molecules. Here, we show that by correlating putative antibiotic resistance genes that encode target-modified proteins with orphan BGCs, we predict the biological function of pathway specific small molecules before they have been revealed in a process we call target-directed genome mining. By querying the pan-genome of 86 Salinispora bacterial genomes for duplicated house-keeping genes colocalized with natural product BGCs, we prioritized an orphan polyketide synthase-nonribosomal peptide synthetase hybrid BGC (tlm) with a putative fatty acid synthase resistance gene. We employed a new synthetic double-stranded DNA-mediated cloning strategy based on transformation-associated recombination to efficiently capture tlm and the related ttm BGCs directly from genomic DNA and to heterologously express them in Streptomyces hosts. We show the production of a group of unusual thiotetronic acid natural products, including the well-known fatty acid synthase inhibitor thiolactomycin that was first described over 30 years ago, yet never at the genetic level in regards to biosynthesis and autoresistance. This finding not only validates the target-directed genome mining strategy for the discovery of antibiotic producing gene clusters without a priori knowledge of the molecule synthesized but also paves the way for the investigation of novel enzymology involved in thiotetronic acid natural product biosynthesis.

  9. Mechanism and Anticancer Activity of the Metabolites of an Endophytic Fungi from Eucommia ulmoides Oliv.

    PubMed

    Li, Qi; Zhang, Yan; Shi, Jun-Ling; Wang, Yi-Lin; Zhao, Hao-Bin; Shao, Dong-Yan; Huang, Qing-Sheng; Yang, Hui; Jin, Ming-Liang

    2017-01-01

    Backgroud: Pinoresinol (Pin) and pinoresinol monoglucoside (PMG) are plant-derived lignan molecules with multiple functions. We showed previously that an endophytic fungus from Eucommia ulmoides Oliv., Phomopsis sp. XP-8 is able to produce Pin and PMG. This study was carried out to test the anti-tumor capability of the culture of XP-8 and identify the major effective compounds. The fungal culture was added in the culture of HepG2 and K562 cells, and the viabilities of these cells were detected and the possible mechanism was analyzed. The fungal culture showed significant capaiblity in decreasing the viability of tumor cells and induce apoptosis via up-regulation of the expression of apoptosis-related genes. It also significantly inhibited the adhesion and migration of HepG2 cells by blocking MMP-9 expression. Pin and PMG were isolated from the growth culture and shown to be the major effective components for inhibition. The study indicated the potential application of XP-8 in the production of anti-tumour products by the bioconversion of glucose. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  10. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies.

    PubMed

    Cserhati, Matyas F; Pandey, Sanjit; Beaudoin, James J; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33,017,407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. © The Author(s) 2015. Published by Oxford University Press.

  11. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies

    PubMed Central

    Cserhati, Matyas F.; Pandey, Sanjit; Beaudoin, James J.; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S.

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33 017 407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. Database URL: http://nntc-dcc.unmc.edu PMID:26228431

  12. Web-TCGA: an online platform for integrated analysis of molecular cancer data sets.

    PubMed

    Deng, Mario; Brägelmann, Johannes; Schultze, Joachim L; Perner, Sven

    2016-02-06

    The Cancer Genome Atlas (TCGA) is a pool of molecular data sets publicly accessible and freely available to cancer researchers anywhere around the world. However, wide spread use is limited since an advanced knowledge of statistics and statistical software is required. In order to improve accessibility we created Web-TCGA, a web based, freely accessible online tool, which can also be run in a private instance, for integrated analysis of molecular cancer data sets provided by TCGA. In contrast to already available tools, Web-TCGA utilizes different methods for analysis and visualization of TCGA data, allowing users to generate global molecular profiles across different cancer entities simultaneously. In addition to global molecular profiles, Web-TCGA offers highly detailed gene and tumor entity centric analysis by providing interactive tables and views. As a supplement to other already available tools, such as cBioPortal (Sci Signal 6:pl1, 2013, Cancer Discov 2:401-4, 2012), Web-TCGA is offering an analysis service, which does not require any installation or configuration, for molecular data sets available at the TCGA. Individual processing requests (queries) are generated by the user for mutation, methylation, expression and copy number variation (CNV) analyses. The user can focus analyses on results from single genes and cancer entities or perform a global analysis (multiple cancer entities and genes simultaneously).

  13. GenoMetric Query Language: a novel approach to large-scale genomic data management.

    PubMed

    Masseroli, Marco; Pinoli, Pietro; Venco, Francesco; Kaitoua, Abdulrahman; Jalili, Vahid; Palluzzi, Fernando; Muller, Heiko; Ceri, Stefano

    2015-06-15

    Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction levels beyond available tool capabilities. We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata and interoperability between many data formats. Based on Hadoop framework and Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  14. EpiGeNet: A Graph Database of Interdependencies Between Genetic and Epigenetic Events in Colorectal Cancer.

    PubMed

    Balaur, Irina; Saqi, Mansoor; Barat, Ana; Lysenko, Artem; Mazein, Alexander; Rawlings, Christopher J; Ruskin, Heather J; Auffray, Charles

    2017-10-01

    The development of colorectal cancer (CRC)-the third most common cancer type-has been associated with deregulations of cellular mechanisms stimulated by both genetic and epigenetic events. StatEpigen is a manually curated and annotated database, containing information on interdependencies between genetic and epigenetic signals, and specialized currently for CRC research. Although StatEpigen provides a well-developed graphical user interface for information retrieval, advanced queries involving associations between multiple concepts can benefit from more detailed graph representation of the integrated data. This can be achieved by using a graph database (NoSQL) approach. Data were extracted from StatEpigen and imported to our newly developed EpiGeNet, a graph database for storage and querying of conditional relationships between molecular (genetic and epigenetic) events observed at different stages of colorectal oncogenesis. We illustrate the enhanced capability of EpiGeNet for exploration of different queries related to colorectal tumor progression; specifically, we demonstrate the query process for (i) stage-specific molecular events, (ii) most frequently observed genetic and epigenetic interdependencies in colon adenoma, and (iii) paths connecting key genes reported in CRC and associated events. The EpiGeNet framework offers improved capability for management and visualization of data on molecular events specific to CRC initiation and progression.

  15. Omicseq: a web-based search engine for exploring omics datasets

    PubMed Central

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

    Abstract The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462

  16. Transcriptome analysis of Pseudomonas syringae identifies new genes, ncRNAs, and antisense activity

    USDA-ARS?s Scientific Manuscript database

    To fully understand how bacteria respond to their environment, it is essential to assess genome-wide transcriptional activity. New high throughput sequencing technologies make it possible to query the transcriptome of an organism in an efficient unbiased manner. We applied a strand-specific method t...

  17. A novel role for APOBEC3: Susceptibility to sexual transmission of murine acquired immunodeficiency virus (mAIDS) is aggravated in APOBEC3 deficient mice

    PubMed Central

    2012-01-01

    Background APOBEC3 proteins are host factors that restrict infection by retroviruses like HIV, MMTV, and MLV and are variably expressed in hematopoietic and non-hematopoietic cells, such as macrophages, lymphocytes, dendritic, and epithelia cells. Previously, we showed that APOBEC3 expressed in mammary epithelia cells function to limit milk-borne transmission of the beta-retrovirus, mouse mammary tumor virus. In this present study, we used APOBEC3 knockout mice and their wild type counterpart to query the role of APOBEC3 in sexual transmission of LP-BM5 MLV – the etiological agent of murine AIDs (mAIDs). Results We show that mouse APOBEC3 is expressed in murine genital tract tissues and gametes and that genital tract tissue of APOBEC3-deficient mice are more susceptible to infection by LP-BM5 virus. APOBEC3 expressed in genital tract tissues most likely plays a role in decreasing virus transmission via the sexual route, since mice deficient in APOBEC3 gene have higher genitalia and seminal plasma virus load and sexually transmit the virus more efficiently to their partners compared to APOBEC3+ mice. Moreover, we show that female mice sexually infected with LP-BM5 virus transmit the virus to their off-spring in APOBEC3-dependent manner. Conclusion Our data indicate that genital tissue intrinsic APOBEC3 restricts genital tract infection and limits sexual transmission of LP-BM5 virus. PMID:22691411

  18. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis

    PubMed Central

    Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet. PMID:28695067

  19. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis.

    PubMed

    Costa, Raquel L; Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.

  20. vSPARQL: a view definition language for the semantic web.

    PubMed

    Shaw, Marianne; Detwiler, Landon T; Noy, Natalya; Brinkley, James; Suciu, Dan

    2011-02-01

    Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages. Copyright © 2010 Elsevier Inc. All rights reserved.

  1. Seismic Search Engine: A distributed database for mining large scale seismic data

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Vaidya, S.; Kuzma, H. A.

    2009-12-01

    The International Monitoring System (IMS) of the CTBTO collects terabytes worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE, we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer-review and making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.

  2. Don’t Like RDF Reification? Making Statements about Statements Using Singleton Property

    PubMed Central

    Nguyen, Vinh; Bodenreider, Olivier; Sheth, Amit

    2015-01-01

    Statements about RDF statements, or meta triples, provide additional information about individual triples, such as the source, the occurring time or place, or the certainty. Integrating such meta triples into semantic knowledge bases would enable the querying and reasoning mechanisms to be aware of provenance, time, location, or certainty of triples. However, an efficient RDF representation for such meta knowledge of triples remains challenging. The existing standard reification approach allows such meta knowledge of RDF triples to be expressed using RDF by two steps. The first step is representing the triple by a Statement instance which has subject, predicate, and object indicated separately in three different triples. The second step is creating assertions about that instance as if it is a statement. While reification is simple and intuitive, this approach does not have formal semantics and is not commonly used in practice as described in the RDF Primer. In this paper, we propose a novel approach called Singleton Property for representing statements about statements and provide a formal semantics for it. We explain how this singleton property approach fits well with the existing syntax and formal semantics of RDF, and the syntax of SPARQL query language. We also demonstrate the use of singleton property in the representation and querying of meta knowledge in two examples of Semantic Web knowledge bases: YAGO2 and BKR. Our experiments on the BKR show that the singleton property approach gives a decent performance in terms of number of triples, query length and query execution time compared to existing approaches. This approach, which is also simple and intuitive, can be easily adopted for representing and querying statements about statements in other knowledge bases. PMID:25750938

  3. Cry-Bt identifier: a biological database for PCR detection of Cry genes present in transgenic plants.

    PubMed

    Singh, Vinay Kumar; Ambwani, Sonu; Marla, Soma; Kumar, Anil

    2009-10-23

    We describe the development of a user friendly tool that would assist in the retrieval of information relating to Cry genes in transgenic crops. The tool also helps in detection of transformed Cry genes from Bacillus thuringiensis present in transgenic plants by providing suitable designed primers for PCR identification of these genes. The tool designed based on relational database model enables easy retrieval of information from the database with simple user queries. The tool also enables users to access related information about Cry genes present in various databases by interacting with different sources (nucleotide sequences, protein sequence, sequence comparison tools, published literature, conserved domains, evolutionary and structural data). http://insilicogenomics.in/Cry-btIdentifier/welcome.html.

  4. Tumor mutational load and immune parameters across metastatic Renal Cell Carcinoma (mRCC) risk groups

    PubMed Central

    de Velasco, Guillermo; Miao, Diana; Voss, Martin H.; Hakimi, A. Ari; Hsieh, James J.; Tannir, Nizar M.; Tamboli, Pheroze; Appleman, Leonard J.; Rathmell, W. Kimryn; Van Allen, Eliezer M.; Choueiri, Toni K.

    2016-01-01

    Patients with metastatic renal cell carcinoma (mRCC) have better overall survival when treated with nivolumab, a cancer immunotherapy that targets the immune checkpoint inhibitor programmed cell death 1 (PD-1), rather than everolimus (a chemical inhibitor of mTOR and immunosuppressant). Poor-risk mRCC patients treated with nivolumab seemed to experience the greatest overall survival benefit, compared to patients with favorable or intermediate-risk, in an analysis of the CheckMate-025 trial subgroup of the Memorial Sloan Kettering Cancer Center (MSKCC) prognostic risk groups. Here we explore whether tumor mutational load and RNA expression of specific immune parameters could be segregated by prognostic MSKCC risk strata and explain the survival seen in the poor-risk group. We queried whole exome transcriptome data in RCC patients (n = 54) included in The Cancer Genome Atlas that ultimately developed metastatic disease or were diagnosed with metastatic disease at presentation and did not receive immune checkpoint inhibitors. Nonsynonymous mutational load did not differ significantly by MSKCC risk group, nor was the expression of cytolytic genes –granzyme A and perforin – or selected immune checkpoint molecules different across MSKCC risk groups. In conclusion, this analysis found that mutational load and expression of markers of an active tumor microenvironment did not correlate with MSKCC risk prognostic classification in mRCC. PMID:27538576

  5. Recombinant Expression, Functional Characterization of Two Scorpion Venom Toxins with Three Disulfide Bridges from the Chinese Scorpion Buthus martensii Karsch.

    PubMed

    Lin, Shengguo; Wang, Xuelin; Hu, Xueyao; Zhao, Yongshan; Zhao, Mingyi; Zhang, Jinghai; Cui, Yong

    2017-01-01

    Scorpion venom contains a large variety of biologically active peptides. However, most of these peptides have not been identified and characterized. Peptides with three disulfide bridges, existing in the scorpion venom, have not been studied in detail and have been poorly characterized until now. Here, we report the recombinant expression and functional characterization of two kinds of venom peptides (BmKBTx and BmNaL-3SS2) with three disulfide bridges. This study adopted an effective Escherichia coli system. The genes for BmKBTx and BmNaL-3SS2 were obtained by polymerase chain reaction and cloned to the pSYPU-1b vector. After expression and purification, the two recombinant proteins were subjected to an analgesic activity assay in mice and whole-cell patchclamp recording of hNav1.7-CHO cell lines. Functional tests showed that BmKBTx and BmNaL- 3SS2 have analgesic activity in mice and can interact with the hNav1.7 subtype of the voltage-gated sodium channel (VGSC). Scorpion venom is rich in bioactive proteins, but most of their functions are unknown to us. This study has increased our knowledge of these novel disulfide-bridged peptides (DBPs) and their biological activities. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  6. Querying archetype-based EHRs by search ontology-based XPath engineering.

    PubMed

    Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

    2018-05-11

    Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, that lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content is necessary for a further improvement of precision and recall. Key limitation is that the application of the introduced process requires skills in ontology and software development. In future, the time consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.

  7. Identification and functional analyses of sex determination genes in the sexually dimorphic stag beetle Cyclommatus metallifer.

    PubMed

    Gotoh, Hiroki; Zinna, Robert A; Warren, Ian; DeNieu, Michael; Niimi, Teruyuki; Dworkin, Ian; Emlen, Douglas J; Miura, Toru; Lavine, Laura C

    2016-03-22

    Genes in the sex determination pathway are important regulators of sexually dimorphic animal traits, including the elaborate and exaggerated male ornaments and weapons of sexual selection. In this study, we identified and functionally analyzed members of the sex determination gene family in the golden metallic stag beetle Cyclommatus metallifer, which exhibits extreme differences in mandible size between males and females. We constructed a C. metallifer transcriptomic database from larval and prepupal developmental stages and tissues of both males and females. Using Roche 454 pyrosequencing, we generated a de novo assembled database from a total of 1,223,516 raw reads, which resulted in 14,565 isotigs (putative transcript isoforms) contained in 10,794 isogroups (putative identified genes). We queried this database for C. metallifer conserved sex determination genes and identified 14 candidate sex determination pathway genes. We then characterized the roles of several of these genes in development of extreme sexual dimorphic traits in this species. We performed molecular expression analyses with RT-PCR and functional analyses using RNAi on three C. metallifer candidate genes--Sex-lethal (CmSxl), transformer-2 (Cmtra2), and intersex (Cmix). No differences in expression pattern were found between the sexes for any of these three genes. In the RNAi gene-knockdown experiments, we found that only the Cmix had any effect on sexually dimorphic morphology, and these mimicked the effects of Cmdsx knockdown in females. Knockdown of CmSxl had no measurable effects on stag beetle phenotype, while knockdown of Cmtra2 resulted in complete lethality at the prepupal period. These results indicate that the roles of CmSxl and Cmtra2 in the sex determination cascade are likely to have diverged in stag beetles when compared to Drosophila. Our results also suggest that Cmix has a conserved role in this pathway. In addition to those three genes, we also performed a more complete functional analysis of the C. metallifer dsx gene (Cmdsx) to identify the isoforms that regulate dimorphism more fully using exon-specific RNAi. We identified a total of 16 alternative splice variants of the Cmdsx gene that code for up to 14 separate exons. Despite the variation in RNA splice products of the Cmdsx gene, only four protein isoforms are predicted. The results of our exon-specific RNAi indicated that the essential CmDsx isoform for postembryonic male differentiation is CmDsxB, whereas postembryonic female specific differentiation is mainly regulated by CmDsxD. Taken together, our results highlight the importance of studying the function of highly conserved sex determination pathways in numerous insect species, especially those with dramatic and exaggerated sexual dimorphism, because conservation in protein structure does not always translate into conservation in downstream function.

  8. Logic-Based Retrieval: Technology for Content-Oriented and Analytical Querying of Patent Data

    NASA Astrophysics Data System (ADS)

    Klampanos, Iraklis Angelos; Wu, Hengzhi; Roelleke, Thomas; Azzam, Hany

    Patent searching is a complex retrieval task. An initial document search is only the starting point of a chain of searches and decisions that need to be made by patent searchers. Keyword-based retrieval is adequate for document searching, but it is not suitable for modelling comprehensive retrieval strategies. DB-like and logical approaches are the state-of-the-art techniques to model strategies, reasoning and decision making. In this paper we present the application of logical retrieval to patent searching. The two grand challenges are expressiveness and scalability, where high degree of expressiveness usually means a loss in scalability. In this paper we report how to maintain scalability while offering the expressiveness of logical retrieval required for solving patent search tasks. We present logical retrieval background, and how to model data-source selection and results' fusion. Moreover, we demonstrate the modelling of a retrieval strategy, a technique by which patent professionals are able to express, store and exchange their strategies and rationales when searching patents or when making decisions. An overview of the architecture and technical details complement the paper, while the evaluation reports preliminary results on how query processing times can be guaranteed, and how quality is affected by trading off responsiveness.

  9. UniGene Tabulator: a full parser for the UniGene format.

    PubMed

    Lenzi, Luca; Frabetti, Flavia; Facchin, Federica; Casadei, Raffaella; Vitale, Lorenza; Canaider, Silvia; Carinci, Paolo; Zannotti, Maria; Strippoli, Pierluigi

    2006-10-15

    UniGene Tabulator 1.0 provides a solution for full parsing of UniGene flat file format; it implements a structured graphical representation of each data field present in UniGene following import into a common database managing system usable in a personal computer. This database includes related tables for sequence, protein similarity, sequence-tagged site (STS) and transcript map interval (TXMAP) data, plus a summary table where each record represents a UniGene cluster. UniGene Tabulator enables full local management of UniGene data, allowing parsing, querying, indexing, retrieving, exporting and analysis of UniGene data in a relational database form, usable on Macintosh (OS X 10.3.9 or later) and Windows (2000, with service pack 4, XP, with service pack 2 or later) operating systems-based computers. The current release, including both the FileMaker runtime applications, is freely available at http://apollo11.isto.unibo.it/software/

  10. Gene and protein nomenclature in public databases

    PubMed Central

    Fundel, Katrin; Zimmer, Ralf

    2006-01-01

    Background Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application. PMID:16899134

  11. MitoMiner: a data warehouse for mitochondrial proteomics data

    PubMed Central

    Smith, Anthony C.; Blackshaw, James A.; Robinson, Alan J.

    2012-01-01

    MitoMiner (http://mitominer.mrc-mbu.cam.ac.uk/) is a data warehouse for the storage and analysis of mitochondrial proteomics data gathered from publications of mass spectrometry and green fluorescent protein tagging studies. In MitoMiner, these data are integrated with data from UniProt, Gene Ontology, Online Mendelian Inheritance in Man, HomoloGene, Kyoto Encyclopaedia of Genes and Genomes and PubMed. The latest release of MitoMiner stores proteomics data sets from 46 studies covering 11 different species from eumetazoa, viridiplantae, fungi and protista. MitoMiner is implemented by using the open source InterMine data warehouse system, which provides a user interface allowing users to upload data for analysis, personal accounts to store queries and results and enables queries of any data in the data model. MitoMiner also provides lists of proteins for use in analyses, including the new MitoMiner mitochondrial proteome reference sets that specify proteins with substantial experimental evidence for mitochondrial localization. As further mitochondrial proteomics data sets from normal and diseased tissue are published, MitoMiner can be used to characterize the variability of the mitochondrial proteome between tissues and investigate how changes in the proteome may contribute to mitochondrial dysfunction and mitochondrial-associated diseases such as cancer, neurodegenerative diseases, obesity, diabetes, heart failure and the ageing process. PMID:22121219

  12. Automated Identification of Medically Important Bacteria by 16S rRNA Gene Sequencing Using a Novel Comprehensive Database, 16SpathDB▿

    PubMed Central

    Woo, Patrick C. Y.; Teng, Jade L. L.; Yeung, Juilian M. Y.; Tse, Herman; Lau, Susanna K. P.; Yuen, Kwok-Yung

    2011-01-01

    Despite the increasing use of 16S rRNA gene sequencing, interpretation of 16S rRNA gene sequence results is one of the most difficult problems faced by clinical microbiologists and technicians. To overcome the problems we encountered in the existing databases during 16S rRNA gene sequence interpretation, we built a comprehensive database, 16SpathDB (http://147.8.74.24/16SpathDB) based on the 16S rRNA gene sequences of all medically important bacteria listed in the Manual of Clinical Microbiology and evaluated its use for automated identification of these bacteria. Among 91 nonduplicated bacterial isolates collected in our clinical microbiology laboratory, 71 (78%) were reported by 16SpathDB as a single bacterial species having >98.0% nucleotide identity with the query sequence, 19 (20.9%) were reported as more than one bacterial species having >98.0% nucleotide identity with the query sequence, and 1 (1.1%) was reported as no match. For the 71 bacterial isolates reported as a single bacterial species, all results were identical to their true identities as determined by a polyphasic approach. For the 19 bacterial isolates reported as more than one bacterial species, all results contained their true identities as determined by a polyphasic approach and all of them had their true identities as the “best match in 16SpathDB.” For the isolate (Gordonibacter pamelaeae) reported as no match, the bacterium has never been reported to be associated with human disease and was not included in the Manual of Clinical Microbiology. 16SpathDB is an automated, user-friendly, efficient, accurate, and regularly updated database for 16S rRNA gene sequence interpretation in clinical microbiology laboratories. PMID:21389154

  13. ChemiRs: a web application for microRNAs and chemicals.

    PubMed

    Su, Emily Chia-Yu; Chen, Yu-Sing; Tien, Yun-Cheng; Liu, Jeff; Ho, Bing-Ching; Yu, Sung-Liang; Singh, Sher

    2016-04-18

    MicroRNAs (miRNAs) are about 22 nucleotides, non-coding RNAs that affect various cellular functions, and play a regulatory role in different organisms including human. Until now, more than 2500 mature miRNAs in human have been discovered and registered, but still lack of information or algorithms to reveal the relations among miRNAs, environmental chemicals and human health. Chemicals in environment affect our health and daily life, and some of them can lead to diseases by inferring biological pathways. We develop a creditable online web server, ChemiRs, for predicting interactions and relations among miRNAs, chemicals and pathways. The database not only compares gene lists affected by chemicals and miRNAs, but also incorporates curated pathways to identify possible interactions. Here, we manually retrieved associations of miRNAs and chemicals from biomedical literature. We developed an online system, ChemiRs, which contains miRNAs, diseases, Medical Subject Heading (MeSH) terms, chemicals, genes, pathways and PubMed IDs. We connected each miRNA to miRBase, and every current gene symbol to HUGO Gene Nomenclature Committee (HGNC) for genome annotation. Human pathway information is also provided from KEGG and REACTOME databases. Information about Gene Ontology (GO) is queried from GO Online SQL Environment (GOOSE). With a user-friendly interface, the web application is easy to use. Multiple query results can be easily integrated and exported as report documents in PDF format. Association analysis of miRNAs and chemicals can help us understand the pathogenesis of chemical components. ChemiRs is freely available for public use at http://omics.biol.ntnu.edu.tw/ChemiRs .

  14. UCSC genome browser: deep support for molecular biomedical research.

    PubMed

    Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C

    2008-01-01

    The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).

  15. Deep mRNA Sequencing of the Tritonia diomedea Brain Transcriptome Provides Access to Gene Homologues for Neuronal Excitability, Synaptic Transmission and Peptidergic Signalling

    PubMed Central

    Senatore, Adriano; Edirisinghe, Neranjan; Katz, Paul S.

    2015-01-01

    Background The sea slug Tritonia diomedea (Mollusca, Gastropoda, Nudibranchia), has a simple and highly accessible nervous system, making it useful for studying neuronal and synaptic mechanisms underlying behavior. Although many important contributions have been made using Tritonia, until now, a lack of genetic information has impeded exploration at the molecular level. Results We performed Illumina sequencing of central nervous system mRNAs from Tritonia, generating 133.1 million 100 base pair, paired-end reads. De novo reconstruction of the RNA-Seq data yielded a total of 185,546 contigs, which partitioned into 123,154 non-redundant gene clusters (unigenes). BLAST comparison with RefSeq and Swiss-Prot protein databases, as well as mRNA data from other invertebrates (gastropod molluscs: Aplysia californica, Lymnaea stagnalis and Biomphalaria glabrata; cnidarian: Nematostella vectensis) revealed that up to 76,292 unigenes in the Tritonia transcriptome have putative homologues in other databases, 18,246 of which are below a more stringent E-value cut-off of 1x10-6. In silico prediction of secreted proteins from the Tritonia transcriptome shotgun assembly (TSA) produced a database of 579 unique sequences of secreted proteins, which also exhibited markedly higher expression levels compared to other genes in the TSA. Conclusions Our efforts greatly expand the availability of gene sequences available for Tritonia diomedea. We were able to extract full length protein sequences for most queried genes, including those involved in electrical excitability, synaptic vesicle release and neurotransmission, thus confirming that the transcriptome will serve as a useful tool for probing the molecular correlates of behavior in this species. We also generated a neurosecretome database that will serve as a useful tool for probing peptidergic signalling systems in the Tritonia brain. PMID:25719197

  16. Genotator: a disease-agnostic tool for genetic annotation of disease.

    PubMed

    Wall, Dennis P; Pivovarov, Rimma; Tong, Mark; Jung, Jae-Yoon; Fusaro, Vincent A; DeLuca, Todd F; Tonellato, Peter J

    2010-10-29

    Disease-specific genetic information has been increasing at rapid rates as a consequence of recent improvements and massive cost reductions in sequencing technologies. Numerous systems designed to capture and organize this mounting sea of genetic data have emerged, but these resources differ dramatically in their disease coverage and genetic depth. With few exceptions, researchers must manually search a variety of sites to assemble a complete set of genetic evidence for a particular disease of interest, a process that is both time-consuming and error-prone. We designed a real-time aggregation tool that provides both comprehensive coverage and reliable gene-to-disease rankings for any disease. Our tool, called Genotator, automatically integrates data from 11 externally accessible clinical genetics resources and uses these data in a straightforward formula to rank genes in order of disease relevance. We tested the accuracy of coverage of Genotator in three separate diseases for which there exist specialty curated databases, Autism Spectrum Disorder, Parkinson's Disease, and Alzheimer Disease. Genotator is freely available at http://genotator.hms.harvard.edu. Genotator demonstrated that most of the 11 selected databases contain unique information about the genetic composition of disease, with 2514 genes found in only one of the 11 databases. These findings confirm that the integration of these databases provides a more complete picture than would be possible from any one database alone. Genotator successfully identified at least 75% of the top ranked genes for all three of our use cases, including a 90% concordance with the top 40 ranked candidates for Alzheimer Disease. As a meta-query engine, Genotator provides high coverage of both historical genetic research as well as recent advances in the genetic understanding of specific diseases. As such, Genotator provides a real-time aggregation of ranked data that remains current with the pace of research in the disease fields. Genotator's algorithm appropriately transforms query terms to match the input requirements of each targeted databases and accurately resolves named synonyms to ensure full coverage of the genetic results with official nomenclature. Genotator generates an excel-style output that is consistent across disease queries and readily importable to other applications.

  17. A high-resolution anatomical ontology of the developing murine genitourinary tract

    PubMed Central

    Little, Melissa H.; Brennan, Jane; Georgas, Kylie; Davies, Jamie A.; Davidson, Duncan R.; Baldock, Richard A.; Beverdam, Annemiek; Bertram, John F.; Capel, Blanche; Chiu, Han Sheng; Clements, Dave; Cullen-McEwen, Luise; Fleming, Jean; Gilbert, Thierry; Houghton, Derek; Kaufman, Matt H.; Kleymenova, Elena; Koopman, Peter A.; Lewis, Alfor G.; McMahon, Andrew P.; Mendelsohn, Cathy L.; Mitchell, Eleanor K.; Rumballe, Bree A.; Sweeney, Derina E.; Valerius, M. Todd; Yamada, Gen; Yang, Yiya; Yu., Jing

    2007-01-01

    Cataloguing gene expression during development of the genitourinary tract will increase our understanding not only of this process but also of congenital defects and disease affecting this organ system. We have developed a high-resolution ontology with which to describe the subcompartments of the developing murine genitourinary tract. This ontology incorporates what can be defined histologically and begins to encompass other structures and cell types already identified at the molecular level. The ontology is being used to annotate in situ hybridisation data generated as part of the Genitourinary Development Molecular Anatomy Project (GUDMAP), a publicly available data resource on gene and protein expression during genitourinary development. The GUDMAP ontology encompasses Theiler stage (TS) 17 to 27 of development as well as the sexually mature adult. It has been written as a partonomic, text-based, hierarchical ontology that, for the embryological stages, has been developed as a high-resolution expansion of the existing Edinburgh Mouse Atlas Project (EMAP) ontology. It also includes group terms for well-characterised structural and/or functional units comprising several sub-structures, such as the nephron and juxtaglomerular complex. Each term has been assigned a unique identification number. Synonyms have been used to improve the success of query searching and maintain wherever possible existing EMAP terms relating to this organ system. We describe here the principles and structure of the ontology and provide representative diagrammatic, histological, and whole mount and section RNA in situ hybridisation images to clarify the terms used within the ontology. Visual examples of how terms appear in different specimen types are also provided. PMID:17452023

  18. Transcriptome Profiling of Khat (Catha edulis) and Ephedra sinica Reveals Gene Candidates Potentially Involved in Amphetamine-Type Alkaloid Biosynthesis

    PubMed Central

    Groves, Ryan A.; Hagel, Jillian M.; Zhang, Ye; Kilpatrick, Korey; Levy, Asaf; Marsolais, Frédéric; Lewinsohn, Efraim; Sensen, Christoph W.; Facchini, Peter J.

    2015-01-01

    Amphetamine analogues are produced by plants in the genus Ephedra and by khat (Catha edulis), and include the widely used decongestants and appetite suppressants (1S,2S)-pseudoephedrine and (1R,2S)-ephedrine. The production of these metabolites, which derive from L-phenylalanine, involves a multi-step pathway partially mapped out at the biochemical level using knowledge of benzoic acid metabolism established in other plants, and direct evidence using khat and Ephedra species as model systems. Despite the commercial importance of amphetamine-type alkaloids, only a single step in their biosynthesis has been elucidated at the molecular level. We have employed Illumina next-generation sequencing technology, paired with Trinity and Velvet-Oases assembly platforms, to establish data-mining frameworks for Ephedra sinica and khat plants. Sequence libraries representing a combined 200,000 unigenes were subjected to an annotation pipeline involving direct searches against public databases. Annotations included the assignment of Gene Ontology (GO) terms used to allocate unigenes to functional categories. As part of our functional genomics program aimed at novel gene discovery, the databases were mined for enzyme candidates putatively involved in alkaloid biosynthesis. Queries used for mining included enzymes with established roles in benzoic acid metabolism, as well as enzymes catalyzing reactions similar to those predicted for amphetamine alkaloid metabolism. Gene candidates were evaluated based on phylogenetic relationships, FPKM-based expression data, and mechanistic considerations. Establishment of expansive sequence resources is a critical step toward pathway characterization, a goal with both academic and industrial implications. PMID:25806807

  19. Independence between pre-mRNA splicing and DNA methylation in an isogenic minigene resource.

    PubMed

    Nanan, Kyster K; Ocheltree, Cody; Sturgill, David; Mandler, Mariana D; Prigge, Maria; Varma, Garima; Oberdoerffer, Shalini

    2017-12-15

    Actively transcribed genes adopt a unique chromatin environment with characteristic patterns of enrichment. Within gene bodies, H3K36me3 and cytosine DNA methylation are elevated at exons of spliced genes and have been implicated in the regulation of pre-mRNA splicing. H3K36me3 is further responsive to splicing, wherein splicing inhibition led to a redistribution and general reduction over gene bodies. In contrast, little is known of the mechanisms supporting elevated DNA methylation at actively spliced genic locations. Recent evidence associating the de novo DNA methyltransferase Dnmt3b with H3K36me3-rich chromatin raises the possibility that genic DNA methylation is influenced by splicing-associated H3K36me3. Here, we report the generation of an isogenic resource to test the direct impact of splicing on chromatin. A panel of minigenes of varying splicing potential were integrated into a single FRT site for inducible expression. Profiling of H3K36me3 confirmed the established relationship to splicing, wherein levels were directly correlated with splicing efficiency. In contrast, DNA methylation was equivalently detected across the minigene panel, irrespective of splicing and H3K36me3 status. In addition to revealing a degree of independence between genic H3K36me3 and DNA methylation, these findings highlight the generated minigene panel as a flexible platform for the query of splicing-dependent chromatin modifications. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.

  20. CellMiner Companion: an interactive web application to explore CellMiner NCI-60 data.

    PubMed

    Wang, Sufang; Gribskov, Michael; Hazbun, Tony R; Pascuzzi, Pete E

    2016-08-01

    The NCI-60 human tumor cell line panel is an invaluable resource for cancer researchers, providing drug sensitivity, molecular and phenotypic data for a range of cancer types. CellMiner is a web resource that provides tools for the acquisition and analysis of quality-controlled NCI-60 data. CellMiner supports queries of up to 150 drugs or genes, but the output is an Excel file for each drug or gene. This output format makes it difficult for researchers to explore the data from large queries. CellMiner Companion is a web application that facilitates the exploration and visualization of output from CellMiner, further increasing the accessibility of NCI-60 data. The web application is freely accessible at https://pul-bioinformatics.shinyapps.io/CellMinerCompanion The R source code can be downloaded at https://github.com/pepascuzzi/CellMinerCompanion.git ppascuzz@purdue.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Population-specific documentation of pharmacogenomic markers and their allelic frequencies in FINDbase.

    PubMed

    Georgitsi, Marianthi; Viennas, Emmanouil; Gkantouna, Vassiliki; Christodoulopoulou, Elena; Zagoriti, Zoi; Tafrali, Christina; Ntellos, Fotios; Giannakopoulou, Olga; Boulakou, Athanassia; Vlahopoulou, Panagiota; Kyriacou, Eva; Tsaknakis, John; Tsakalidis, Athanassios; Poulas, Konstantinos; Tzimas, Giannis; Patrinos, George P

    2011-01-01

    Population and ethnic group-specific allele frequencies of pharmacogenomic markers are poorly documented and not systematically collected in structured data repositories. We developed the Frequency of Inherited Disorders Pharmacogenomics database (FINDbase-PGx), a separate module of the FINDbase, aiming to systematically document pharmacogenomic allele frequencies in various populations and ethnic groups worldwide. We critically collected and curated 214 scientific articles reporting pharmacogenomic markers allele frequencies in various populations and ethnic groups worldwide. Subsequently, in order to host the curated data, support data visualization and data mining, we developed a website application, utilizing Microsoft™ PivotViewer software. Curated allelic frequency data pertaining to 144 pharmacogenomic markers across 14 genes, representing approximately 87,000 individuals from 150 populations worldwide, are currently included in FINDbase-PGx. A user-friendly query interface allows for easy data querying, based on numerous content criteria, such as population, ethnic group, geographical region, gene, drug and rare allele frequency. FINDbase-PGx is a comprehensive database, which, unlike other pharmacogenomic knowledgebases, fulfills the much needed requirement to systematically document pharmacogenomic allelic frequencies in various populations and ethnic groups worldwide.

  2. A drug repositioning approach identifies tricyclic antidepressants as inhibitors of small cell lung cancer and other neuroendocrine tumors

    PubMed Central

    Jahchan, Nadine S; Dudley, Joel T; Mazur, Pawel K; Flores, Natasha; Yang, Dian; Palmerton, Alec; Zmoos, Anne-Flore; Vaka, Dedeepya; Tran, Kim QT; Zhou, Margaret; Krasinska, Karolina; Riess, Jonathan W; Neal, Joel W; Khatri, Purvesh; Park, Kwon S; Butte, Atul J; Sage, Julien

    2013-01-01

    Small cell lung cancer (SCLC) is an aggressive neuroendocrine subtype of lung cancer with high mortality. We used a systematic drug-repositioning bioinformatics approach querying a large compendium of gene expression profiles to identify candidate FDA-approved drugs to treat SCLC. We found that tricyclic antidepressants and related molecules potently induce apoptosis in both chemonaïve and chemoresistant SCLC cells in culture, in mouse and human SCLC tumors transplanted into immunocompromised mice, and in endogenous tumors from a mouse model for human SCLC. The candidate drugs activate stress pathways and induce cell death in SCLC cells, at least in part by disrupting autocrine survival signals involving neurotransmitters and their G protein-coupled receptors. The candidate drugs inhibit the growth of other neuroendocrine tumors, including pancreatic neuroendocrine tumors and Merkel cell carcinoma. These experiments identify novel targeted strategies that can be rapidly evaluated in patients with neuroendocrine tumors through the repurposing of approved drugs. PMID:24078773

  3. An integrated bioinformatics analysis to dissect kinase dependency in triple negative breast cancer.

    PubMed

    Ryall, Karen A; Kim, Jihye; Klauck, Peter J; Shin, Jimin; Yoo, Minjae; Ionkina, Anastasia; Pitts, Todd M; Tentler, John J; Diamond, Jennifer R; Eckhardt, S Gail; Heasley, Lynn E; Kang, Jaewoo; Tan, Aik Choon

    2015-01-01

    Triple-Negative Breast Cancer (TNBC) is an aggressive disease with a poor prognosis. Clinically, TNBC patients have limited treatment options besides chemotherapy. The goal of this study was to determine the kinase dependency in TNBC cell lines and to predict compounds that could inhibit these kinases using integrative bioinformatics analysis. We integrated publicly available gene expression data, high-throughput pharmacological profiling data, and quantitative in vitro kinase binding data to determine the kinase dependency in 12 TNBC cell lines. We employed Kinase Addiction Ranker (KAR), a novel bioinformatics approach, which integrated these data sources to dissect kinase dependency in TNBC cell lines. We then used the kinase dependency predicted by KAR for each TNBC cell line to query K-Map for compounds targeting these kinases. We validated our predictions using published and new experimental data. In summary, we implemented an integrative bioinformatics analysis that determines kinase dependency in TNBC. Our analysis revealed candidate kinases as potential targets in TNBC for further pharmacological and biological studies.

  4. Human motion retrieval from hand-drawn sketch.

    PubMed

    Chao, Min-Wen; Lin, Chao-Hung; Assa, Jackie; Lee, Tong-Yee

    2012-05-01

    The rapid growth of motion capture data increases the importance of motion retrieval. The majority of the existing motion retrieval approaches are based on a labor-intensive step in which the user browses and selects a desired query motion clip from the large motion clip database. In this work, a novel sketching interface for defining the query is presented. This simple approach allows users to define the required motion by sketching several motion strokes over a drawn character, which requires less effort and extends the users’ expressiveness. To support the real-time interface, a specialized encoding of the motions and the hand-drawn query is required. Here, we introduce a novel hierarchical encoding scheme based on a set of orthonormal spherical harmonic (SH) basis functions, which provides a compact representation, and avoids the CPU/processing intensive stage of temporal alignment used by previous solutions. Experimental results show that the proposed approach can well retrieve the motions, and is capable of retrieve logically and numerically similar motions, which is superior to previous approaches. The user study shows that the proposed system can be a useful tool to input motion query if the users are familiar with it. Finally, an application of generating a 3D animation from a hand-drawn comics strip is demonstrated.

  5. Designing integrated computational biology pipelines visually.

    PubMed

    Jamil, Hasan M

    2013-01-01

    The long-term cost of developing and maintaining a computational pipeline that depends upon data integration and sophisticated workflow logic is too high to even contemplate "what if" or ad hoc type queries. In this paper, we introduce a novel application building interface for computational biology research, called VizBuilder, by leveraging a recent query language called BioFlow for life sciences databases. Using VizBuilder, it is now possible to develop ad hoc complex computational biology applications at throw away costs. The underlying query language supports data integration and workflow construction almost transparently and fully automatically, using a best effort approach. Users express their application by drawing it with VizBuilder icons and connecting them in a meaningful way. Completed applications are compiled and translated as BioFlow queries for execution by the data management system LifeDB, for which VizBuilder serves as a front end. We discuss VizBuilder features and functionalities in the context of a real life application after we briefly introduce BioFlow. The architecture and design principles of VizBuilder are also discussed. Finally, we outline future extensions of VizBuilder. To our knowledge, VizBuilder is a unique system that allows visually designing computational biology pipelines involving distributed and heterogeneous resources in an ad hoc manner.

  6. Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

    PubMed

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-08-07

    There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.

  7. Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data

    PubMed Central

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-01-01

    Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Conclusion Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data. PMID:18687127

  8. Influenza-like illness surveillance on Twitter through automated learning of naïve language.

    PubMed

    Gesualdo, Francesco; Stilo, Giovanni; Agricola, Eleonora; Gonfiantini, Michaela V; Pandolfi, Elisabetta; Velardi, Paola; Tozzi, Alberto E

    2013-01-01

    Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems.

  9. Influenza-Like Illness Surveillance on Twitter through Automated Learning of Naïve Language

    PubMed Central

    Gesualdo, Francesco; Stilo, Giovanni; Agricola, Eleonora; Gonfiantini, Michaela V.; Pandolfi, Elisabetta; Velardi, Paola; Tozzi, Alberto E.

    2013-01-01

    Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems. PMID:24324799

  10. Omicseq: a web-based search engine for exploring omics datasets.

    PubMed

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. The lexeme hypotheses: Their use to generate highly grammatical and completely computerized medical records.

    PubMed

    Macfarlane, Donald

    2016-07-01

    Medical records often contain free text created by harried clinicians. Free text often contains errors which make it an unsuitable target for computerized data extraction. The cost of healthcare can be reduced by creating medical records that are fully computerized at their inception. We examine hypotheses that enable us to construct such records. We regard the text of the medical record as being an ordered collection of meaningful fragments. The intellectual content (or "lexeme") of each text fragment in the record is considered separately from the language that used to express it. We further consider that each lexeme exists as a combination of a lexeme query (defining the issue being addressed) and a lexeme response to that query. The medical record can then be perceived as a stream of these responses. The responses can be expressed in any style or language, including computer code. Examining medical records in this light gives rise to a number of observations and hypotheses. The physical location and nature of the medical episode (which we term "context") determines the general layout of the record. The order that lexeme-queries are addressed in within the record is highly consistent ("coherence"). Issues are only addressed if they are logically called-for by the context or by a previously-selected lexeme response ("predicance"), and only to a needed depth of detail ("level"). We hypothesize that all of the lexeme queries required to write any clinical notes can be stored in a large database ("lexicon") in coherence order, wherein each lexeme query is associated with its own collection of lexeme responses. We hypothesize that the issue a note-writer will need to address next is identifiable purely by using the rules of coherence, level and predicance. We have tested these hypotheses with a computer program which repeatedly offers the user a menu of lexeme responses with associated text. On selection, the program issues the text fragment, and its corresponding computer code, to output files. The program then uses coherence, predicance and level to navigate to the next appropriate lexeme query for presentation to the user. The net result is that the user creates a grammatically correct and completely computerized note at the time of its inception. The value of this approach and its practical implementation to create medical records are discussed. In our work so far, the hypotheses appear not to be false, but further testing is needed using a larger lexicon to establish their robustness in actual clinical practice. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. STBase: One Million Species Trees for Comparative Biology

    PubMed Central

    McMahon, Michelle M.; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J.

    2015-01-01

    Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees. PMID:25679219

  13. Luciferase reporter assay in Drosophila and mammalian tissue culture cells

    PubMed Central

    Yun, Chi

    2014-01-01

    Luciferase reporter gene assays are one of the most common methods for monitoring gene activity. Because of their sensitivity, dynamic range, and lack of endogenous activity, luciferase assays have been particularly useful for functional genomics in cell-based assays, such as RNAi screening. This unit describes delivery of two luciferase reporters with other nucleic acids (siRNA /dsRNA), measurement of the dual luciferase activities, and analysis of data generated. The systematic query of gene function (RNAi) combined with the advances in luminescent technology have made it possible to design powerful whole genome screens to address diverse and significant biological questions. PMID:24652620

  14. DoOPSearch: a web-based tool for finding and analysing common conserved motifs in the promoter regions of different chordate and plant genes

    PubMed Central

    Sebestyén, Endre; Nagy, Tibor; Suhai, Sándor; Barta, Endre

    2009-01-01

    Background The comparative genomic analysis of a large number of orthologous promoter regions of the chordate and plant genes from the DoOP databases shows thousands of conserved motifs. Most of these motifs differ from any known transcription factor binding site (TFBS). To identify common conserved motifs, we need a specific tool to be able to search amongst them. Since conserved motifs from the DoOP databases are linked to genes, the result of such a search can give a list of genes that are potentially regulated by the same transcription factor(s). Results We have developed a new tool called DoOPSearch for the analysis of the conserved motifs in the promoter regions of chordate or plant genes. We used the orthologous promoters of the DoOP database to extract thousands of conserved motifs from different taxonomic groups. The advantage of this approach is that different sets of conserved motifs might be found depending on how broad the taxonomic coverage of the underlying orthologous promoter sequence collection is (consider e.g. primates vs. mammals or Brassicaceae vs. Viridiplantae). The DoOPSearch tool allows the users to search these motif collections or the promoter regions of DoOP with user supplied query sequences or any of the conserved motifs from the DoOP database. To find overrepresented gene ontologies, the gene lists obtained can be analysed further using a modified version of the GeneMerge program. Conclusion We present here a comparative genomics based promoter analysis tool. Our system is based on a unique collection of conserved promoter motifs characteristic of different taxonomic groups. We offer both a command line and a web-based tool for searching in these motif collections using user specified queries. These can be either short promoter sequences or consensus sequences of known transcription factor binding sites. The GeneMerge analysis of the search results allows the user to identify statistically overrepresented Gene Ontology terms that might provide a clue on the function of the motifs and genes. PMID:19534755

  15. Version VI of the ESTree db: an improved tool for peach transcriptome analysis

    PubMed Central

    Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Merelli, Ivan; Barale, Francesca; Milanesi, Luciano; Stella, Alessandra; Pozzi, Carlo

    2008-01-01

    Background The ESTree database (db) is a collection of Prunus persica and Prunus dulcis EST sequences that in its current version encompasses 75,404 sequences from 3 almond and 19 peach libraries. Nine peach genotypes and four peach tissues are represented, from four fruit developmental stages. The aim of this work was to implement the already existing ESTree db by adding new sequences and analysis programs. Particular care was given to the implementation of the web interface, that allows querying each of the database features. Results A Perl modular pipeline is the backbone of sequence analysis in the ESTree db project. Outputs obtained during the pipeline steps are automatically arrayed into the fields of a MySQL database. Apart from standard clustering and annotation analyses, version VI of the ESTree db encompasses new tools for tandem repeat identification, annotation against genomic Rosaceae sequences, and positioning on the database of oligomer sequences that were used in a peach microarray study. Furthermore, known protein patterns and motifs were identified by comparison to PROSITE. Based on data retrieved from sequence annotation against the UniProtKB database, a script was prepared to track positions of homologous hits on the GO tree and build statistics on the ontologies distribution in GO functional categories. EST mapping data were also integrated in the database. The PHP-based web interface was upgraded and extended. The aim of the authors was to enable querying the database according to all the biological aspects that can be investigated from the analysis of data available in the ESTree db. This is achieved by allowing multiple searches on logical subsets of sequences that represent different biological situations or features. Conclusions The version VI of ESTree db offers a broad overview on peach gene expression. Sequence analyses results contained in the database, extensively linked to external related resources, represent a large amount of information that can be queried via the tools offered in the web interface. Flexibility and modularity of the ESTree analysis pipeline and of the web interface allowed the authors to set up similar structures for different datasets, with limited manual intervention. PMID:18387211

  16. Implementation of relational data base management systems on micro-computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, C.L.

    1982-01-01

    This dissertation describes an implementation of a Relational Data Base Management System on a microcomputer. A specific floppy disk based hardward called TERAK is being used, and high level query interface which is similar to a subset of the SEQUEL language is provided. The system contains sub-systems such as I/O, file management, virtual memory management, query system, B-tree management, scanner, command interpreter, expression compiler, garbage collection, linked list manipulation, disk space management, etc. The software has been implemented to fulfill the following goals: (1) it is highly modularized. (2) The system is physically segmented into 16 logically independent, overlayable segments,more » in a way such that a minimal amount of memory is needed at execution time. (3) Virtual memory system is simulated that provides the system with seemingly unlimited memory space. (4) A language translator is applied to recognize user requests in the query language. The code generation of this translator generates compact code for the execution of UPDATE, DELETE, and QUERY commands. (5) A complete set of basic functions needed for on-line data base manipulations is provided through the use of a friendly query interface. (6) To eliminate the dependency on the environment (both software and hardware) as much as possible, so that it would be easy to transplant the system to other computers. (7) To simulate each relation as a sequential file. It is intended to be a highly efficient, single user system suited to be used by small or medium sized organizations for, say, administrative purposes. Experiments show that quite satisfying results have indeed been achieved.« less

  17. Prediction of the Ebola Virus Infection Related Human Genes Using Protein-Protein Interaction Network.

    PubMed

    Cao, HuanHuan; Zhang, YuHang; Zhao, Jia; Zhu, Liucun; Wang, Yi; Li, JiaRui; Feng, Yuan-Ming; Zhang, Ning

    2017-01-01

    Ebola hemorrhagic fever (EHF) is caused by Ebola virus (EBOV). It is reported that human could be infected by EBOV with a high fatality rate. However, association factors between EBOV and host still tend to be ambiguous. According to the "guilt by association" (GBA) principle, proteins interacting with each other are very likely to function similarly or the same. Based on this assumption, we tried to obtain EBOV infection-related human genes in a protein-protein interaction network using Dijkstra algorithm. We hope it could contribute to the discovery of novel effective treatments. Finally, 15 genes were selected as potential EBOV infection-related human genes. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. The complete validated mitochondrial genome of the silver gemfish Rexea solandri (Cuvier, 1832) (Perciformes, Gempylidae).

    PubMed

    Bustamante, Carlos; Ovenden, Jennifer R

    2016-01-01

    The silver gemfish Rexea solandri is an important economic resource but Vulnerable to overfishing in Australian waters. The complete mitochondrial genome sequence is described from 1.6 million reads obtained via next generation sequencing. The total length of the mitogenome is 16,350 bp comprising 2 rRNA, 13 protein-coding genes, 22 tRNA and 2 non-coding regions. The mitogenome sequence was validated against sequences of PCR fragments and BLAST queries of Genbank. Gene order was equivalent to that found in marine fishes.

  19. PubNet: a flexible system for visualizing literature derived networks

    PubMed Central

    Douglas, Shawn M; Montelione, Gaetano T; Gerstein, Mark

    2005-01-01

    We have developed PubNet, a web-based tool that extracts several types of relationships returned by PubMed queries and maps them into networks, allowing for graphical visualization, textual navigation, and topological analysis. PubNet supports the creation of complex networks derived from the contents of individual citations, such as genes, proteins, Protein Data Bank (PDB) IDs, Medical Subject Headings (MeSH) terms, and authors. This feature allows one to, for example, examine a literature derived network of genes based on functional similarity. PMID:16168087

  20. The Eimeria Transcript DB: an integrated resource for annotated transcripts of protozoan parasites of the genus Eimeria

    PubMed Central

    Rangel, Luiz Thibério; Novaes, Jeniffer; Durham, Alan M.; Madeira, Alda Maria B. N.; Gruber, Arthur

    2013-01-01

    Parasites of the genus Eimeria infect a wide range of vertebrate hosts, including chickens. We have recently reported a comparative analysis of the transcriptomes of Eimeria acervulina, Eimeria maxima and Eimeria tenella, integrating ORESTES data produced by our group and publicly available Expressed Sequence Tags (ESTs). All cDNA reads have been assembled, and the reconstructed transcripts have been submitted to a comprehensive functional annotation pipeline. Additional studies included orthology assignment across apicomplexan parasites and clustering analyses of gene expression profiles among different developmental stages of the parasites. To make all this body of information publicly available, we constructed the Eimeria Transcript Database (EimeriaTDB), a web repository that provides access to sequence data, annotation and comparative analyses. Here, we describe the web interface, available sequence data sets and query tools implemented on the site. The main goal of this work is to offer a public repository of sequence and functional annotation data of reconstructed transcripts of parasites of the genus Eimeria. We believe that EimeriaTDB will represent a valuable and complementary resource for the Eimeria scientific community and for those researchers interested in comparative genomics of apicomplexan parasites. Database URL: http://www.coccidia.icb.usp.br/eimeriatdb/ PMID:23411718

  1. Genomes as geography: using GIS technology to build interactive genome feature maps

    PubMed Central

    Dolan, Mary E; Holden, Constance C; Beard, M Kate; Bult, Carol J

    2006-01-01

    Background Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. Results We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. Conclusion Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries. PMID:16984652

  2. Dysregulation of the mitogen granulin in human cancer through the miR-15/107 microRNA gene group

    PubMed Central

    Wang, Wang-Xia; Kyprianou, Natasha; Wang, Xiaowei; Nelson, Peter T.

    2010-01-01

    Granulin (GRN) is a potent mitogen and growth factor implicated in many human cancers, but its regulation is poorly understood. Recent findings indicate that GRN is regulated strongly by the microRNA miR-107, which functionally overlap with miR-15, miR-16, and miR-195 due to a common 5' sequence critical for target specificity. In this study, we queried whether miR-107 and paralogs regulated GRN in human cancers. In cultured cells, anti-Argonaute RIP-ChIP experiments indicate that GRN mRNA is directly targeted by numerous miR-15/107 miRNAs. Further tests of this association in human tumors. MiR-15 and miR-16 are known to be downregulated in chronic lymphocytic leukemia (CLL). Using pre-existing microarray datasets, we found that GRN expression is higher in CLL relative to non-neoplastic lymphocytes (P>0.00001). By contrast, other prospective miR-15/miR-16 targets in the dataset (BCL-2 and cyclin D1) were not up-regulated in CLL. Unlike in CLL, GRN was not up-regulated in chronic myelogenous leukemia (CML) where miR-107 paralogs are not known to be dysregulated. Prior studies have shown that GRN is also up-regulated, and miR-107 down-regulated, in prostate carcinoma. Our results indicate that multiple members of the miR-107 gene group indeed repress GRN protein levels when transfected into prostate cancer cells. At least a dozen distinct types of cancer have the pattern of increased GRN and decreased miR-107 expression. These findings indicate for the first time that the mitogen and growth factor GRN is dysregulated via the miR-15/107 gene group in multiple human cancers, which may provide a potential common therapeutic target. PMID:20884628

  3. E = Mc(super 2) for the Chemist: When is Mass Conserved?

    ERIC Educational Resources Information Center

    Treptow. Richard S.

    2005-01-01

    An equation derived by Albert Einstein in 1905 that expresses a relationship between mass and energy, formulated as E = mc(super 2) is discussed with reference to the extent mass is conserved. This query can be used to challenge students and develop their language and critical thinking skills.

  4. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

    PubMed

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-07-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

    PubMed Central

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-01-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572

  6. Visual exploration of big spatio-temporal urban data: a study of New York City taxi trips.

    PubMed

    Ferreira, Nivan; Poco, Jorge; Vo, Huy T; Freire, Juliana; Silva, Cláudio T

    2013-12-01

    As increasing volumes of urban data are captured and become available, new opportunities arise for data-driven analysis that can lead to improvements in the lives of citizens through evidence-based decision making and policies. In this paper, we focus on a particularly important urban data set: taxi trips. Taxis are valuable sensors and information associated with taxi trips can provide unprecedented insight into many different aspects of city life, from economic activity and human behavior to mobility patterns. But analyzing these data presents many challenges. The data are complex, containing geographical and temporal components in addition to multiple variables associated with each trip. Consequently, it is hard to specify exploratory queries and to perform comparative analyses (e.g., compare different regions over time). This problem is compounded due to the size of the data-there are on average 500,000 taxi trips each day in NYC. We propose a new model that allows users to visually query taxi trips. Besides standard analytics queries, the model supports origin-destination queries that enable the study of mobility across the city. We show that this model is able to express a wide range of spatio-temporal queries, and it is also flexible in that not only can queries be composed but also different aggregations and visual representations can be applied, allowing users to explore and compare results. We have built a scalable system that implements this model which supports interactive response times; makes use of an adaptive level-of-detail rendering strategy to generate clutter-free visualization for large results; and shows hidden details to the users in a summary through the use of overlay heat maps. We present a series of case studies motivated by traffic engineers and economists that show how our model and system enable domain experts to perform tasks that were previously unattainable for them.

  7. BGDB: a database of bivalent genes.

    PubMed

    Li, Qingyan; Lian, Shuabin; Dai, Zhiming; Xiang, Qian; Dai, Xianhua

    2013-01-01

    Bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modification in the same area, and is proposed to play a pivotal role related to pluripotency in embryonic stem (ES) cells. Identification of these bivalent genes and understanding their functions are important for further research of lineage specification and embryo development. So far, lots of genome-wide histone modification data were generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repositories or analysis tools are available for bivalent genes currently. In this work, we develop BGDB, the database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, which are manually collected from scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of BGDB database were implemented with PHP + MySQL + JavaScript, and provide diverse query functions. Database URL: http://dailab.sysu.edu.cn/bgdb/

  8. Tumor Mutational Load and Immune Parameters across Metastatic Renal Cell Carcinoma Risk Groups.

    PubMed

    de Velasco, Guillermo; Miao, Diana; Voss, Martin H; Hakimi, A Ari; Hsieh, James J; Tannir, Nizar M; Tamboli, Pheroze; Appleman, Leonard J; Rathmell, W Kimryn; Van Allen, Eliezer M; Choueiri, Toni K

    2016-10-01

    Patients with metastatic renal cell carcinoma (mRCC) have better overall survival when treated with nivolumab, a cancer immunotherapy that targets the immune checkpoint inhibitor programmed cell death 1 (PD-1), rather than everolimus (a chemical inhibitor of mTOR and immunosuppressant). Poor-risk mRCC patients treated with nivolumab seemed to experience the greatest overall survival benefit, compared with patients with favorable or intermediate risk, in an analysis of the CheckMate-025 trial subgroup of the Memorial Sloan Kettering Cancer Center (MSKCC) prognostic risk groups. Here, we explore whether tumor mutational load and RNA expression of specific immune parameters could be segregated by prognostic MSKCC risk strata and explain the survival seen in the poor-risk group. We queried whole-exome transcriptome data in renal cell carcinoma patients (n = 54) included in The Cancer Genome Atlas who ultimately developed metastatic disease or were diagnosed with metastatic disease at presentation and did not receive immune checkpoint inhibitors. Nonsynonymous mutational load did not differ significantly by the MSKCC risk group, nor was the expression of cytolytic genes-granzyme A and perforin-or selected immune checkpoint molecules different across MSKCC risk groups. In conclusion, this analysis revealed that mutational load and expression of markers of an active tumor microenvironment did not correlate with MSKCC risk prognostic classification in mRCC. Cancer Immunol Res; 4(10); 820-2. ©2016 AACR. ©2016 American Association for Cancer Research.

  9. Uterine Wound Healing: A Complex Process Mediated by Proteins and Peptides.

    PubMed

    Lofrumento, Dario D; Di Nardo, Maria A; De Falco, Marianna; Di Lieto, Andrea

    2017-01-01

    Wound healing is the process by which a complex cascade of biochemical events is responsible of the repair the damage. In vivo, studies in humans and mice suggest that healing and post-healing heterogeneous behavior of the surgically wounded myometrium is both phenotype and genotype dependent. Uterine wound healing process involves many cells: endothelial cells, neutrophils, monocytes/macrophages, lymphocytes, fibroblasts, myometrial cells as well a stem cell population found in the myometrium, myoSP (side population of myometrial cells). Transforming growth factor beta (TGF-β) isoforms, connective tissue growth factor (CTGF), basic fibroblast growth factor (bFGF), platelet-derived growth factor (PDGF), vascular endothelial growth factor (VEGF), and tumor necrosis factor alpha (TNF-β) are involved in the wound healing mechanisms. The increased TGF- β1/β3 ratio reduces scarring and fibrosis. The CTGF altered expression may be a factor involved in the abnormal scars formation of low uterine segment after cesarean section and of the formation of uterine dehiscence. The lack of bFGF is involved in the reduction of collagen deposition in the wound site and thicker scabs. The altered expression of TNF-β, VEGF, and PDGF in human myometrial smooth muscle cells in case of uterine dehiscence, it is implicated in the uterine healing process. The over-and under-expressions of growth factors genes involved in uterine scarring process could represent patient's specific features, increasing the risk of cesarean scar complications. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  10. Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval

    PubMed Central

    Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene

    2018-01-01

    Abstract The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie PMID:29688379

  11. Use of Graph Database for the Integration of Heterogeneous Biological Data.

    PubMed

    Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young

    2017-03-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.

  12. Use of Graph Database for the Integration of Heterogeneous Biological Data

    PubMed Central

    Yoon, Byoung-Ha; Kim, Seon-Kyu

    2017-01-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946

  13. Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.

    PubMed

    Hiscock, D; Upton, C

    2000-05-01

    The Viral Genome DataBase (VGDB) contains detailed information of the genes and predicted protein sequences from 15 completely sequenced genomes of large (&100 kb) viruses (2847 genes). The data that is stored includes DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a mySQL database with a user-friendly JAVA GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .

  14. An RDF/OWL knowledge base for query answering and decision support in clinical pharmacogenetics.

    PubMed

    Samwald, Matthias; Freimuth, Robert; Luciano, Joanne S; Lin, Simon; Powers, Robert L; Marshall, M Scott; Adlassnig, Klaus-Peter; Dumontier, Michel; Boyce, Richard D

    2013-01-01

    Genetic testing for personalizing pharmacotherapy is bound to become an important part of clinical routine. To address associated issues with data management and quality, we are creating a semantic knowledge base for clinical pharmacogenetics. The knowledge base is made up of three components: an expressive ontology formalized in the Web Ontology Language (OWL 2 DL), a Resource Description Framework (RDF) model for capturing detailed results of manual annotation of pharmacogenomic information in drug product labels, and an RDF conversion of relevant biomedical datasets. Our work goes beyond the state of the art in that it makes both automated reasoning as well as query answering as simple as possible, and the reasoning capabilities go beyond the capabilities of previously described ontologies.

  15. Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences

    PubMed Central

    Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi

    2006-01-01

    Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384

  16. DOGMA: A Disk-Oriented Graph Matching Algorithm for RDF Databases

    NASA Astrophysics Data System (ADS)

    Bröcheler, Matthias; Pugliese, Andrea; Subrahmanian, V. S.

    RDF is an increasingly important paradigm for the representation of information on the Web. As RDF databases increase in size to approach tens of millions of triples, and as sophisticated graph matching queries expressible in languages like SPARQL become increasingly important, scalability becomes an issue. To date, there is no graph-based indexing method for RDF data where the index was designed in a way that makes it disk-resident. There is therefore a growing need for indexes that can operate efficiently when the index itself resides on disk. In this paper, we first propose the DOGMA index for fast subgraph matching on disk and then develop a basic algorithm to answer queries over this index. This algorithm is then significantly sped up via an optimized algorithm that uses efficient (but correct) pruning strategies when combined with two different extensions of the index. We have implemented a preliminary system and tested it against four existing RDF database systems developed by others. Our experiments show that our algorithm performs very well compared to these systems, with orders of magnitude improvements for complex graph queries.

  17. Combining semantic technologies with a content-based image retrieval system - Preliminary considerations

    NASA Astrophysics Data System (ADS)

    Chmiel, P.; Ganzha, M.; Jaworska, T.; Paprzycki, M.

    2017-10-01

    Nowadays, as a part of systematic growth of volume, and variety, of information that can be found on the Internet, we observe also dramatic increase in sizes of available image collections. There are many ways to help users browsing / selecting images of interest. One of popular approaches are Content-Based Image Retrieval (CBIR) systems, which allow users to search for images that match their interests, expressed in the form of images (query by example). However, we believe that image search and retrieval could take advantage of semantic technologies. We have decided to test this hypothesis. Specifically, on the basis of knowledge captured in the CBIR, we have developed a domain ontology of residential real estate (detached houses, in particular). This allows us to semantically represent each image (and its constitutive architectural elements) represented within the CBIR. The proposed ontology was extended to capture not only the elements resulting from image segmentation, but also "spatial relations" between them. As a result, a new approach to querying the image database (semantic querying) has materialized, thus extending capabilities of the developed system.

  18. The clinical trial landscape in oncology and connectivity of somatic mutational profiles to targeted therapies.

    PubMed

    Patterson, Sara E; Liu, Rangjiao; Statz, Cara M; Durkin, Daniel; Lakshminarayana, Anuradha; Mockus, Susan M

    2016-01-16

    Precision medicine in oncology relies on rapid associations between patient-specific variations and targeted therapeutic efficacy. Due to the advancement of genomic analysis, a vast literature characterizing cancer-associated molecular aberrations and relative therapeutic relevance has been published. However, data are not uniformly reported or readily available, and accessing relevant information in a clinically acceptable time-frame is a daunting proposition, hampering connections between patients and appropriate therapeutic options. One important therapeutic avenue for oncology patients is through clinical trials. Accordingly, a global view into the availability of targeted clinical trials would provide insight into strengths and weaknesses and potentially enable research focus. However, data regarding the landscape of clinical trials in oncology is not readily available, and as a result, a comprehensive understanding of clinical trial availability is difficult. To support clinical decision-making, we have developed a data loader and mapper that connects sequence information from oncology patients to data stored in an in-house database, the JAX Clinical Knowledgebase (JAX-CKB), which can be queried readily to access comprehensive data for clinical reporting via customized reporting queries. JAX-CKB functions as a repository to house expertly curated clinically relevant data surrounding our 358-gene panel, the JAX Cancer Treatment Profile (JAX CTP), and supports annotation of functional significance of molecular variants. Through queries of data housed in JAX-CKB, we have analyzed the landscape of clinical trials relevant to our 358-gene targeted sequencing panel to evaluate strengths and weaknesses in current molecular targeting in oncology. Through this analysis, we have identified patient indications, molecular aberrations, and targeted therapy classes that have strong or weak representation in clinical trials. Here, we describe the development and disseminate system methods for associating patient genomic sequence data with clinically relevant information, facilitating interpretation and providing a mechanism for informing therapeutic decision-making. Additionally, through customized queries, we have the capability to rapidly analyze the landscape of targeted therapies in clinical trials, enabling a unique view into current therapeutic availability in oncology.

  19. Alterations in osteopontin modify muscle size in females in both humans and mice.

    PubMed

    Hoffman, Eric P; Gordish-Dressman, Heather; McLane, Virginia D; Devaney, Joseph M; Thompson, Paul D; Visich, Paul; Gordon, Paul M; Pescatello, Linda S; Zoeller, Robert F; Moyna, Niall M; Angelopoulos, Theodore J; Pegoraro, Elena; Cox, Gregory A; Clarkson, Priscilla M

    2013-06-01

    An osteopontin (OPN; SPP1) gene promoter polymorphism modifies disease severity in Duchenne muscular dystrophy, and we hypothesized that it might also modify muscle phenotypes in healthy volunteers. Gene association studies were carried out for OPN (rs28357094) in the FAMuSS cohort (n = 752; mean ± SD age = 23.7 ± 5.7 yr). The phenotypes studied included muscle size (MRI), strength, and response to supervised resistance training. We also studied 147 young adults that had carried out a bout of eccentric elbow exercise (age = 24.0 ± 5.2 yr). Phenotypes analyzed included strength, soreness, and serum muscle enzymes. In the FAMuSS cohort, the G allele was associated with 17% increase in baseline upper arm muscle volume only in women (F = 26.32; P = 5.32 × 10), explaining 5% of population variance. In the eccentric damage cohort, weak associations of the G allele were seen in women with both baseline myoglobin and elevated creatine kinase. The sexually dimorphic effects of OPN on muscle were also seen in OPN-null mice. Five of seven muscle groups examined showed smaller size in OPN-null female mice, whereas two were smaller in male mice. The query of OPN gene transcription after experimental muscle damage in mice showed rapid induction within 12 h (100-fold increase from baseline), followed by sustained high-level expression through 16 d of regeneration before falling to back to baseline. OPN is a sexually dimorphic modifier of muscle size in normal humans and mice and responds to muscle damage. The OPN gene is known to be estrogen responsive, and this may explain the female-specific genotype effects in adult volunteers.

  20. An interactive web application for the dissemination of human systems immunology data.

    PubMed

    Speake, Cate; Presnell, Scott; Domico, Kelly; Zeitner, Brad; Bjork, Anna; Anderson, David; Mason, Michael J; Whalen, Elizabeth; Vargas, Olivia; Popov, Dimitry; Rinchai, Darawan; Jourde-Chiche, Noemie; Chiche, Laurent; Quinn, Charlie; Chaussabel, Damien

    2015-06-19

    Systems immunology approaches have proven invaluable in translational research settings. The current rate at which large-scale datasets are generated presents unique challenges and opportunities. Mining aggregates of these datasets could accelerate the pace of discovery, but new solutions are needed to integrate the heterogeneous data types with the contextual information that is necessary for interpretation. In addition, enabling tools and technologies facilitating investigators' interaction with large-scale datasets must be developed in order to promote insight and foster knowledge discovery. State of the art application programming was employed to develop an interactive web application for browsing and visualizing large and complex datasets. A collection of human immune transcriptome datasets were loaded alongside contextual information about the samples. We provide a resource enabling interactive query and navigation of transcriptome datasets relevant to human immunology research. Detailed information about studies and samples are displayed dynamically; if desired the associated data can be downloaded. Custom interactive visualizations of the data can be shared via email or social media. This application can be used to browse context-rich systems-scale data within and across systems immunology studies. This resource is publicly available online at [Gene Expression Browser Landing Page ( https://gxb.benaroyaresearch.org/dm3/landing.gsp )]. The source code is also available openly [Gene Expression Browser Source Code ( https://github.com/BenaroyaResearch/gxbrowser )]. We have developed a data browsing and visualization application capable of navigating increasingly large and complex datasets generated in the context of immunological studies. This intuitive tool ensures that, whether taken individually or as a whole, such datasets generated at great effort and expense remain interpretable and a ready source of insight for years to come.

  1. ESTuber db: an online database for Tuber borchii EST sequences.

    PubMed

    Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo

    2007-03-08

    The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.

  2. Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS.

    PubMed

    Yu, Hwanjo; Kim, Taehoon; Oh, Jinoh; Ko, Ilhwan; Kim, Sungchul; Han, Wook-Shin

    2010-04-16

    Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without integrated with the keyword queries, and the users have to provide a large amount of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.

  3. Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS

    PubMed Central

    2010-01-01

    Background Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without integrated with the keyword queries, and the users have to provide a large amount of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. Results RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user’s feedback and efficiently processes the function to return relevant articles in real time. PMID:20406504

  4. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315

  5. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

  6. YEASTRACT: providing a programmatic access to curated transcriptional regulatory associations in Saccharomyces cerevisiae through a web services interface

    PubMed Central

    Abdulrehman, Dário; Monteiro, Pedro Tiago; Teixeira, Miguel Cacho; Mira, Nuno Pereira; Lourenço, Artur Bastos; dos Santos, Sandra Costa; Cabrito, Tânia Rodrigues; Francisco, Alexandre Paulo; Madeira, Sara Cordeiro; Aires, Ricardo Santos; Oliveira, Arlindo Limede; Sá-Correia, Isabel; Freitas, Ana Teresa

    2011-01-01

    The YEAst Search for Transcriptional Regulators And Consensus Tracking (YEASTRACT) information system (http://www.yeastract.com) was developed to support the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2010, this database contains over 48 200 regulatory associations between transcription factors (TFs) and target genes, including 298 specific DNA-binding sites for 110 characterized TFs. All regulatory associations stored in the database were revisited and detailed information on the experimental evidences that sustain those associations was added and classified as direct or indirect evidences. The inclusion of this new data, gathered in response to the requests of YEASTRACT users, allows the user to restrict its queries to subsets of the data based on the existence or not of experimental evidences for the direct action of the TFs in the promoter region of their target genes. Another new feature of this release is the availability of all data through a machine readable web-service interface. Users are no longer restricted to the set of available queries made available through the existing web interface, and can use the web service interface to query, retrieve and exploit the YEASTRACT data using their own implementation of additional functionalities. The YEASTRACT information system is further complemented with several computational tools that facilitate the use of the curated data when answering a number of important biological questions. Since its first release in 2006, YEASTRACT has been extensively used by hundreds of researchers from all over the world. We expect that by making the new data and services available, the system will continue to be instrumental for yeast biologists and systems biology researchers. PMID:20972212

  7. eXframe: reusable framework for storage, analysis and visualization of genomics experiments

    PubMed Central

    2011-01-01

    Background Genome-wide experiments are routinely conducted to measure gene expression, DNA-protein interactions and epigenetic status. Structured metadata for these experiments is imperative for a complete understanding of experimental conditions, to enable consistent data processing and to allow retrieval, comparison, and integration of experimental results. Even though several repositories have been developed for genomics data, only a few provide annotation of samples and assays using controlled vocabularies. Moreover, many of them are tailored for a single type of technology or measurement and do not support the integration of multiple data types. Results We have developed eXframe - a reusable web-based framework for genomics experiments that provides 1) the ability to publish structured data compliant with accepted standards 2) support for multiple data types including microarrays and next generation sequencing 3) query, analysis and visualization integration tools (enabled by consistent processing of the raw data and annotation of samples) and is available as open-source software. We present two case studies where this software is currently being used to build repositories of genomics experiments - one contains data from hematopoietic stem cells and another from Parkinson's disease patients. Conclusion The web-based framework eXframe offers structured annotation of experiments as well as uniform processing and storage of molecular data from microarray and next generation sequencing platforms. The framework allows users to query and integrate information across species, technologies, measurement types and experimental conditions. Our framework is reusable and freely modifiable - other groups or institutions can deploy their own custom web-based repositories based on this software. It is interoperable with the most important data formats in this domain. We hope that other groups will not only use eXframe, but also contribute their own useful modifications. PMID:22103807

  8. Involvement of PI3K, Akt, and RhoA in oestradiol regulation of cardiac iNOS expression.

    PubMed

    Zafirovic, Sonja; Sudar-Milovanovic, Emina; Obradovic, Milan; Djordjevic, Jelena; Jasnic, Nebojsa; Borovic, Milica Labudovic; Isenovic, Esma R

    2018-02-12

    Oestradiol is an important regulatory factor with several positive effects on the cardiovascular (CV) system. We evaluated the molecular mechanism of the in vivo effects of oestradiol on the regulation of cardiac inducible nitric oxide (NO) synthase (iNOS) expression and activity. Male Wistar rats were treated with oestradiol (40 mg/kg, intraperitoneally) and after 24 h the animals were sacrificed. The concentrations of NO and L-Arginine (L-Arg) were determined spectrophotometrically. For protein expressions of iNOS, p65 subunit of nuclear factor-κB (NFκB-p65), Ras homolog gene family-member A (RhoA), angiotensin II receptor type 1 (AT1R), insulin receptor substrate 1 (IRS-1), p85, p110 and protein kinase B (Akt), Western blot method was used. Co-immunoprecipitation was used for measuring the association of IRS-1 with the p85 subunit of phosphatidylinositol-3-kinase (PI3K). The expression of iNOS messenger ribonucleic acid (mRNA) was measured with the quantitative real-time polymerase chain reaction (qRT-PCR). Immunohistochemical analysis of the tissue was used to detect localization and expression of iNOS in heart tissue. Oestradiol treatment reduced L-Arg concentration (p<0.01), iNOS mRNA (p<0.01) and protein (p<0.001) expression, level of RhoA (p<0.05) and AT1R (p<0.001) protein. In contrast, plasma NO (p<0.05), Akt phosphorylation at Thr308 (p<0.05) and protein level of p85 (p<0.001) increased after oestradiol treatment. Our results suggest that oestradiol in vivo regulates cardiac iNOS expression via the PI3K/Akt signaling pathway, through attenuation of RhoA and AT1R. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  9. RaMP: A Comprehensive Relational Database of Metabolomics Pathways for Pathway Enrichment Analysis of Genes and Metabolites

    PubMed Central

    Zhang, Bofei; Hu, Senyang; Baskin, Elizabeth; Patt, Andrew; Siddiqui, Jalal K.

    2018-01-01

    The value of metabolomics in translational research is undeniable, and metabolomics data are increasingly generated in large cohorts. The functional interpretation of disease-associated metabolites though is difficult, and the biological mechanisms that underlie cell type or disease-specific metabolomics profiles are oftentimes unknown. To help fully exploit metabolomics data and to aid in its interpretation, analysis of metabolomics data with other complementary omics data, including transcriptomics, is helpful. To facilitate such analyses at a pathway level, we have developed RaMP (Relational database of Metabolomics Pathways), which combines biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, WikiPathways, and the Human Metabolome DataBase (HMDB). To the best of our knowledge, an off-the-shelf, public database that maps genes and metabolites to biochemical/disease pathways and can readily be integrated into other existing software is currently lacking. For consistent and comprehensive analysis, RaMP enables batch and complex queries (e.g., list all metabolites involved in glycolysis and lung cancer), can readily be integrated into pathway analysis tools, and supports pathway overrepresentation analysis given a list of genes and/or metabolites of interest. For usability, we have developed a RaMP R package (https://github.com/Mathelab/RaMP-DB), including a user-friendly RShiny web application, that supports basic simple and batch queries, pathway overrepresentation analysis given a list of genes or metabolites of interest, and network visualization of gene-metabolite relationships. The package also includes the raw database file (mysql dump), thereby providing a stand-alone downloadable framework for public use and integration with other tools. In addition, the Python code needed to recreate the database on another system is also publicly available (https://github.com/Mathelab/RaMP-BackEnd). Updates for databases in RaMP will be checked multiple times a year and RaMP will be updated accordingly. PMID:29470400

  10. RaMP: A Comprehensive Relational Database of Metabolomics Pathways for Pathway Enrichment Analysis of Genes and Metabolites.

    PubMed

    Zhang, Bofei; Hu, Senyang; Baskin, Elizabeth; Patt, Andrew; Siddiqui, Jalal K; Mathé, Ewy A

    2018-02-22

    The value of metabolomics in translational research is undeniable, and metabolomics data are increasingly generated in large cohorts. The functional interpretation of disease-associated metabolites though is difficult, and the biological mechanisms that underlie cell type or disease-specific metabolomics profiles are oftentimes unknown. To help fully exploit metabolomics data and to aid in its interpretation, analysis of metabolomics data with other complementary omics data, including transcriptomics, is helpful. To facilitate such analyses at a pathway level, we have developed RaMP (Relational database of Metabolomics Pathways), which combines biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, WikiPathways, and the Human Metabolome DataBase (HMDB). To the best of our knowledge, an off-the-shelf, public database that maps genes and metabolites to biochemical/disease pathways and can readily be integrated into other existing software is currently lacking. For consistent and comprehensive analysis, RaMP enables batch and complex queries (e.g., list all metabolites involved in glycolysis and lung cancer), can readily be integrated into pathway analysis tools, and supports pathway overrepresentation analysis given a list of genes and/or metabolites of interest. For usability, we have developed a RaMP R package (https://github.com/Mathelab/RaMP-DB), including a user-friendly RShiny web application, that supports basic simple and batch queries, pathway overrepresentation analysis given a list of genes or metabolites of interest, and network visualization of gene-metabolite relationships. The package also includes the raw database file (mysql dump), thereby providing a stand-alone downloadable framework for public use and integration with other tools. In addition, the Python code needed to recreate the database on another system is also publicly available (https://github.com/Mathelab/RaMP-BackEnd). Updates for databases in RaMP will be checked multiple times a year and RaMP will be updated accordingly.

  11. Evaluation of the Feasibility of Screening Patients for Early Signs of Lung Carcinoma in Web Search Logs.

    PubMed

    White, Ryen W; Horvitz, Eric

    2017-03-01

    A statistical model that predicts the appearance of strong evidence of a lung carcinoma diagnosis via analysis of large-scale anonymized logs of web search queries from millions of people across the United States. To evaluate the feasibility of screening patients at risk of lung carcinoma via analysis of signals from online search activity. We identified people who issue special queries that provide strong evidence of a recent diagnosis of lung carcinoma. We then considered patterns of symptoms expressed as searches about concerning symptoms over several months prior to the appearance of the landmark web queries. We built statistical classifiers that predict the future appearance of landmark queries based on the search log signals. This was a retrospective log analysis of the online activity of millions of web searchers seeking health-related information online. Of web searchers who queried for symptoms related to lung carcinoma, some (n = 5443 of 4 813 985) later issued queries that provide strong evidence of recent clinical diagnosis of lung carcinoma and are regarded as positive cases in our analysis. Additional evidence on the reliability of these queries as representing clinical diagnoses is based on the significant increase in follow-on searches for treatments and medications for these searchers and on the correlation between lung carcinoma incidence rates and our log-based statistics. The remaining symptom searchers (n = 4 808 542) are regarded as negative cases. Performance of the statistical model for early detection from online search behavior, for different lead times, different sets of signals, and different cohorts of searchers stratified by potential risk. The statistical classifier predicting the future appearance of landmark web queries based on search log signals identified searchers who later input queries consistent with a lung carcinoma diagnosis, with a true-positive rate ranging from 3% to 57% for false-positive rates ranging from 0.00001 to 0.001, respectively. The methods can be used to identify people at highest risk up to a year in advance of the inferred diagnosis time. The 5 factors associated with the highest relative risk (RR) were evidence of family history (RR = 7.548; 95% CI, 3.937-14.470), age (RR = 3.558; 95% CI, 3.357-3.772), radon (RR = 2.529; 95% CI, 1.137-5.624), primary location (RR = 2.463; 95% CI, 1.364-4.446), and occupation (RR = 1.969; 95% CI, 1.143-3.391). Evidence of smoking (RR = 1.646; 95% CI, 1.032-2.260) was important but not top-ranked, which was due to the difficulty of identifying smoking history from search terms. Pattern recognition based on data drawn from large-scale web search queries holds opportunity for identifying risk factors and frames new directions with early detection of lung carcinoma.

  12. CuGene as a tool to view and explore genomic data

    NASA Astrophysics Data System (ADS)

    Haponiuk, Michał; Pawełkowicz, Magdalena; Przybecki, Zbigniew; Nowak, Robert M.

    2017-08-01

    Integrated CuGene is an easy-to-use, open-source, on-line tool that can be used to browse, analyze, and query genomic data and annotations. It places annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. It also allows users to upload and display their own experimental results or annotation sets. An important functionality of the application is a possibility to find similarity between sequences by applying four different algorithms of different accuracy. The presented tool was tested on real genomic data and is extensively used by Polish Consortium of Cucumber Genome Sequencing.

  13. A Hybrid Approach to Finding Relevant Social Media Content for Complex Domain Specific Information Needs.

    PubMed

    Cameron, Delroy; Sheth, Amit P; Jaykumar, Nishita; Thirunarayan, Krishnaprasad; Anand, Gaurish; Smith, Gary A

    2014-12-01

    While contemporary semantic search systems offer to improve classical keyword-based search, they are not always adequate for complex domain specific information needs. The domain of prescription drug abuse, for example, requires knowledge of both ontological concepts and "intelligible constructs" not typically modeled in ontologies. These intelligible constructs convey essential information that include notions of intensity, frequency, interval, dosage and sentiments, which could be important to the holistic needs of the information seeker. In this paper, we present a hybrid approach to domain specific information retrieval that integrates ontology-driven query interpretation with synonym-based query expansion and domain specific rules, to facilitate search in social media on prescription drug abuse. Our framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of diverse textual patterns, which belong to broad templates and 2) a low-level CFG that enables interpretation of specific expressions belonging to such textual patterns. These low-level expressions occur as concepts from four different categories of data: 1) ontological concepts, 2) concepts in lexicons (such as emotions and sentiments), 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects and routes of administration (ROA)), and 4) domain specific expressions (such as date, time, interval, frequency and dosage) derived solely through rules. Our approach is embodied in a novel Semantic Web platform called PREDOSE, which provides search support for complex domain specific information needs in prescription drug abuse epidemiology. When applied to a corpus of over 1 million drug abuse-related web forum posts, our search framework proved effective in retrieving relevant documents when compared with three existing search systems.

  14. A Hybrid Approach to Finding Relevant Social Media Content for Complex Domain Specific Information Needs

    PubMed Central

    Cameron, Delroy; Sheth, Amit P.; Jaykumar, Nishita; Thirunarayan, Krishnaprasad; Anand, Gaurish; Smith, Gary A.

    2015-01-01

    While contemporary semantic search systems offer to improve classical keyword-based search, they are not always adequate for complex domain specific information needs. The domain of prescription drug abuse, for example, requires knowledge of both ontological concepts and “intelligible constructs” not typically modeled in ontologies. These intelligible constructs convey essential information that include notions of intensity, frequency, interval, dosage and sentiments, which could be important to the holistic needs of the information seeker. In this paper, we present a hybrid approach to domain specific information retrieval that integrates ontology-driven query interpretation with synonym-based query expansion and domain specific rules, to facilitate search in social media on prescription drug abuse. Our framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of diverse textual patterns, which belong to broad templates and 2) a low-level CFG that enables interpretation of specific expressions belonging to such textual patterns. These low-level expressions occur as concepts from four different categories of data: 1) ontological concepts, 2) concepts in lexicons (such as emotions and sentiments), 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects and routes of administration (ROA)), and 4) domain specific expressions (such as date, time, interval, frequency and dosage) derived solely through rules. Our approach is embodied in a novel Semantic Web platform called PREDOSE, which provides search support for complex domain specific information needs in prescription drug abuse epidemiology. When applied to a corpus of over 1 million drug abuse-related web forum posts, our search framework proved effective in retrieving relevant documents when compared with three existing search systems. PMID:25814917

  15. Genomic insights into the evolution of hybrid isoprenoid biosynthetic gene clusters in the MAR4 marine streptomycete clade

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gallagher, Kelley A.; Jensen, Paul R.

    Background: Considerable advances have been made in our understanding of the molecular genetics of secondary metabolite biosynthesis. Coupled with increased access to genome sequence data, new insight can be gained into the diversity and distributions of secondary metabolite biosynthetic gene clusters and the evolutionary processes that generate them. Here we examine the distribution of gene clusters predicted to encode the biosynthesis of a structurally diverse class of molecules called hybrid isoprenoids (HIs) in the genus Streptomyces. These compounds are derived from a mixed biosynthetic origin that is characterized by the incorporation of a terpene moiety onto a variety of chemicalmore » scaffolds and include many potent antibiotic and cytotoxic agents. Results: One hundred and twenty Streptomyces genomes were searched for HI biosynthetic gene clusters using ABBA prenyltransferases (PTases) as queries. These enzymes are responsible for a key step in HI biosynthesis. The strains included 12 that belong to the ‘MAR4’ clade, a largely marine-derived lineage linked to the production of diverse HI secondary metabolites. We found ABBA PTase homologs in all of the MAR4 genomes, which averaged five copies per strain, compared with 21 % of the non-MAR4 genomes, which averaged one copy per strain. Phylogenetic analyses suggest that MAR4 PTase diversity has arisen by a combination of horizontal gene transfer and gene duplication. Furthermore, there is evidence that HI gene cluster diversity is generated by the horizontal exchange of orthologous PTases among clusters. Many putative HI gene clusters have not been linked to their secondary metabolic products, suggesting that MAR4 strains will yield additional new compounds in this structure class. Finally, we confirm that the mevalonate pathway is not always present in genomes that contain HI gene clusters and thus is not a reliable query for identifying strains with the potential to produce HI secondary metabolites. In conclusion: We found that marine-derived MAR4 streptomycetes possess a relatively high genetic potential for HI biosynthesis. The combination of horizontal gene transfer, duplication, and rearrangement indicate that complex evolutionary processes account for the high level of HI gene cluster diversity in these bacteria, the products of which may provide a yet to be defined adaptation to the marine environment.« less

  16. Genomic insights into the evolution of hybrid isoprenoid biosynthetic gene clusters in the MAR4 marine streptomycete clade

    DOE PAGES

    Gallagher, Kelley A.; Jensen, Paul R.

    2015-11-17

    Background: Considerable advances have been made in our understanding of the molecular genetics of secondary metabolite biosynthesis. Coupled with increased access to genome sequence data, new insight can be gained into the diversity and distributions of secondary metabolite biosynthetic gene clusters and the evolutionary processes that generate them. Here we examine the distribution of gene clusters predicted to encode the biosynthesis of a structurally diverse class of molecules called hybrid isoprenoids (HIs) in the genus Streptomyces. These compounds are derived from a mixed biosynthetic origin that is characterized by the incorporation of a terpene moiety onto a variety of chemicalmore » scaffolds and include many potent antibiotic and cytotoxic agents. Results: One hundred and twenty Streptomyces genomes were searched for HI biosynthetic gene clusters using ABBA prenyltransferases (PTases) as queries. These enzymes are responsible for a key step in HI biosynthesis. The strains included 12 that belong to the ‘MAR4’ clade, a largely marine-derived lineage linked to the production of diverse HI secondary metabolites. We found ABBA PTase homologs in all of the MAR4 genomes, which averaged five copies per strain, compared with 21 % of the non-MAR4 genomes, which averaged one copy per strain. Phylogenetic analyses suggest that MAR4 PTase diversity has arisen by a combination of horizontal gene transfer and gene duplication. Furthermore, there is evidence that HI gene cluster diversity is generated by the horizontal exchange of orthologous PTases among clusters. Many putative HI gene clusters have not been linked to their secondary metabolic products, suggesting that MAR4 strains will yield additional new compounds in this structure class. Finally, we confirm that the mevalonate pathway is not always present in genomes that contain HI gene clusters and thus is not a reliable query for identifying strains with the potential to produce HI secondary metabolites. In conclusion: We found that marine-derived MAR4 streptomycetes possess a relatively high genetic potential for HI biosynthesis. The combination of horizontal gene transfer, duplication, and rearrangement indicate that complex evolutionary processes account for the high level of HI gene cluster diversity in these bacteria, the products of which may provide a yet to be defined adaptation to the marine environment.« less

  17. Identification of related gene/protein names based on an HMM of name variations.

    PubMed

    Yeganova, L; Smith, L; Wilbur, W J

    2004-04-01

    Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.

  18. Neighboring Genes Show Correlated Evolution in Gene Expression

    PubMed Central

    Ghanbarian, Avazeh T.; Hurst, Laurence D.

    2015-01-01

    When considering the evolution of a gene’s expression profile, we commonly assume that this is unaffected by its genomic neighborhood. This is, however, in contrast to what we know about the lack of autonomy between neighboring genes in gene expression profiles in extant taxa. Indeed, in all eukaryotic genomes genes of similar expression-profile tend to cluster, reflecting chromatin level dynamics. Does it follow that if a gene increases expression in a particular lineage then the genomic neighbors will also increase in their expression or is gene expression evolution autonomous? To address this here we consider evolution of human gene expression since the human-chimp common ancestor, allowing for both variation in estimation of current expression level and error in Bayesian estimation of the ancestral state. We find that in all tissues and both sexes, the change in gene expression of a focal gene on average predicts the change in gene expression of neighbors. The effect is highly pronounced in the immediate vicinity (<100 kb) but extends much further. Sex-specific expression change is also genomically clustered. As genes increasing their expression in humans tend to avoid nuclear lamina domains and be enriched for the gene activator 5-hydroxymethylcytosine, we conclude that, most probably owing to chromatin level control of gene expression, a change in gene expression of one gene likely affects the expression evolution of neighbors, what we term expression piggybacking, an analog of hitchhiking. PMID:25743543

  19. A literature search tool for intelligent extraction of disease-associated genes.

    PubMed

    Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H; Wall, Dennis P

    2014-01-01

    To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.

  20. Identification of Human HK Genes and Gene Expression Regulation Study in Cancer from Transcriptomics Data Analysis

    PubMed Central

    Zhang, Zhang; Liu, Jingxing; Wu, Jiayan; Yu, Jun

    2013-01-01

    The regulation of gene expression is essential for eukaryotes, as it drives the processes of cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms. RNA-Sequencing (RNA-Seq) provides researchers with a powerful toolbox for characterization and quantification of transcriptome. Many different human tissue/cell transcriptome datasets coming from RNA-Seq technology are available on public data resource. The fundamental issue here is how to develop an effective analysis method to estimate expression pattern similarities between different tumor tissues and their corresponding normal tissues. We define the gene expression pattern from three directions: 1) expression breadth, which reflects gene expression on/off status, and mainly concerns ubiquitously expressed genes; 2) low/high or constant/variable expression genes, based on gene expression level and variation; and 3) the regulation of gene expression at the gene structure level. The cluster analysis indicates that gene expression pattern is higher related to physiological condition rather than tissue spatial distance. Two sets of human housekeeping (HK) genes are defined according to cell/tissue types, respectively. To characterize the gene expression pattern in gene expression level and variation, we firstly apply improved K-means algorithm and a gene expression variance model. We find that cancer-associated HK genes (a HK gene is specific in cancer group, while not in normal group) are expressed higher and more variable in cancer condition than in normal condition. Cancer-associated HK genes prefer to AT-rich genes, and they are enriched in cell cycle regulation related functions and constitute some cancer signatures. The expression of large genes is also avoided in cancer group. These studies will help us understand which cell type-specific patterns of gene expression differ among different cell types, and particularly for cancer. PMID:23382867

Top