Sample records for comparative toxicogenomics database

  1. The Comparative Toxicogenomics Database: update 2017.

    PubMed

    Davis, Allan Peter; Grondin, Cynthia J; Johnson, Robin J; Sciaky, Daniela; King, Benjamin L; McMorran, Roy; Wiegers, Jolene; Wiegers, Thomas C; Mattingly, Carolyn J

    2017-01-04

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) provides information about interactions between chemicals and gene products, and their relationships to diseases. Core CTD content (chemical-gene, chemical-disease and gene-disease interactions manually curated from the literature) is integrated internally and with select external datasets to generate expanded networks and predict novel associations. Today, core CTD includes more than 30.5 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, Gene Ontology (GO) annotations, pathways, and gene interaction modules. In this update, we report a 33% increase in our core data content since 2015, describe our new exposure module (that harmonizes exposure science information with core toxicogenomic data) and introduce a novel dataset of GO-disease inferences (that identify common molecular underpinnings for seemingly unrelated pathologies). These advancements centralize and contextualize real-world chemical exposures with molecular pathways to help scientists generate testable hypotheses in an effort to understand the etiology and mechanisms underlying environmentally influenced diseases. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. The Comparative Toxicogenomics Database (CTD): A Resource for Comparative Toxicological Studies

    PubMed Central

    Mattingly, CJ; Rosenstein, MC; Colby, GT; Forrest, JN; Boyer, JL

    2006-01-01

    The etiology of most chronic diseases involves interactions between environmental factors and genes that modulate important biological processes (Olden and Wilson, 2000). We are developing the publicly available Comparative Toxicogenomics Database (CTD) to promote understanding about the effects of environmental chemicals on human health. CTD identifies interactions between chemicals and genes and facilitates cross-species comparative studies of these genes. The use of diverse animal models and cross-species comparative sequence studies has been critical for understanding basic physiological mechanisms and gene and protein functions. Similarly, these approaches will be valuable for exploring the molecular mechanisms of action of environmental chemicals and the genetic basis of differential susceptibility. PMID:16902965

  3. Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database.

    PubMed

    Vishnyakova, Dina; Pasche, Emilie; Ruch, Patrick

    2012-01-01

    We report on the original integration of an automatic text categorization pipeline, called ToxiCat (Toxicogenomic Categorizer), that we developed to classify and prioritize biomedical documents in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be described as a binary classification task, where a scoring function is used to rank a selected set of articles. Components of a question-answering system are then used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign, and a set of answering components and entity recognizers for diseases and chemicals. The main components of the pipeline are publicly available both as a web application and as web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.
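
    A minimal sketch of the SVM-based ranking the abstract describes, using scikit-learn in place of the authors' EAGLi/NormaGene stack; the toy abstracts, labels, and parameters are invented for illustration:

      # Rank articles for curation by the SVM decision value (higher = more
      # likely to be curatable). Illustrative stand-in, not ToxiCat itself.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.svm import LinearSVC

      train_abstracts = [
          "cadmium exposure altered CYP1A1 expression in rat liver",  # curatable
          "a survey of hospital administrative workflows",            # not curatable
      ]
      train_labels = [1, 0]

      vectorizer = TfidfVectorizer(ngram_range=(1, 2))
      X_train = vectorizer.fit_transform(train_abstracts)
      clf = LinearSVC(C=1.0).fit(X_train, train_labels)

      new_abstracts = ["arsenic modulates TP53 in human cells", "annual budget report"]
      scores = clf.decision_function(vectorizer.transform(new_abstracts))
      for score, text in sorted(zip(scores, new_abstracts), reverse=True):
          print(f"{score:+.3f}  {text}")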

  4. A DATABASE FOR TRACKING REPRODUCTIVE TOXICOGENOMIC DATA

    EPA Science Inventory

    A Database for Tracking Reproductive Toxicogenomic Data
    Wenjun Bao, Judy Schmid, Amber Goetz, Hongzu Ren and David Dix
    Reproductive Toxicology Division, National Health and Environmental Effects Research Laboratory, Office of Research and Development, U.S. Environmental Pr...

  5. Release of (and lessons learned from mining) a pioneering large toxicogenomics database.

    PubMed

    Sandhu, Komal S; Veeramachaneni, Vamsi; Yao, Xiang; Nie, Alex; Lord, Peter; Amaratunga, Dhammika; McMillian, Michael K; Verheyen, Geert R

    2015-07-01

    We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays and has been used within Janssen over the past years to derive signatures for multiple end points and to classify proprietary compounds. The release consists of gene-expression responses to 124 compounds, selected to give broad coverage of liver-active compounds. A selection of the compounds was also analyzed on Affymetrix microarrays. The release includes results of an in-house reannotation pipeline mapping probes to Entrez gene annotations and classifying them into different confidence classes. High-confidence, unambiguously annotated probes were used to create gene-level data, which served as the starting point for cross-platform comparisons. Connectivity map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R package containing the gene-level data and show how it can be used for expression-based similarity searches. As expected, the correspondence is smaller when the data are compared with an independent dataset such as TG-GATEs. We hope that this collection of gene-expression profiles will be incorporated into the toxicogenomics pipelines of users.
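
    The cross-platform check described here can be approximated with a simple rank-based comparison of gene-level signatures; the sketch below (assuming SciPy, with invented gene symbols and log-ratios) is illustrative and is not the authors' R package:

      # Compare one compound's gene-level response measured on two platforms
      # by rank correlation over the shared, confidently annotated genes.
      from scipy.stats import spearmanr

      codelink = {"Cyp1a1": 3.2, "Gsta2": 2.1, "Hmox1": 1.5, "Alb": -0.4, "Tp53": 0.2}
      affymetrix = {"Cyp1a1": 2.8, "Gsta2": 2.4, "Hmox1": 1.1, "Alb": -0.1, "Tp53": 0.5}

      shared = sorted(set(codelink) & set(affymetrix))
      rho, pval = spearmanr([codelink[g] for g in shared],
                            [affymetrix[g] for g in shared])
      print(f"Spearman rho = {rho:.2f} over {len(shared)} shared genes (p = {pval:.3f})")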

  6. Workshop report: Identifying opportunities for global integration of toxicogenomics databases, 26-27 June 2013, Research Triangle Park, NC, USA.

    PubMed

    Hendrickx, Diana M; Boyles, Rebecca R; Kleinjans, Jos C S; Dearry, Allen

    2014-12-01

    A joint US-EU workshop on enhancing data sharing and exchange in toxicogenomics was held at the National Institute for Environmental Health Sciences. Currently, efficient reuse of data is hampered by problems related to public data availability, data quality, database interoperability (the ability to exchange information), standardization and sustainability. At the workshop, experts from universities and research institutes presented databases, studies, organizations and tools that attempt to deal with these problems. Furthermore, a case study was presented showing that combining toxicogenomics data from multiple resources leads to more accurate predictions in risk assessment. All participants agreed that there is a need for a web portal describing the diverse, heterogeneous data resources relevant for toxicogenomics research. Furthermore, there was agreement that linking more data resources would improve toxicogenomics data analysis. To outline a roadmap for enhancing interoperability between data resources, the participants recommend collecting user stories from the toxicogenomics research community on barriers in data sharing and exchange that currently hamper answering certain research questions. These user stories may guide the prioritization of steps to be taken to enhance integration of toxicogenomics databases.

  7. Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information

    PubMed Central

    Wilbur, W. John

    2012-01-01

    The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical–gene interactions, chemical–disease relationships and gene–disease relationships. Finding articles containing this information is an important first step toward efficient manual curation. However, the complex nature of named entities and their relationships makes it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein–protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, resulting in the top MAP system among participants. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team. PMID:23160415

  8. Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information.

    PubMed

    Kim, Sun; Kim, Won; Wei, Chih-Hsuan; Lu, Zhiyong; Wilbur, W John

    2012-01-01

    The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical-gene interactions, chemical-disease relationships and gene-disease relationships. Finding articles containing this information is an important first step toward efficient manual curation. However, the complex nature of named entities and their relationships makes it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein-protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, resulting in the top MAP system among participants. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team.
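
    Mean average precision, the metric both of these records cite (0.8030 in the official runs), rewards rankings that place relevant articles near the top; a self-contained sketch with toy relevance flags:

      # MAP = mean over queries of average precision; average precision is the
      # mean of the precision values at each rank where a relevant item occurs.
      def average_precision(ranked_relevance):
          hits, precision_sum = 0, 0.0
          for i, rel in enumerate(ranked_relevance, start=1):
              if rel:
                  hits += 1
                  precision_sum += hits / i
          return precision_sum / hits if hits else 0.0

      queries = [[1, 0, 1, 1, 0], [0, 1, 0, 0, 1]]  # one ranked list per query
      mean_ap = sum(average_precision(q) for q in queries) / len(queries)
      print(f"MAP = {mean_ap:.4f}")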

  9. The curation paradigm and application tool used for manual curation of the scientific literature at the Comparative Toxicogenomics Database

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; Murphy, Cynthia G.; Mattingly, Carolyn J.

    2011-01-01

    The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and convert free-text information into a structured format using official nomenclature, integrating third-party controlled vocabularies for chemicals, genes, diseases and organisms, and a novel controlled vocabulary for molecular interactions. Manual curation produces a robust, richly annotated dataset of highly accurate and detailed information. Currently, CTD describes over 349,000 molecular interactions between 6,800 chemicals, 20,900 genes (for 330 organisms) and 4,300 diseases that have been manually curated from over 25,400 peer-reviewed articles. These manually curated data are further integrated with other third-party data (e.g. Gene Ontology, KEGG and Reactome annotations) to generate a wealth of toxicogenomic relationships. Here, we describe our approach to manual curation that uses a powerful and efficient paradigm involving mnemonic codes. This strategy allows biocurators to quickly capture detailed information from articles by generating simple statements using codes to represent the relationships between data types. The paradigm is versatile, expandable, and able to accommodate new data challenges that arise. We have incorporated this strategy into a web-based curation tool to further increase efficiency and productivity, implement quality control in real time and accommodate biocurators working remotely. Database URL: http://ctd.mdibl.org PMID:21933848
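
    The mnemonic-code paradigm can be pictured as terse coded statements that expand into structured interactions; the code vocabulary below is invented for illustration (CTD's actual codes and action terms differ):

      # Expand hypothetical curator shorthand into structured statements.
      ACTIONS = {"exp+": "increases expression of",
                 "exp-": "decreases expression of",
                 "act+": "increases activity of"}

      def expand(statement):
          chemical, code, gene = statement.split()
          return f"{chemical} {ACTIONS[code]} {gene}"

      for s in ["cadmium exp+ HMOX1", "nickel act+ EGFR"]:
          print(expand(s))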

  10. Prediction model of potential hepatocarcinogenicity of rat hepatocarcinogens using a large-scale toxicogenomics database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Uehara, Takeki; Minowa, Yohsuke

    2011-09-15

    The present study was performed to develop a robust gene-based prediction model for early assessment of the potential hepatocarcinogenicity of chemicals in rats by using our toxicogenomics database, TG-GATEs (Genomics-Assisted Toxicity Evaluation System, developed by the Toxicogenomics Project in Japan). The positive training set consisted of high- or middle-dose groups that received 6 different non-genotoxic hepatocarcinogens during a 28-day period. The negative training set consisted of high- or middle-dose groups of 54 non-carcinogens. A support vector machine combined with wrapper-type gene selection algorithms was used for modeling. Consequently, our best classifier yielded prediction accuracies for hepatocarcinogenicity of 99% sensitivity and 97% specificity in the training data set, and false positive prediction was almost completely eliminated. Pathway analysis of feature genes revealed that the mitogen-activated protein kinase p38- and phosphatidylinositol-3-kinase-centered interactome and the v-myc myelocytomatosis viral oncogene homolog-centered interactome were the 2 most significant networks. The usefulness and robustness of our predictor were further confirmed in an independent validation data set obtained from the public database. Interestingly, similar positive predictions were obtained for several genotoxic hepatocarcinogens as well as non-genotoxic hepatocarcinogens. These results indicate that the expression profiles of our newly selected candidate biomarker genes might be common characteristics of the early stage of carcinogenesis for both genotoxic and non-genotoxic carcinogens in the rat liver. Our toxicogenomic model might be useful for the prospective screening of hepatocarcinogenicity of compounds and prioritization of compounds for carcinogenicity testing. Highlights: We developed a toxicogenomic model to predict hepatocarcinogenicity of chemicals. The optimized model, consisting of 9 probes, had 99% sensitivity and 97% specificity.
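
    A wrapper-type selection loop of the general kind named in the abstract pairs a search over candidate genes with the classifier itself; this greedy forward sketch (scikit-learn, random stand-in data) illustrates the idea but is not the authors' exact algorithm:

      # Greedily add the probe that most improves cross-validated SVM accuracy.
      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      X = rng.normal(size=(40, 20))       # 40 samples x 20 candidate probes
      y = rng.integers(0, 2, size=40)     # 1 = hepatocarcinogen, 0 = non-carcinogen

      selected, remaining, best = [], list(range(X.shape[1])), 0.0
      for _ in range(5):                  # cap the signature size
          trial = {f: cross_val_score(SVC(kernel="linear"),
                                      X[:, selected + [f]], y, cv=5).mean()
                   for f in remaining}
          f_best = max(trial, key=trial.get)
          if trial[f_best] <= best:       # stop when no candidate helps
              break
          best = trial[f_best]
          selected.append(f_best)
          remaining.remove(f_best)
      print("selected probes:", selected, "CV accuracy:", round(best, 3))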

  11. A DATABASE FOR TRACKING TOXICOGENOMIC SAMPLES AND PROCEDURES WITH GENOMIC, PROTEOMIC AND METABONOMIC COMPONENTS

    EPA Science Inventory

    A Database for Tracking Toxicogenomic Samples and Procedures with Genomic, Proteomic and Metabonomic Components
    Wenjun Bao, Jennifer Fostel, Michael D. Waters, B. Alex Merrick, Drew Ekman, Mitchell Kostich, Judith Schmid, David Dix
    Office of Research and Developmen...

  12. Targeted journal curation as a method to improve data currency at the Comparative Toxicogenomics Database

    PubMed Central

    Davis, Allan Peter; Johnson, Robin J.; Lennon-Hopkins, Kelley; Sciaky, Daniela; Rosenstein, Michael C.; Wiegers, Thomas C.; Mattingly, Carolyn J.

    2012-01-01

    The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and manually curate a triad of chemical–gene, chemical–disease and gene–disease interactions. Typically, articles for CTD are selected using a chemical-centric approach by querying PubMed to retrieve a corpus containing the chemical of interest. Although this technique ensures adequate coverage of knowledge about the chemical (i.e. data completeness), it does not necessarily reflect the most current state of all toxicological research in the community at large (i.e. data currency). Keeping databases current with the most recent scientific results, as well as providing a rich historical background from legacy articles, is a challenging process. To address this issue of data currency, CTD designed and tested a journal-centric approach of curation to complement our chemical-centric method. We first identified priority journals based on defined criteria. Next, over 7 weeks, three biocurators reviewed 2,425 articles from three consecutive years (2009–2011) of three targeted journals. From this corpus, 1,252 articles contained relevant data for CTD and 52,752 interactions were manually curated. Here, we describe our journal selection process, two methods of document delivery for the biocurators and the analysis of the resulting curation metrics, including data currency, and both intra-journal and inter-journal comparisons of research topics. Based on our results, we expect that curation by select journals can (i) be easily incorporated into the curation pipeline to complement our chemical-centric approach; (ii) build content more evenly for chemicals, genes and diseases in CTD (rather than biasing data by chemicals-of-interest); (iii) reflect developing areas in environmental health and (iv) improve overall data currency for chemicals, genes and diseases.

  13. A DATABASE FOR TRACKING TOXICOGENOMIC SAMPLES AND PROCEDURES

    EPA Science Inventory

    Reproductive toxicogenomic studies generate large amounts of toxicological and genomic data. On the toxicology side, a substantial quantity of data accumulates from conventional endpoints such as histology, reproductive physiology and biochemistry. The largest source of genomics...

  14. Reconciled Rat and Human Metabolic Networks for Comparative Toxicogenomics and Biomarker Predictions

    DTIC Science & Technology

    2017-02-08

    Fragments captured from the full text: a consensus-based approach for filtering orthology annotations was compared with the original human GPR rules (Supplementary Fig. 3); the article (received 29 Jan 2016, accepted 13 Dec 2016, published 8 Feb 2017) reconciles rat and human metabolic networks for comparative toxicogenomics, generates biomarker predictions in response to 76 drugs, and validates comparative predictions for xanthine derivatives with new experimental data and literature-based evidence.

  15. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; King, Benjamin L.; Wiegers, Jolene; Grondin, Cynthia J.; Sciaky, Daniela; Johnson, Robin J.; Mattingly, Carolyn J.

    2016-01-01

    Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD’s gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and pharmaceutical drug

  16. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    PubMed

    Davis, Allan Peter; Wiegers, Thomas C; King, Benjamin L; Wiegers, Jolene; Grondin, Cynthia J; Sciaky, Daniela; Johnson, Robin J; Mattingly, Carolyn J

    2016-01-01

    Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and pharmaceutical drug makers
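
    The inference step itself is a join on gene symbols between GO-gene annotations and CTD's gene-disease associations; a minimal sketch with invented triples (real CTD inference scoring is more involved):

      # Infer GO-disease links via shared genes; keep the supporting genes.
      go_gene = [
          ("GO:0006979 response to oxidative stress", "HMOX1"),
          ("GO:0006979 response to oxidative stress", "NFE2L2"),
          ("GO:0042593 glucose homeostasis", "INS"),
      ]
      gene_disease = [
          ("HMOX1", "Atherosclerosis"),
          ("NFE2L2", "Pulmonary Fibrosis"),
          ("INS", "Diabetes Mellitus"),
      ]

      inferences = {}
      for go_term, gene in go_gene:
          for g, disease in gene_disease:
              if g == gene:
                  inferences.setdefault((go_term, disease), set()).add(gene)

      for (go_term, disease), genes in sorted(inferences.items()):
          print(f"{go_term} <-> {disease} via {sorted(genes)}")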

  17. Predicting Drug-induced Hepatotoxicity Using QSAR and Toxicogenomics Approaches

    PubMed Central

    Low, Yen; Uehara, Takeki; Minowa, Yohsuke; Yamada, Hiroshi; Ohno, Yasuo; Urushidani, Tetsuro; Sedykh, Alexander; Muratov, Eugene; Fourches, Denis; Zhu, Hao; Rusyn, Ivan; Tropsha, Alexander

    2014-01-01

    Quantitative Structure-Activity Relationship (QSAR) modeling and toxicogenomics are used independently as predictive tools in toxicology. In this study, we evaluated the power of several statistical models for predicting drug hepatotoxicity in rats using different descriptors of drug molecules, namely their chemical descriptors and toxicogenomic profiles. The records were taken from the Toxicogenomics Project rat liver microarray database containing information on 127 drugs (http://toxico.nibio.go.jp/datalist.html). The model endpoint was hepatotoxicity in the rat following 28 days of exposure, established by liver histopathology and serum chemistry. First, we developed multiple conventional QSAR classification models using a comprehensive set of chemical descriptors and several classification methods (k nearest neighbor, support vector machines, random forests, and distance weighted discrimination). With chemical descriptors alone, external predictivity (Correct Classification Rate, CCR) from 5-fold external cross-validation was 61%. Next, the same classification methods were employed to build models using only toxicogenomic data (24h after a single exposure) treated as biological descriptors. The optimized models used only 85 selected toxicogenomic descriptors and had CCR as high as 76%. Finally, hybrid models combining both chemical descriptors and transcripts were developed; their CCRs were between 68 and 77%. Although the accuracy of hybrid models did not exceed that of the models based on toxicogenomic data alone, the use of both chemical and biological descriptors enriched the interpretation of the models. In addition to finding 85 transcripts that were predictive and highly relevant to the mechanisms of drug-induced liver injury, chemical structural alerts for hepatotoxicity were also identified. These results suggest that concurrent exploration of the chemical features and acute treatment-induced changes in transcript levels will both enrich the
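
    The correct classification rate (CCR) reported throughout this record is commonly computed as the mean of sensitivity and specificity, so unbalanced classes do not inflate the score; a small sketch under that assumption:

      # CCR = (sensitivity + specificity) / 2, from toy labels and predictions.
      def ccr(y_true, y_pred):
          tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
          tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
          fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
          fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
          return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

      y_true = [1, 1, 1, 0, 0, 0, 0, 0]
      y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
      print(f"CCR = {ccr(y_true, y_pred):.2f}")  # 0.5 * (2/3 + 4/5) = 0.73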

  18. Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database

    PubMed Central

    Johnson, Robin J.; Lay, Jean M.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J.

    2013-01-01

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,904 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency. PMID:23613709
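
    One way to picture a document relevancy score is as a normalized, weighted count of recognized chemical, gene, and disease mentions; the lexicon, weights, and scoring rule below are invented stand-ins for CTD's actual text-mining pipeline:

      # Score and rank articles by density of recognized entity mentions.
      WEIGHTS = {"chemical": 2.0, "gene": 1.5, "disease": 1.0}
      LEXICON = {"cadmium": "chemical", "nickel": "chemical",
                 "cyp1a1": "gene", "hmox1": "gene", "nephropathy": "disease"}

      def drs(text):
          tokens = text.lower().split()
          return sum(WEIGHTS[LEXICON[t]] for t in tokens if t in LEXICON) / len(tokens)

      articles = {
          "PMID-A": "cadmium induced cyp1a1 and hmox1 expression causing nephropathy",
          "PMID-B": "survey of laboratory information management practices",
      }
      for pmid in sorted(articles, key=lambda k: drs(articles[k]), reverse=True):
          print(pmid, round(drs(articles[pmid]), 3))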

  19. TOXICOGENOMICS DRUG DISCOVERY AND THE PATHOLOGIST

    EPA Science Inventory

    The field of toxicogenomics, which currently focuses on the application of large-scale differential gene expression (DGE) data to toxicology, is starting to influence drug discovery and development in the pharmaceutical indu...

  20. Utilizing toxicogenomic data to understand chemical mechanism of action in risk assessment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilson, Vickie S.; Keshava, Nagalakshmi; Hester, Susan

    2013-09-15

    The predominant role of toxicogenomic data in risk assessment, thus far, has been one of augmentation of more traditional in vitro and in vivo toxicology data. This article focuses on the currently available examples of instances where toxicogenomic data have been evaluated in human health risk assessment (e.g., acetochlor and arsenicals), which have been limited to the application of toxicogenomic data to inform mechanism of action. This article reviews the regulatory policy backdrop and highlights important efforts to ultimately achieve regulatory acceptance. A number of research efforts on specific chemicals that were designed for risk assessment purposes have employed mechanism or mode of action hypothesis testing and generating strategies. The strides made by large-scale efforts to utilize toxicogenomic data in screening, testing, and risk assessment are also discussed. These efforts include both the refinement of methodologies for performing toxicogenomics studies and analysis of the resultant data sets. The current issues limiting the application of toxicogenomics to define mode or mechanism of action in risk assessment are discussed together with interrelated research needs. In summary, as chemical risk assessment moves away from a single mechanism of action approach toward a toxicity pathway-based paradigm, we envision that toxicogenomic data from multiple technologies (e.g., proteomics, metabolomics, transcriptomics, supportive RT-PCR studies) can be used in conjunction with one another to understand the complexities of multiple, and possibly interacting, pathways affected by chemicals, which will impact human health risk assessment.

  21. Reconciled rat and human metabolic networks for comparative toxicogenomics and biomarker predictions

    PubMed Central

    Blais, Edik M.; Rawls, Kristopher D.; Dougherty, Bonnie V.; Li, Zhuo I.; Kolling, Glynis L.; Ye, Ping; Wallqvist, Anders; Papin, Jason A.

    2017-01-01

    The laboratory rat has been used as a surrogate to study human biology for more than a century. Here we present the first genome-scale network reconstruction of Rattus norvegicus metabolism, iRno, and a significantly improved reconstruction of human metabolism, iHsa. These curated models comprehensively capture metabolic features known to distinguish rats from humans including vitamin C and bile acid synthesis pathways. After reconciling network differences between iRno and iHsa, we integrate toxicogenomics data from rat and human hepatocytes, to generate biomarker predictions in response to 76 drugs. We validate comparative predictions for xanthine derivatives with new experimental data and literature-based evidence delineating metabolite biomarkers unique to humans. Our results provide mechanistic insights into species-specific metabolism and facilitate the selection of biomarkers consistent with rat and human biology. These models can serve as powerful computational platforms for contextualizing experimental data and making functional predictions for clinical and basic science applications. PMID:28176778
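
    Queries against reconstructions like iRno and iHsa ultimately reduce to flux balance analysis: maximize an objective flux subject to steady-state mass balance and flux bounds. A toy three-reaction example (SciPy linear programming, invented network) shows the mechanics at miniature scale:

      # Flux balance analysis: maximize c.v subject to S v = 0 and bounds.
      import numpy as np
      from scipy.optimize import linprog

      # Reactions: R1 (uptake -> A), R2 (A -> B), R3 (B -> objective sink)
      S = np.array([[1, -1,  0],    # metabolite A
                    [0,  1, -1]])   # metabolite B
      bounds = [(0, 10), (0, 10), (0, 10)]
      c = [0, 0, -1]                # linprog minimizes, so negate R3 to maximize

      res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
      print("optimal fluxes:", res.x, "objective flux:", -res.fun)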

  22. Toxicogenomics and the Regulatory Framework

    EPA Science Inventory

    Toxicogenomics presents regulatory agencies with the opportunity to revolutionize their analyses by enabling the collection of information on a broader range of responses than currently considered in traditional regulatory decision making. Analyses of genomic responses are expec...

  23. TOXICOGENOMICS AND HUMAN DISEASE RISK ASSESSMENT

    EPA Science Inventory

    Complete sequencing of human and other genomes, availability of large-scale gene expression arrays with ever-increasing numbers of genes displayed, and steady improvements in protein expression technology can hav...

  24. Effect of the difference in vehicles on gene expression in the rat liver--analysis of the control data in the Toxicogenomics Project Database.

    PubMed

    Takashima, Kayoko; Mizukawa, Yumiko; Morishita, Katsumi; Okuyama, Manabu; Kasahara, Toshihiko; Toritsuka, Naoki; Miyagishima, Toshikazu; Nagao, Taku; Urushidani, Tetsuro

    2006-05-08

    The Toxicogenomics Project is a 5-year collaborative project launched in 2002 by the Japanese government and pharmaceutical companies. Its aim is to construct a large-scale toxicology database of 150 compounds orally administered to rats. The test consists of a single-administration test (3, 6, 9 and 24 h) and a repeated-administration test (3, 7, 14 and 28 days), and the conventional toxicology data are being accumulated together with gene expression data in liver analyzed using Affymetrix GeneChip arrays. In the project, either methylcellulose or corn oil is employed as the vehicle. We examined whether the vehicle itself affects the analysis of gene expression and found that corn oil alone affected food consumption and biochemical parameters, mainly those related to lipid metabolism, accompanied by typical changes in gene expression. Most of the genes modulated by corn oil were related to cholesterol or fatty acid metabolism (e.g., CYP7A1, CYP8B1, 3-hydroxy-3-methylglutaryl-coenzyme A reductase, squalene epoxidase, angiopoietin-like protein 4, fatty acid synthase, fatty acid binding proteins), suggesting that the response was physiologic to the oil intake. Many of the lipid-related genes showed circadian rhythm within a day, but the expression patterns of general clock genes (e.g., period 2, aryl hydrocarbon receptor nuclear translocator-like, D site albumin promoter binding protein) were unaffected by corn oil, suggesting that the effects are specific to lipid metabolism. These results should be useful to users of the database, especially when drugs with different vehicle controls are compared.

  25. The Metamorphosis of Amphibian Toxicogenomics

    PubMed Central

    Helbing, Caren C.

    2012-01-01

    Amphibians are important vertebrates in toxicology often representing both aquatic and terrestrial forms within the life history of the same species. Of the thousands of species, only two have substantial genomics resources: the recently published genome of the Pipid, Xenopus (Silurana) tropicalis, and transcript information (and ongoing genome sequencing project) of Xenopus laevis. However, many more species representative of regional ecological niches and life strategies are used in toxicology worldwide. Since Xenopus species diverged from the most populous frog family, the Ranidae, ~200 million years ago, there are notable differences between them and the even more distant Caudates (salamanders) and Caecilians. These differences include genome size, gene composition, and extent of polyploidization. Application of toxicogenomics to amphibians requires the mobilization of resources and expertise to develop de novo sequence assemblies and analysis strategies for a broader range of amphibian species. The present mini-review will present the advances in toxicogenomics as pertains to amphibians with particular emphasis upon the development and use of genomic techniques (inclusive of transcriptomics, proteomics, and metabolomics) and the challenges inherent therein. PMID:22435070

  26. Integrating toxicogenomics into human health risk assessment: lessons learned from the benzo[a]pyrene case study.

    PubMed

    Chepelev, Nikolai L; Moffat, Ivy D; Labib, Sarah; Bourdon-Lacombe, Julie; Kuo, Byron; Buick, Julie K; Lemieux, France; Malik, Amal I; Halappanavar, Sabina; Williams, Andrew; Yauk, Carole L

    2015-01-01

    The use of short-term toxicogenomic tests to predict cancer (or other health effects) offers considerable advantages relative to traditional toxicity testing methods. The advantages include increased throughput, increased mechanistic data, and significantly reduced costs. However, precisely how toxicogenomics data can be used to support human health risk assessment (RA) is unclear. In a companion paper (Moffat et al. 2014), we present a case study evaluating the utility of toxicogenomics in the RA of benzo[a]pyrene (BaP), a known human carcinogen. The case study is meant as a proof-of-principle exercise using a well-established mode of action (MOA) that impacts multiple tissues, which should provide a best case example. We found that toxicogenomics provided rich mechanistic data applicable to hazard identification, dose-response analysis, and quantitative RA of BaP. Based on this work, here we share some useful lessons for both research and RA, and outline our perspective on how toxicogenomics can benefit RA in the short- and long-term. Specifically, we focus on (1) obtaining biologically relevant data that are readily suitable for establishing an MOA for toxicants, (2) examining the human relevance of an MOA from animal testing, and (3) proposing appropriate quantitative values for RA. We describe our envisioned strategy on how toxicogenomics can become a tool in RA, especially when anchored to other short-term toxicity tests (apical endpoints) to increase confidence in the proposed MOA, and emphasize the need for additional studies on other MOAs to define the best practices in the application of toxicogenomics in RA.

  27. Comparison of toxicogenomics and traditional approaches to inform mode of action and points of departure in human health risk assessment of benzo[a]pyrene in drinking water

    PubMed Central

    Labib, Sarah; Bourdon-Lacombe, Julie; Kuo, Byron; Buick, Julie K.; Lemieux, France; Williams, Andrew; Halappanavar, Sabina; Malik, Amal; Luijten, Mirjam; Aubrecht, Jiri; Hyduke, Daniel R.; Fornace, Albert J.; Swartz, Carol D.; Recio, Leslie; Yauk, Carole L.

    2015-01-01

    Toxicogenomics is proposed to be a useful tool in human health risk assessment. However, a systematic comparison of traditional risk assessment approaches with those applying toxicogenomics has never been done. We conducted a case study to evaluate the utility of toxicogenomics in the risk assessment of benzo[a]pyrene (BaP), a well-studied carcinogen, for drinking water exposures. Our study was intended to compare methodologies, not to evaluate drinking water safety. We compared traditional (RA1), genomics-informed (RA2) and genomics-only (RA3) approaches. RA2 and RA3 applied toxicogenomics data from human cell cultures and mice exposed to BaP to determine if these data could provide insight into BaP's mode of action (MOA) and derive tissue-specific points of departure (POD). Our global gene expression analysis supported that BaP is genotoxic in mice and allowed the development of a detailed MOA. Toxicogenomics analysis in human lymphoblastoid TK6 cells demonstrated a high degree of consistency in perturbed pathways with animal tissues. Quantitatively, the PODs for traditional and transcriptional approaches were similar (liver 1.2 vs. 1.0 mg/kg-bw/day; lung 0.8 vs. 3.7 mg/kg-bw/day; forestomach 0.5 vs. 7.4 mg/kg-bw/day). RA3, which applied toxicogenomics in the absence of apical toxicology data, demonstrates that this approach provides useful information in data-poor situations. Overall, our study supports the use of toxicogenomics as a relatively fast and cost-effective tool for hazard identification, preliminary evaluation of potential carcinogens, and carcinogenic potency, in addition to identifying current limitations and practical questions for future work. PMID:25605026

  28. TOXICOGENOMIC STUDY OF TRIAZOLE FUNGICIDES AND PERFLUOROALKYL ACIDS

    EPA Science Inventory

    Toxicogenomic analysis of five environmental contaminants was performed to investigate the ability of genomics to categorize chemicals and elucidate mechanisms of toxicity. Three triazole antifungals (myclobutanil, propiconazole and triadimefon) and two perfluorinated compounds (...

  29. Similar compounds searching system by using the gene expression microarray database.

    PubMed

    Toyoshiba, Hiroyoshi; Sawada, Hiroshi; Naeshiro, Ichiro; Horinouchi, Akira

    2009-04-10

    Large numbers of microarrays have been examined, and several public and commercial databases have been developed. However, it is not easy to compare in-house microarray data with those in a database because of insufficient reproducibility due to differences in the experimental conditions. As one approach to using these databases, we developed the similar compounds searching system (SCSS) on a toxicogenomics database. The datasets of 55 compounds administered to rats in the Toxicogenomics Project (TGP) database in Japan were used in this study. Using the fold-change ranking method developed by Lamb et al. [Lamb, J., Crawford, E.D., Peck, D., Modell, J.W., Blat, I.C., Wrobel, M.J., Lerner, J., Brunet, J.P., Subramanian, A., Ross, K.N., Reich, M., Hieronymus, H., Wei, G., Armstrong, S.A., Haggarty, S.J., Clemons, P.A., Wei, R., Carr, S.A., Lander, E.S., Golub, T.R., 2006. The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929-1935] and a criterion called the hit ratio, the system lets us compare in-house microarray data with those in the database. In-house generated data for clofibrate, phenobarbital, and a proprietary compound were tested to evaluate the performance of the SCSS method. Phenobarbital and clofibrate, which were included in the TGP database, scored highest by the SCSS method. Other high-scoring compounds had effects similar to either phenobarbital (a cytochrome P450 inducer) or clofibrate (a peroxisome proliferator). Some of the high-scoring compounds identified using rats administered the proprietary compound are known to cause similar toxicological changes in different species. Our results suggest that the SCSS method could be used in drug discovery and development. Moreover, this method may be a powerful tool for understanding the mechanisms by which biological systems respond to various chemical compounds and may also predict adverse effects of new compounds.
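
    The fold-change ranking comparison cited from Lamb et al. can be sketched as a signed enrichment of a query's up- and down-regulated gene sets within a reference profile ordered by fold change; this simplified Kolmogorov-Smirnov-style score (invented genes) is illustrative, not the exact SCSS implementation:

      # Positive connectivity: up-tags near the top and down-tags near the
      # bottom of the reference ranking.
      def ks_score(tags, ranked_genes):
          n, t = len(ranked_genes), len(tags)
          pos = sorted(ranked_genes.index(g) + 1 for g in tags)
          a = max(j / t - pos[j - 1] / n for j in range(1, t + 1))
          b = max(pos[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
          return a if a > b else -b

      reference = ["Cyp4a1", "Acox1", "Ehhadh", "Alb", "Tp53", "Gadd45a"]
      up_tags, down_tags = ["Cyp4a1", "Acox1"], ["Gadd45a"]
      score = ks_score(up_tags, reference) - ks_score(down_tags, reference)
      print(f"connectivity score = {score:.2f}")  # positive = similar response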

  30. Meeting report: Validation of toxicogenomics-based test systems: ECVAM-ICCVAM/NICEATM considerations for regulatory use.

    PubMed

    Corvi, Raffaella; Ahr, Hans-Jürgen; Albertini, Silvio; Blakey, David H; Clerici, Libero; Coecke, Sandra; Douglas, George R; Gribaldo, Laura; Groten, John P; Haase, Bernd; Hamernik, Karen; Hartung, Thomas; Inoue, Tohru; Indans, Ian; Maurici, Daniela; Orphanides, George; Rembges, Diana; Sansone, Susanna-Assunta; Snape, Jason R; Toda, Eisaku; Tong, Weida; van Delft, Joost H; Weis, Brenda; Schechtman, Leonard M

    2006-03-01

    This is the report of the first workshop "Validation of Toxicogenomics-Based Test Systems" held 11-12 December 2003 in Ispra, Italy. The workshop was hosted by the European Centre for the Validation of Alternative Methods (ECVAM) and organized jointly by ECVAM, the U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM), and the National Toxicology Program (NTP) Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM). The primary aim of the workshop was for participants to discuss and define principles applicable to the validation of toxicogenomics platforms as well as validation of specific toxicologic test methods that incorporate toxicogenomics technologies. The workshop was viewed as an opportunity for initiating a dialogue between technologic experts, regulators, and the principal validation bodies and for identifying those factors to which the validation process would be applicable. It was felt that to do so now, as the technology is evolving and associated challenges are identified, would be a basis for the future validation of the technology when it reaches the appropriate stage. Because of the complexity of the issue, different aspects of the validation of toxicogenomics-based test methods were covered. The three focus areas include a) biologic validation of toxicogenomics-based test methods for regulatory decision making, b) technical and bioinformatics aspects related to validation, and c) validation issues as they relate to regulatory acceptance and use of toxicogenomics-based test methods. In this report we summarize the discussions and describe in detail the recommendations for future direction and priorities.

  31. Meeting Report: Validation of Toxicogenomics-Based Test Systems: ECVAM–ICCVAM/NICEATM Considerations for Regulatory Use

    PubMed Central

    Corvi, Raffaella; Ahr, Hans-Jürgen; Albertini, Silvio; Blakey, David H.; Clerici, Libero; Coecke, Sandra; Douglas, George R.; Gribaldo, Laura; Groten, John P.; Haase, Bernd; Hamernik, Karen; Hartung, Thomas; Inoue, Tohru; Indans, Ian; Maurici, Daniela; Orphanides, George; Rembges, Diana; Sansone, Susanna-Assunta; Snape, Jason R.; Toda, Eisaku; Tong, Weida; van Delft, Joost H.; Weis, Brenda; Schechtman, Leonard M.

    2006-01-01

    This is the report of the first workshop “Validation of Toxicogenomics-Based Test Systems” held 11–12 December 2003 in Ispra, Italy. The workshop was hosted by the European Centre for the Validation of Alternative Methods (ECVAM) and organized jointly by ECVAM, the U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM), and the National Toxicology Program (NTP) Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM). The primary aim of the workshop was for participants to discuss and define principles applicable to the validation of toxicogenomics platforms as well as validation of specific toxicologic test methods that incorporate toxicogenomics technologies. The workshop was viewed as an opportunity for initiating a dialogue between technologic experts, regulators, and the principal validation bodies and for identifying those factors to which the validation process would be applicable. It was felt that to do so now, as the technology is evolving and associated challenges are identified, would be a basis for the future validation of the technology when it reaches the appropriate stage. Because of the complexity of the issue, different aspects of the validation of toxicogenomics-based test methods were covered. The three focus areas include a) biologic validation of toxicogenomics-based test methods for regulatory decision making, b) technical and bioinformatics aspects related to validation, and c) validation issues as they relate to regulatory acceptance and use of toxicogenomics-based test methods. In this report we summarize the discussions and describe in detail the recommendations for future direction and priorities. PMID:16507466

  32. Integrating genetic and toxicogenomic information for determining underlying susceptibility to developmental disorders.

    PubMed

    Robinson, Joshua F; Port, Jesse A; Yu, Xiaozhong; Faustman, Elaine M

    2010-10-01

    To understand the complex etiology of developmental disorders, an understanding of both genetic and environmental risk factors is needed. Human and rodent genetic studies have identified a multitude of gene candidates for specific developmental disorders such as neural tube defects (NTDs). With the emergence of toxicogenomic-based assessments, scientists now also have the ability to compare and understand the expression of thousands of genes simultaneously across strain, time, and exposure in developmental models. Using a systems-based approach in which we are able to evaluate information from various parts and levels of the developing organism, we propose a framework for integrating genetic information with toxicogenomic-based studies to better understand the gene-environment interactions critical for developmental disorders. This approach has allowed us to characterize candidate genes in the context of variables critical for determining susceptibility such as strain, time, and exposure. Using a combination of toxicogenomic studies and complementary bioinformatic tools, we characterize NTD candidate genes during normal development by function (gene ontology), linked phenotype (disease outcome), location, and expression (temporally and strain-dependent). In addition, we show how environmental exposures (cadmium, methylmercury) can influence expression of these genes in a strain-dependent manner. Using NTDs as an example of a developmental disorder, we show how simple integration of genetic information from previous studies into the standard microarray design can enhance analysis of gene-environment interactions to better define environmental exposure-disease pathways in sensitive and resistant mouse strains. © Wiley-Liss, Inc.

  33. EPA'S TOXICOGENOMICS PARTNERSHIPS ACROSS GOVERNMENT, ACADEMIA AND INDUSTRY

    EPA Science Inventory

    Genomics, proteomics and metabonomics technologies are transforming the science of toxicology, and concurrent advances in computing and informatics are providing management and analysis solutions for this onslaught of toxicogenomic data. EPA has been actively developing an intra...

  34. Comparative analysis of predictive models for nongenotoxic hepatocarcinogenicity using both toxicogenomics and quantitative structure-activity relationships.

    PubMed

    Liu, Zhichao; Kelly, Reagan; Fang, Hong; Ding, Don; Tong, Weida

    2011-07-18

    The primary testing strategy to identify nongenotoxic carcinogens largely relies on the 2-year rodent bioassay, which is time-consuming and labor-intensive. There is an increasing effort to develop alternative approaches to prioritize chemicals for, supplement, or even replace the cancer bioassay. In silico approaches based on quantitative structure-activity relationships (QSAR) are rapid and inexpensive and thus have been investigated for such purposes. A slightly more expensive approach based on short-term animal studies with toxicogenomics (TGx) represents another attractive option for this application. Thus, the primary questions are how much better predictive performance can be achieved using short-term TGx models compared to QSAR models, and what length of exposure is sufficient for high-quality prediction based on TGx. In this study, we developed predictive models for rodent liver carcinogenicity using gene expression data generated from short-term animal models at different time points and QSAR. The study focused on the prediction of nongenotoxic carcinogenicity since genotoxic chemicals can be inexpensively removed from further development using various in vitro assays, individually or in combination. We identified 62 chemicals whose hepatocarcinogenic potential was available from the National Center for Toxicological Research liver cancer database (NCTRlcdb). The gene expression profiles of liver tissue obtained from rats treated with these chemicals at different time points (1 day, 3 days, and 5 days) are available from the Gene Expression Omnibus (GEO) database. Both TGx and QSAR models were developed on the basis of the same set of chemicals using the same modeling approach, a nearest-centroid method with minimum-redundancy maximum-relevancy (mRMR) feature selection, with performance assessed using compound-based 5-fold cross-validation. We found that the TGx models outperformed QSAR in every aspect of modeling. For example, the
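
    The evaluation protocol named here, a nearest-centroid classifier scored with compound-based cross-validation, can be sketched compactly (scikit-learn, synthetic data); keeping every profile from a given compound in the same fold prevents replicate leakage:

      # Nearest-centroid TGx model with folds grouped by compound.
      import numpy as np
      from sklearn.model_selection import GroupKFold
      from sklearn.neighbors import NearestCentroid

      rng = np.random.default_rng(1)
      X = rng.normal(size=(30, 50))            # 30 expression profiles x 50 genes
      y = rng.integers(0, 2, size=30)          # nongenotoxic carcinogen: yes/no
      compounds = np.repeat(np.arange(10), 3)  # 10 compounds, 3 replicates each

      accs = []
      for train, test in GroupKFold(n_splits=5).split(X, y, groups=compounds):
          model = NearestCentroid().fit(X[train], y[train])
          accs.append((model.predict(X[test]) == y[test]).mean())
      print(f"compound-based 5-fold accuracy: {np.mean(accs):.2f}")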

  35. Toward a Public Toxicogenomics Capability for Supporting ...

    EPA Pesticide Factsheets

    A publicly available toxicogenomics capability for supporting predictive toxicology and meta-analysis depends on availability of gene expression data for chemical treatment scenarios, the ability to locate and aggregate such information by chemical, and broad data coverage within chemical, genomics, and toxicological information domains. This capability also depends on common genomics standards, protocol description, and functional linkages of diverse public Internet data resources. We present a survey of public genomics resources from these vantage points and conclude that, despite progress in many areas, the current state of the majority of public microarray databases is inadequate for supporting these objectives, particularly with regard to chemical indexing. To begin to address these inadequacies, we focus chemical annotation efforts on experimental content contained in the two primary public genomic resources: ArrayExpress and Gene Expression Omnibus. Automated scripts and extensive manual review were employed to transform free-text experiment descriptions into a standardized, chemically indexed inventory of experiments in both resources. These files, which include top-level summary annotations, allow for identification of current chemical-associated experimental content, as well as chemical-exposure–related (or

  36. Complementary roles for toxicologic pathology and mathematics in toxicogenomics, with special reference to data interpretation and oscillatory dynamics.

    PubMed

    Morgan, Kevin T; Pino, Michael; Crosby, Lynn M; Wang, Min; Elston, Timothy C; Jayyosi, Zaid; Bonnefoi, Marc; Boorman, Gary

    2004-01-01

    Toxicogenomics is an emerging multidisciplinary science that will profoundly impact the practice of toxicology. New generations of biologists, using evolving toxicogenomics tools, will generate massive data sets in need of interpretation. Mathematical tools are necessary to cluster and otherwise find meaningful structure in such data. The linking of this structure to gene functions and disease processes, and finally the generation of useful data interpretation, remains a significant challenge. The training and background of pathologists make them ideally suited to contribute to the field of toxicogenomics, from experimental design to data interpretation. Toxicologic pathology, a discipline based on pattern recognition, requires familiarity with the dynamics of disease processes and interactions between organs, tissues, and cell populations. Optimal involvement of toxicologic pathologists in toxicogenomics requires that they communicate effectively with the many other scientists critical for the effective application of this complex discipline to societal problems. As noted by Petricoin III et al. (Nature Genetics 32, 474-479, 2002), cooperation among regulators, sponsors and experts will be essential for realizing the potential of microarrays for public health. Following a brief introduction to the role of mathematics in toxicogenomics, "data interpretation" from the perspective of a pathologist is briefly discussed. Based on oscillatory behavior in the liver, the importance of an understanding of mathematics is addressed, and an approach to learning mathematics "later in life" is provided. An understanding of pathology by mathematicians involved in toxicogenomics is equally critical, as both mathematics and pathology are essential for transforming toxicogenomics data sets into useful knowledge.

  37. Toxicogenomics concepts and applications to study hepatic effects of food additives and chemicals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stierum, Rob; Heijne, Wilbert; Kienhuis, Anne

    2005-09-01

    Transcriptomics, proteomics and metabolomics are genomics technologies with great potential in toxicological sciences. Toxicogenomics involves the integration of conventional toxicological examinations with gene, protein or metabolite expression profiles. An overview together with selected examples of the possibilities of genomics in toxicology is given. The expectations raised by toxicogenomics are earlier and more sensitive detection of toxicity. Furthermore, toxicogenomics will provide a better understanding of the mechanism of toxicity and may facilitate the prediction of toxicity of unknown compounds. Mechanism-based markers of toxicity can be discovered and improved interspecies and in vitro-in vivo extrapolations will drive model developments in toxicology. Toxicological assessment of chemical mixtures will benefit from the new molecular biological tools. In our laboratory, toxicogenomics is predominantly applied for elucidation of mechanisms of action and discovery of novel pathway-supported mechanism-based markers of liver toxicity. In addition, we aim to integrate transcriptome, proteome and metabolome data, supported by bioinformatics to develop a systems biology approach for toxicology. Transcriptomics and proteomics studies on bromobenzene-mediated hepatotoxicity in the rat are discussed. Finally, an example is shown in which gene expression profiling together with conventional biochemistry led to the discovery of novel markers for the hepatic effects of the food additives butylated hydroxytoluene, curcumin, propyl gallate and thiabendazole.

  18. Integrating toxicogenomics data into cancer adverse outcome pathways

    EPA Science Inventory

    Integrating toxicogenomics data into adverse outcome pathways for cancer. J. Christopher Corton, NHEERL/ORD, EPA, Research Triangle Park, NC. As the toxicology field continues to move towards a new paradigm in toxicity testing and safety assessment, there is the expectation that model...

  19. Use of Genomic Data in Risk Assessment Case Study: II. Evaluation of the Dibutyl Phthalate Toxicogenomic Dataset

    EPA Science Inventory

    An evaluation of the toxicogenomic data set for dibutyl phthalate (DBP) and male reproductive developmental effects was performed as part of a larger case study to test an approach for incorporating genomic data in risk assessment. The DBP toxicogenomic data set is composed of ni...

  20. An Approach to Using Toxicogenomic Data in US EPA Human ...

    EPA Pesticide Factsheets

    EPA announced the availability of the final report, An Approach to Using Toxicogenomic Data in U.S. EPA Human Health Risk Assessments: A Dibutyl Phthalate Case Study. This report outlines an approach to evaluate genomic data for use in risk assessment and a case study to illustrate the approach. The dibutyl phthalate (DBP) case study example focuses on male reproductive developmental effects and the qualitative application of genomic data because of the available data on DBP. The case study presented in this report is a separate activity from any of the ongoing IRIS human health assessments for the phthalates. The National Center for Environmental Assessment (NCEA) prepared this document for the purpose of describing and illustrating an approach for using toxicogenomic data in risk assessment.

  1. Evaluation of sequencing approaches for high-throughput toxicogenomics (SOT)

    EPA Science Inventory

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platfo...

  2. Use of genomic data in risk assessment case study: II. Evaluation of the dibutyl phthalate toxicogenomic data set

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Euling, Susan Y., E-mail: euling.susan@epa.gov; White, Lori D.; Kim, Andrea S.

    An evaluation of the toxicogenomic data set for dibutyl phthalate (DBP) and male reproductive developmental effects was performed as part of a larger case study to test an approach for incorporating genomic data in risk assessment. The DBP toxicogenomic data set is composed of nine in vivo studies from the published literature that exposed rats to DBP during gestation and evaluated gene expression changes in testes or Wolffian ducts of male fetuses. The exercise focused on qualitative evaluation, based on a lack of available dose–response data, of the DBP toxicogenomic data set to postulate modes and mechanisms of action for the male reproductive developmental outcomes, which occur in the lower dose range. A weight-of-evidence evaluation was performed on the eight DBP toxicogenomic studies of the rat testis at the gene and pathway levels. The results showed relatively strong evidence of DBP-induced downregulation of genes in the steroidogenesis pathway and lipid/sterol/cholesterol transport pathway as well as effects on immediate early gene/growth/differentiation, transcription, peroxisome proliferator-activated receptor signaling and apoptosis pathways in the testis. Since two established modes of action (MOAs), reduced fetal testicular testosterone production and Insl3 gene expression, explain some but not all of the testis effects observed in rats after in utero DBP exposure, other MOAs are likely to be operative. A reanalysis of one DBP microarray study identified additional pathways within cell signaling, metabolism, hormone, disease, and cell adhesion biological processes. These putative new pathways may be associated with DBP effects on the testes that are currently unexplained. This case study on DBP identified data gaps and research needs for the use of toxicogenomic data in risk assessment. Furthermore, this study demonstrated an approach for evaluating toxicogenomic data in human health risk assessment that could be applied to future chemicals.

  3. Toward a public toxicogenomics capability for supporting predictive toxicology: survey of current resources and chemical indexing of experiments in GEO and ArrayExpress.

    PubMed

    Williams-Devane, ClarLynda R; Wolf, Maritja A; Richard, Ann M

    2009-06-01

    A publicly available toxicogenomics capability for supporting predictive toxicology and meta-analysis depends on availability of gene expression data for chemical treatment scenarios, the ability to locate and aggregate such information by chemical, and broad data coverage within chemical, genomics, and toxicological information domains. This capability also depends on common genomics standards, protocol description, and functional linkages of diverse public Internet data resources. We present a survey of public genomics resources from these vantage points and conclude that, despite progress in many areas, the current state of the majority of public microarray databases is inadequate for supporting these objectives, particularly with regard to chemical indexing. To begin to address these inadequacies, we focus chemical annotation efforts on experimental content contained in the two primary public genomic resources: ArrayExpress and Gene Expression Omnibus. Automated scripts and extensive manual review were employed to transform free-text experiment descriptions into a standardized, chemically indexed inventory of experiments in both resources. These files, which include top-level summary annotations, allow for identification of current chemical-associated experimental content, as well as chemical-exposure-related (or "Treatment") content of greatest potential value to toxicogenomics investigation. With these chemical-index files, it is possible for the first time to assess the breadth and overlap of chemical study space represented in these databases, and to begin to assess the sufficiency of data with shared protocols for chemical similarity inferences. Chemical indexing of public genomics databases is a first important step toward integrating chemical, toxicological and genomics data into predictive toxicology.
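
    The chemical-indexing workflow described above (automated scripts plus manual review over free-text experiment descriptions) can be illustrated with a small sketch. The synonym dictionary, experiment descriptions, and helper function below are invented for illustration; the actual annotation effort relied on curated chemical vocabularies and extensive manual review.

```python
import re

# Toy synonym dictionary mapping preferred chemical names to aliases.
# These entries are illustrative, not the curated vocabulary used in the study.
CHEMICAL_SYNONYMS = {
    "dibutyl phthalate": ["dibutyl phthalate", "dbp", "di-n-butyl phthalate"],
    "bromoacetic acid": ["bromoacetic acid", "baa"],
    "fenofibrate": ["fenofibrate"],
}

def index_experiment(description: str) -> list[str]:
    """Return preferred names of chemicals mentioned in a free-text description."""
    text = description.lower()
    hits = []
    for preferred, aliases in CHEMICAL_SYNONYMS.items():
        if any(re.search(rf"\b{re.escape(a)}\b", text) for a in aliases):
            hits.append(preferred)
    return hits

experiments = [
    "Rat testis gene expression after in utero DBP exposure",
    "Microarray profiling of human cells treated with bromoacetic acid",
]
for desc in experiments:
    print(desc, "->", index_experiment(desc))
```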

  4. Toxicogenomics in regulatory ecotoxicology

    USGS Publications Warehouse

    Ankley, Gerald T.; Daston, George P.; Degitz, Sigmund J.; Denslow, Nancy D.; Hoke, Robert A.; Kennedy, Sean W.; Miracle, Ann L.; Perkins, Edward J.; Snape, Jason; Tillitt, Donald E.; Tyler, Charles R.; Versteeg, Donald

    2006-01-01

    Recently, we have witnessed an explosion of different genomic approaches that, through a combination of advanced biological, instrumental, and bioinformatic techniques, can yield a previously unparalleled amount of data concerning the molecular and biochemical status of organisms. Fueled partially by large, well-publicized efforts such as the Human Genome Project, genomic research has become a rapidly growing topical area in multiple biological disciplines. Since 1999, when the term “toxicogenomics” was coined to describe the application of genomics to toxicology (1), a rapid increase in publications on the topic has occurred (Figure 1). The potential utility of toxicogenomics in toxicological research and regulatory activities has been the subject of scientific discussions and, as with any new technology, has evoked a wide range of opinion (2–6).

  5. EPA SCIENCE FORUM - EPA'S TOXICOGENOMICS PARTNERSHIPS ACROSS GOVERNMENT, ACADEMIA AND INDUSTRY

    EPA Science Inventory

    Over the past decade genomics, proteomics and metabonomics technologies have transformed the science of toxicology, and concurrent advances in computing and informatics have provided management and analysis solutions for this onslaught of toxicogenomic data. EPA has been actively...

  6. Comparison of MeHg-induced toxicogenomic responses across in vivo and in vitro models used in developmental toxicology.

    PubMed

    Robinson, Joshua F; Theunissen, Peter T; van Dartel, Dorien A M; Pennings, Jeroen L; Faustman, Elaine M; Piersma, Aldert H

    2011-09-01

    Toxicogenomic evaluations may improve toxicity prediction of in vitro-based developmental models, such as whole embryo culture (WEC) and embryonic stem cells (ESC), by providing a robust mechanistic marker which can be linked with responses associated with developmental toxicity in vivo. While promising in theory, toxicogenomic comparisons between in vivo and in vitro models are complex due to inherent differences in model characteristics and experimental design. Determining factors which influence these global comparisons are critical in the identification of reliable mechanistic-based markers of developmental toxicity. In this study, we compared available toxicogenomic data assessing the impact of the known teratogen, methylmercury (MeHg) across a diverse set of in vitro and in vivo models to investigate the impact of experimental variables (i.e. model, dose, time) on our comparative assessments. We evaluated common and unique aspects at both the functional (Gene Ontology) and gene level of MeHg-induced response. At the functional level, we observed stronger similarity in MeHg-response between mouse embryos exposed in utero (2 studies), ESC, and WEC as compared to liver, brain and mouse embryonic fibroblast MeHg studies. These findings were strongly correlated to the presence of a MeHg-induced developmentally related gene signature. In addition, we identified specific MeHg-induced gene expression alterations associated with developmental signaling and heart development across WEC, ESC and in vivo systems. However, the significance of overlap between studies was highly dependent on traditional experimental variables (i.e. dose, time). In summary, we identify promising examples of unique gene expression responses which show in vitro-in vivo similarities supporting the relevance of in vitro developmental models for predicting in vivo developmental toxicity. Copyright © 2011 Elsevier Inc. All rights reserved.
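
    Comparisons of this kind often reduce to asking whether the overlap between two differentially expressed gene sets is larger than chance. A minimal sketch of such an overlap test follows, using a hypergeometric null; the gene sets and background size are hypothetical, and the published study's functional-level comparisons were more involved than this.

```python
from scipy.stats import hypergeom

def overlap_significance(set_a, set_b, background_size):
    """P-value of observing at least |A & B| shared genes when |B| genes
    are drawn at random from a background of background_size genes."""
    overlap = len(set_a & set_b)
    # Survival function at overlap-1 gives P(X >= overlap)
    p = hypergeom.sf(overlap - 1, background_size, len(set_a), len(set_b))
    return overlap, p

# Hypothetical differentially expressed gene sets from two models
wec = {"Gata4", "Nkx2-5", "Hand1", "Notch1", "Hes1"}
esc = {"Gata4", "Nkx2-5", "Tbx5", "Hes1", "Pax6"}
n, p = overlap_significance(wec, esc, background_size=20000)
print(f"overlap={n}, p={p:.3g}")
```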

  7. Mixture toxicity revisited from a toxicogenomic perspective.

    PubMed

    Altenburger, Rolf; Scholz, Stefan; Schmitt-Jansen, Mechthild; Busch, Wibke; Escher, Beate I

    2012-03-06

    The advent of new genomic techniques has raised expectations that central questions of mixture toxicology, such as the mechanisms of low-dose interactions, can now be answered. This review provides an overview of experimental studies from the past decade that address diagnostic and/or mechanistic questions regarding the combined effects of chemical mixtures using toxicogenomic techniques. From 2002 to 2011, 41 studies were published with a focus on mixture toxicity assessment. Primarily multiplexed quantification of gene transcripts was performed, though metabolomic and proteomic analyses of joint exposures have also been undertaken. It is now standard to explicitly state criteria for selecting concentrations and provide insight into data transformation and statistical treatment with respect to minimizing sources of undue variability. Bioinformatic analysis of toxicogenomic data, by contrast, is still a field with diverse and rapidly evolving tools. The reported combined effect assessments are discussed in the light of established toxicological dose-response and mixture toxicity models. Receptor-based assays seem to be the most advanced toward establishing quantitative relationships between exposure and biological responses. Often transcriptomic responses are discussed based on the presence or absence of signals, where the interpretation may remain ambiguous due to methodological problems. The majority of mixture studies are designed to compare the recorded mixture outcome against responses for individual components only. This stands in stark contrast to our existing understanding of joint biological activity at the levels of chemical target interactions and apical combined effects. By joining established mixture effect models with toxicokinetic and -dynamic thinking, we suggest a conceptual framework that may help to overcome the current limitation of providing mainly anecdotal evidence on mixture effects. To achieve this we suggest (i) to design studies to

  8. Developing Computational Tools for Application of Toxicogenomics to Environmental Regulations and Risk Assessment

    EPA Science Inventory

    Toxicogenomics is the study of changes in gene expression, protein, and metabolite profiles within cells and tissues, complementary to more traditional toxicological methods. Genomics tools provide detailed molecular data about the underlying biochemical mechanisms of toxicity, a...

  9. Toxicogenomics to Evaluate Endocrine Disrupting Effects of Environmental Chemicals Using the Zebrafish Model

    PubMed Central

    Caballero-Gallardo, Karina; Olivero-Verbel, Jesus; Freeman, Jennifer L.

    2016-01-01

    The extent of our knowledge on the number of chemical compounds related to anthropogenic activities that can cause damage to the environment and to organisms is increasing. Endocrine disrupting chemicals (EDCs) are one group of potentially hazardous substances that include natural and synthetic chemicals and have the ability to mimic endogenous hormones, interfering with their biosynthesis, metabolism, and normal functions. Adverse effects associated with EDC exposure have been documented in aquatic biota and there is widespread interest in the characterization and understanding of their modes of action. Fish are considered one of the primary risk organisms for EDCs. Zebrafish (Danio rerio) are increasingly used as an animal model to study the effects of endocrine disruptors, due to their advantages compared to other model organisms. One approach to assess the toxicity of a compound is to identify those patterns of gene expression found in a tissue or organ exposed to particular classes of chemicals, through new technologies in genomics (toxicogenomics), such as microarrays or whole-genome sequencing. Application of these technologies permit the quantitative analysis of thousands of gene expression changes simultaneously in a single experiment and offer the opportunity to use transcript profiling as a tool to predict toxic outcomes of exposure to particular compounds. The application of toxicogenomic tools for identification of chemicals with endocrine disrupting capacity using the zebrafish model system is reviewed. PMID:28217008

  10. CONCEPTUAL FRAMEWORK FOR THE CHEMICAL EFFECTS IN BIOLOGICAL SYSTEMS (CEBS) TOXICOGENOMICS KNOWLEDGE BASE

    EPA Science Inventory

    Conceptual Framework for the Chemical Effects in Biological Systems (CEBS) Toxicogenomics Knowledge Base

    Abstract
    Toxicogenomics studies how the genome is involved in responses to environmental stressors or toxicants. It combines genetics, genome-scale mRNA expressio...

  11. Asymmetric author-topic model for knowledge discovering of big data in toxicogenomics.

    PubMed

    Chung, Ming-Hua; Wang, Yuping; Tang, Hailin; Zou, Wen; Basinger, John; Xu, Xiaowei; Tong, Weida

    2015-01-01

    The advancement of high-throughput screening technologies facilitates the generation of massive amounts of biological data, a big data phenomenon in biomedical science. Yet, researchers still heavily rely on keyword search and/or literature review to navigate the databases, and analyses are often done at a rather small scale. As a result, the rich information in a database has not been fully utilized, particularly the information embedded in the interactions between data points, which is largely ignored and buried. For the past 10 years, probabilistic topic modeling has been recognized as an effective machine learning algorithm to annotate the hidden thematic structure of massive collections of documents. The analogy between a text corpus and large-scale genomic data enables the application of text mining tools, like probabilistic topic models, to explore hidden patterns in genomic data and, by extension, altered biological functions. In this paper, we developed a generalized probabilistic topic model to analyze a toxicogenomics dataset that consists of a large number of gene expression profiles from rat livers treated with drugs at multiple doses and time points. We discovered hidden patterns in gene expression associated with the effects of dose and time point of treatment. Finally, we illustrated the ability of our model to identify evidence for a potential reduction in animal use.
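
    The study's asymmetric author-topic model is not reproduced here, but a simplified relative, standard latent Dirichlet allocation over count-like expression "documents", conveys the idea of discovering hidden expression patterns. The matrix below is random stand-in data, not the rat liver dataset.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Hypothetical stand-in for dose/time-resolved expression data:
# 60 treatment conditions x 500 genes, non-negative count-like values
# (topic models treat each condition as a document over gene "words").
X = rng.poisson(lam=3.0, size=(60, 500))

lda = LatentDirichletAllocation(n_components=5, random_state=0)
theta = lda.fit_transform(X)                          # condition-by-topic proportions
top_genes = lda.components_.argsort(axis=1)[:, -10:]  # 10 highest-weight genes per topic
print(theta.shape, top_genes.shape)
```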

  12. Toxicogenomics and cancer risk assessment: a framework for key event analysis and dose-response assessment for nongenotoxic carcinogens.

    PubMed

    Bercu, Joel P; Jolly, Robert A; Flagella, Kelly M; Baker, Thomas K; Romero, Pedro; Stevens, James L

    2010-12-01

    In order to determine a threshold for nongenotoxic carcinogens, the traditional risk assessment approach has been to identify a mode of action (MOA) with a nonlinear dose-response. The dose-response for one or more key event(s) linked to the MOA for carcinogenicity allows a point of departure (POD) to be selected from the most sensitive effect dose or no-effect dose. However, this can be challenging because multiple MOAs and key events may exist for carcinogenicity and oftentimes extensive research is required to elucidate the MOA. In the present study, a microarray analysis was conducted to determine if a POD could be identified following short-term oral rat exposure with two nongenotoxic rodent carcinogens, fenofibrate and methapyrilene, using a benchmark dose analysis of genes aggregated in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology (GO) biological processes, which likely encompass key event(s) for carcinogenicity. The gene expression response for fenofibrate given to rats for 2 days was consistent with its MOA and known key events linked to PPARα activation. The temporal response from daily dosing with methapyrilene demonstrated biological complexity with waves of pathways/biological processes occurring over 1, 3, and 7 days; nonetheless, the benchmark dose values were consistent over time. When comparing the dose-response of toxicogenomic data to tumorigenesis or precursor events, the toxicogenomics POD was slightly below any effect level. Our results suggest that toxicogenomic analysis using short-term studies can be used to identify a threshold for nongenotoxic carcinogens based on evaluation of potential key event(s) which then can be used within a risk assessment framework. Copyright © 2010 Elsevier Inc. All rights reserved.
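
    As a sketch of the benchmark dose idea, the snippet below fits a Hill curve to a hypothetical pathway-level dose response and solves for the dose producing a fixed benchmark response. Real BMD workflows fit families of models and report confidence limits; the data, model choice, and 10%-of-maximum benchmark here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit, brentq

def hill(dose, e0, emax, ed50, h):
    # Monotone Hill dose-response model
    return e0 + emax * dose**h / (ed50**h + dose**h)

# Hypothetical pathway-level response (e.g., aggregated expression change)
dose = np.array([0.0, 1.0, 3.0, 10.0, 30.0, 100.0])
resp = np.array([0.02, 0.05, 0.20, 0.55, 0.80, 0.95])

popt, _ = curve_fit(hill, dose, resp, p0=[0.0, 1.0, 10.0, 1.0],
                    bounds=([-1, 0, 1e-3, 0.1], [1, 2, 1e3, 10]))
e0, emax, ed50, h = popt

# Benchmark response: 10% of the fitted maximal effect above background
bmr = e0 + 0.10 * emax
bmd = brentq(lambda d: hill(d, *popt) - bmr, 1e-6, dose.max())
print(f"benchmark dose: {bmd:.2f}")
```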

  13. NPACT: Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database

    PubMed Central

    Mangal, Manu; Sagar, Parul; Singh, Harinder; Raghava, Gajendra P. S.; Agarwal, Subhash M.

    2013-01-01

    Plant-derived molecules have been highly valued by biomedical researchers and pharmaceutical companies for developing drugs, as they are thought to be optimized during evolution. Therefore, we have collected and compiled a central resource, the Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database (NPACT, http://crdd.osdd.net/raghava/npact/), that gathers the information related to experimentally validated plant-derived natural compounds exhibiting anti-cancerous activity (in vitro and in vivo), to complement the other databases. It currently contains 1574 compound entries, and each record provides information on their structure, manually curated published data on in vitro and in vivo experiments along with references for users' referral, inhibitory values (IC50/ED50/EC50/GI50), properties (physical, elemental and topological), cancer types, cell lines, protein targets, commercial suppliers and drug likeness of compounds. NPACT can easily be browsed or queried using various options, and an online similarity tool has also been made available. Further, to facilitate retrieval of existing data, each record is hyperlinked to similar databases like SuperNatural, Herbal Ingredients’ Targets, Comparative Toxicogenomics Database, PubChem and NCI-60 GI50 data. PMID:23203877

  14. Human cell toxicogenomic analysis of bromoacetic acid: a regulated drinking water disinfection by-product.

    PubMed

    Muellner, Mark G; Attene-Ramos, Matias S; Hudson, Matthew E; Wagner, Elizabeth D; Plewa, Michael J

    2010-04-01

    The disinfection of drinking water is a major achievement in protecting the public health. However, current disinfection methods also generate disinfection by-products (DBPs). Many DBPs are cytotoxic, genotoxic, teratogenic, and carcinogenic and represent an important class of environmentally hazardous chemicals that may carry long-term human health implications. The objective of this research was to integrate in vitro toxicology with focused toxicogenomic analysis of the regulated DBP, bromoacetic acid (BAA), and to evaluate modulation of gene expression involved in DNA damage/repair and toxic responses in nontransformed human cells. We generated transcriptome profiles for 168 genes with 30 min and 4 hr exposure times that did not induce acute cytotoxicity. Using qRT-PCR gene arrays, the levels of 25 transcripts were modulated to a statistically significant degree in response to a 30 min treatment with BAA (16 transcripts upregulated and nine downregulated). The largest changes were observed for RAD9A and BRCA1. The majority of the altered transcript profiles are genes involved in DNA repair, especially the repair of double strand DNA breaks, and in cell cycle regulation. With 4 hr of treatment, the expression of 28 genes was modulated (12 upregulated and 16 downregulated); the largest fold changes were in HMOX1 and FMO1. This work represents the first nontransformed human cell toxicogenomic study with a regulated drinking water disinfection by-product. These data implicate double strand DNA breaks as a feature of BAA exposure. Future toxicogenomic studies of DBPs will further strengthen our limited knowledge in this growing area of drinking water research. Copyright 2009 Wiley-Liss, Inc.

  15. Yeast Toxicogenomics: Genome-Wide Responses to Chemical Stresses with Impact in Environmental Health, Pharmacology, and Biotechnology

    PubMed Central

    dos Santos, Sandra C.; Teixeira, Miguel Cacho; Cabrito, Tânia R.; Sá-Correia, Isabel

    2012-01-01

    The emerging transdisciplinary field of Toxicogenomics aims to study the cell response to a given toxicant at the genome, transcriptome, proteome, and metabolome levels. This approach is expected to provide earlier and more sensitive biomarkers of toxicological responses and help in the delineation of regulatory risk assessment. The use of model organisms to gather such genomic information, through the exploitation of Omics and Bioinformatics approaches and tools, together with more focused molecular and cellular biology studies are rapidly increasing our understanding and providing an integrative view on how cells interact with their environment. The use of the model eukaryote Saccharomyces cerevisiae in the field of Toxicogenomics is discussed in this review. Despite the limitations intrinsic to the use of such a simple single cell experimental model, S. cerevisiae appears to be very useful as a first screening tool, limiting the use of animal models. Moreover, it is also one of the most interesting systems to obtain a truly global understanding of the toxicological response and resistance mechanisms, being in the frontline of systems biology research and developments. The impact of the knowledge gathered in the yeast model, through the use of Toxicogenomics approaches, is highlighted here by its use in prediction of toxicological outcomes of exposure to pesticides and pharmaceutical drugs, but also by its impact in biotechnology, namely in the development of more robust crops and in the improvement of yeast strains as cell factories. PMID:22529852

  16. Intersection of toxicogenomics and high throughput screening in the Tox21 program: an NIEHS perspective.

    PubMed

    Merrick, B Alex; Paules, Richard S; Tice, Raymond R

    Humans are exposed to thousands of chemicals with inadequate toxicological data. Advances in computational toxicology, robotic high throughput screening (HTS), and genome-wide expression have been integrated into the Tox21 program to better predict the toxicological effects of chemicals. Tox21 is a collaboration among US government agencies initiated in 2008 that aims to shift chemical hazard assessment from traditional animal toxicology to target-specific, mechanism-based, biological observations using in vitro assays and lower organism models. HTS uses biocomputational methods for probing thousands of chemicals in in vitro assays for gene-pathway response patterns predictive of adverse human health outcomes. In 1999, NIEHS began exploring the application of toxicogenomics to toxicology and recent advances in NextGen sequencing should greatly enhance the biological content obtained from HTS platforms. We foresee an intersection of new technologies in toxicogenomics and HTS as an innovative development in Tox21. Tox21 goals, priorities, progress, and challenges will be reviewed.

  17. SOURCES OF VARIATION IN BASELINE GENE EXPRESSION LEVELS FROM TOXICOGENOMIC STUDY CONTROL ANIMALS ACROSS MULTIPLE LABORATORIES

    EPA Science Inventory

    Variations in study design are typical for toxicogenomic studies, but their impact on gene expression in control animals has not been well characterized. A dataset of control animal microarray expression data was assembled by a working group of the Health and Environmental Scienc...

  18. TOXICOGENOMIC STUDY OF TRIAZOLE FUNGICIDES AND PERFLUOROALKYL ACIDS IN RAT LIVERS ACCURATELY CATEGORIZES CHEMICALS AND IDENTIFIES MECHANISMS OF TOXICITY

    EPA Science Inventory

    Toxicogenomic analysis of five environmental chemicals was performed to investigate the ability of genomics to predict toxicity, categorize chemicals, and elucidate mechanisms of toxicity. Three triazole antifungals (myclobutanil, propiconazole, and triadimefon) and two perfluori...

  19. Toxicogenomic analysis in the combined effect of tributyltin and benzo[a]pyrene on the development of zebrafish embryos.

    PubMed

    Huang, Lixing; Zuo, Zhenghong; Zhang, Youyu; Wang, Chonggang

    2015-01-01

    There is a growing recognition that the toxic effects of chemical mixtures have become an important issue in toxicological sciences. Tributyltin (TBT) and benzo[a]pyrene (BaP) are widespread pollutants that occur simultaneously in aquatic environments. This study was designed to examine comprehensively the combined effects of TBT and BaP on zebrafish (Danio rerio) embryos using a toxicogenomic approach combined with biochemical detection and morphological analysis, and tried to gain insight into the mechanisms underlying the combined effects of TBT and BaP. The toxicogenomic data indicated that: (1) TBT cotreatment rescued the embryos from the decreased hatching ratio caused by BaP alone, while the alteration of gene expression (in this article the phrase gene expression is used as a synonym for gene transcription, although it is acknowledged that gene expression can also be regulated by, e.g., translation and mRNA or protein stability) related to zebrafish hatching in the BaP groups was reversed by cotreatment with TBT; (2) BaP cotreatment decreased TBT-mediated dorsal curvature and alleviated the perturbation of the Notch pathway caused by TBT alone; (3) cotreatment with TBT decreased BaP-mediated bradycardia, possibly because TBT cotreatment alleviated the perturbation in expression of genes related to cardiac muscle cell development and calcium handling caused by BaP alone; (4) TBT cotreatment had an antagonistic effect on BaP-mediated oxidative stress and DNA damage. These results suggest that the toxicogenomic approach is suitable for analyzing combined toxicity with high sensitivity and accuracy, which might improve our understanding and prediction of the combined effects of chemicals. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. High-Density Real-Time PCR-Based in Vivo Toxicogenomic Screen to Predict Organ-Specific Toxicity

    PubMed Central

    Fabian, Gabriella; Farago, Nora; Feher, Liliana Z.; Nagy, Lajos I.; Kulin, Sandor; Kitajka, Klara; Bito, Tamas; Tubak, Vilmos; Katona, Robert L.; Tiszlavicz, Laszlo; Puskas, Laszlo G.

    2011-01-01

    Toxicogenomics, based on the temporal effects of drugs on gene expression, is able to predict toxic effects earlier than traditional technologies by analyzing changes in genomic biomarkers that could precede subsequent protein translation and initiation of histological organ damage. In the present study our objective was to extend in vivo toxicogenomic screening from analyzing one or a few tissues to multiple organs, including heart, kidney, brain, liver and spleen. Nanocapillary quantitative real-time PCR (QRT-PCR) was used in the study, due to its higher throughput, sensitivity and reproducibility, and larger dynamic range compared to DNA microarray technologies. Based on previous data, 56 gene markers were selected coding for proteins with different functions, such as proteins for acute phase response, inflammation, oxidative stress, metabolic processes, heat-shock response, cell cycle/apoptosis regulation and enzymes which are involved in detoxification. Some of the marker genes are specific to certain organs, and some of them are general indicators of toxicity in multiple organs. Utility of the nanocapillary QRT-PCR platform was demonstrated by screening different reference compounds, as well as discovery-stage drug-like compounds, for their gene expression profiles in different organs of treated mice in an acute experiment. For each compound, 896 QRT-PCR reactions were done: four organs were used from each of the four treated animals to monitor the relative expression of 56 genes. Based on expression data of the discovery gene set of toxicology biomarkers, the cardio- and nephrotoxicity of doxorubicin and sulfasalazin, the hepato- and nephrotoxicity of rotenone, dihydrocoumarin and aniline, and the liver toxicity of 2,4-diaminotoluene could be confirmed. The acute heart and kidney toxicity of the active metabolite SN-38 could be differentiated from its less toxic prodrug, irinotecan, and two novel gene markers for hormone replacement therapy were identified, namely fabp4 and pparg.
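
    The abstract does not state the exact quantification scheme, but relative expression from QRT-PCR Ct values is conventionally computed with the comparative Ct (2^-ΔΔCt) method, sketched below with invented Ct values.

```python
def ddct_fold_change(ct_gene_treated, ct_ref_treated,
                     ct_gene_control, ct_ref_control):
    """Relative expression by the comparative Ct (2^-ddCt) method."""
    dct_treated = ct_gene_treated - ct_ref_treated
    dct_control = ct_gene_control - ct_ref_control
    return 2.0 ** -(dct_treated - dct_control)

# Hypothetical Ct values for one marker gene against a reference gene
fc = ddct_fold_change(ct_gene_treated=24.1, ct_ref_treated=18.0,
                      ct_gene_control=26.3, ct_ref_control=18.1)
print(f"fold change: {fc:.2f}")  # > 1 means upregulation in the treated organ
```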

  1. 75 FR 1770 - An Approach to Using Toxicogenomic Data in U.S. EPA Human Health Risk Assessments: A Dibutyl...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-13

    ... qualitative aspects of the risk assessment because of the type of genomic data available for DBP. It is... Assessment (NCEA) within EPA's Office of Research and Development (ORD). Toxicogenomics is the application of... exploratory methods for analyzing genomic data for application to risk assessment and some preliminary results...

  2. USE OF TOXICOGENOMICS DATA IN RISK ASSESSMENT: CASE STUDY FOR A CHEMICAL IN THE ANDROGEN-MEDIATED MALE REPRODUCTIVE DEVELOPMENT TOXICITY PATHWAY

    EPA Science Inventory

    The goal of this project is to address the question, “Can existing toxicogenomics (TG) data improve Environmental Protection Agency (EPA) chemical health or risk assessments?” Although genomics data promises to impact multiple areas of science, medicine, law, and policy, there ar...

  3. Cross-Platform Toxicogenomics for the Prediction of Non-Genotoxic Hepatocarcinogenesis in Rat

    PubMed Central

    Metzger, Ute; Templin, Markus F.; Plummer, Simon; Ellinger-Ziegelbauer, Heidrun; Zell, Andreas

    2014-01-01

    In the area of omics profiling in toxicology, i.e. toxicogenomics, characteristic molecular profiles have previously been incorporated into prediction models for early assessment of a carcinogenic potential and mechanism-based classification of compounds. Traditionally, the biomarker signatures used for model construction were derived from individual high-throughput techniques, such as microarrays designed for monitoring global mRNA expression. In this study, we built predictive models by integrating omics data across complementary microarray platforms and introduced new concepts for modeling of pathway alterations and molecular interactions between multiple biological layers. We trained and evaluated diverse machine learning-based models, differing in the incorporated features and learning algorithms on a cross-omics dataset encompassing mRNA, miRNA, and protein expression profiles obtained from rat liver samples treated with a heterogeneous set of substances. Most of these compounds could be unambiguously classified as genotoxic carcinogens, non-genotoxic carcinogens, or non-hepatocarcinogens based on evidence from published studies. Since mixed characteristics were reported for the compounds Cyproterone acetate, Thioacetamide, and Wy-14643, we reclassified these compounds as either genotoxic or non-genotoxic carcinogens based on their molecular profiles. Evaluating our toxicogenomics models in a repeated external cross-validation procedure, we demonstrated that the prediction accuracy of our models could be increased by joining the biomarker signatures across multiple biological layers and by adding complex features derived from cross-platform integration of the omics data. Furthermore, we found that adding these features resulted in a better separation of the compound classes and a more confident reclassification of the three undefined compounds as non-genotoxic carcinogens. PMID:24830643
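
    The cross-platform integration step, joining biomarker signatures across mRNA, miRNA, and protein layers before training a classifier, can be sketched as column-wise feature concatenation. The arrays, class labels, and simple logistic classifier below are stand-ins; the published models used curated signatures, several learning algorithms, and repeated external cross-validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 90  # hypothetical compound-treated liver samples

# Stand-ins for the three biological layers; a real study would use
# measured mRNA, miRNA, and protein profiles for the same samples.
mrna = rng.normal(size=(n, 300))
mirna = rng.normal(size=(n, 60))
protein = rng.normal(size=(n, 40))
y = rng.integers(0, 3, size=n)  # genotoxic / non-genotoxic carcinogen / non-carcinogen

# Joining signatures across layers = simple column-wise concatenation
X = np.hstack([mrna, mirna, protein])

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```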

  4. The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database.

    PubMed

    Hayman, G Thomas; Laulederkind, Stanley J F; Smith, Jennifer R; Wang, Shur-Jen; Petri, Victoria; Nigam, Rajni; Tutaj, Marek; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2016-01-01

    The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene-disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu. © The Author(s) 2016. Published by Oxford University Press.

  5. Toward a Public Toxicogenomics Capability for Supporting Predictive Toxicology: Survey of Current Resources and Chemical Indexing of Experiments in GEO and ArrayExpress

    EPA Science Inventory

    A publicly available toxicogenomics capability for supporting predictive toxicology and meta-analysis depends on availability of gene expression data for chemical treatment scenarios, the ability to locate and aggregate such information by chemical, and broad data coverage within...

  6. Discriminating between adaptive and carcinogenic liver hypertrophy in rat studies using logistic ridge regression analysis of toxicogenomic data: The mode of action and predictive models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu

    Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment.

  7. Toxicogenomic analysis of the hepatic effects of perfluorooctanoic acid on rare minnows (Gobiocypris rarus)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei Yanhong; Graduate School of the Chinese Academy of Sciences, Beijing, 100080; Liu Yang

    2008-02-01

    Perfluorooctanoic acid (PFOA) is a ubiquitous environmental contaminant that has been detected in a variety of terrestrial and aquatic organisms. To assess the effects of PFOA in fish and predict its potential mode of action, a toxicogenomic approach was applied to hepatic gene expression profile analysis in male and female rare minnows (Gobiocypris rarus) using a custom cDNA microarray containing 1773 unique genes. Rare minnows were treated with continuous flow-through exposure to PFOA at concentrations of 3, 10, and 30 mg/L for 28 days. Based on the observed histopathological changes, the livers from fish exposed to 10 mg/L PFOA were selected for further hepatic gene expression analysis. In total, 124 and 171 genes were significantly altered by PFOA in males and females, respectively, of which 43 were commonly regulated in both sexes. The affected genes are involved in multiple biological processes, including lipid metabolism and transport, hormone action, immune responses, and mitochondrial functions. PFOA exposure significantly suppressed genes involved in fatty acid biosynthesis and transport but induced genes associated with intracellular trafficking of cholesterol. Alterations in expression of genes associated with mitochondrial fatty acid {beta}-oxidation were only observed in female rare minnows. In addition, PFOA inhibited genes responsible for thyroid hormone biosynthesis and significantly induced estrogen-responsive genes. These findings implicate PFOA in endocrine disruption. This work contributes not only to the elucidation of the potential mode of toxicity of PFOA to aquatic organisms but also to the use of toxicogenomic approaches to address issues in environmental toxicology.
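
    A minimal sketch of the sex-stratified differential expression comparison follows: genes are tested per sex and the overlapping set is taken. The group sizes, significance cutoff, and random data are assumptions; the study's actual analysis of the 1773-gene array involved proper normalization and multiple-testing control.

```python
import numpy as np
from scipy.stats import ttest_ind

def de_genes(control, exposed, alpha=0.01):
    """Indices of genes whose mean expression differs between groups."""
    _, p = ttest_ind(control, exposed, axis=0)
    return set(np.where(p < alpha)[0])

rng = np.random.default_rng(2)
n_genes = 1773  # matching the array size described above
male_ctrl, male_exp = rng.normal(size=(5, n_genes)), rng.normal(size=(5, n_genes))
female_ctrl, female_exp = rng.normal(size=(5, n_genes)), rng.normal(size=(5, n_genes))

male_de = de_genes(male_ctrl, male_exp)
female_de = de_genes(female_ctrl, female_exp)
print(len(male_de), len(female_de), len(male_de & female_de))  # counts + shared genes
```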

  8. An Approach to Using Toxicogenomic Data in U.S. EPA Human Health Risk Assessments: A Dibutyl Phthalate Case Study (Final Report, 2010)

    EPA Science Inventory

    EPA announced the availability of the final report, An Approach to Using Toxicogenomic Data in U.S. EPA Human Health Risk Assessments: A Dibutyl Phthalate Case Study. This report outlines an approach to evaluate genomic data for use in risk assessment and a case study to ...

  9. SALAD database: a motif-based database of protein annotations for plant comparative genomics

    PubMed Central

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933
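
    SALAD's pairwise scoring of motif patterns is more elaborate than this, but clustering proteins by the similarity of their motif sets can be sketched with a Jaccard distance matrix and average-linkage clustering. The proteins and motif identifiers below are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Hypothetical motif content per protein; Jaccard distance on motif sets
# is a simplified stand-in for SALAD's motif-pattern scoring.
proteins = {
    "ProtA": {"M1", "M2", "M3"},
    "ProtB": {"M1", "M2"},
    "ProtC": {"M4", "M5"},
    "ProtD": {"M1", "M4", "M5"},
}
names = list(proteins)
n = len(names)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        a, b = proteins[names[i]], proteins[names[j]]
        dist[i, j] = dist[j, i] = 1 - len(a & b) / len(a | b)

tree = linkage(squareform(dist), method="average")
dendrogram(tree, labels=names, no_plot=True)  # structure only, no plotting
print(tree)
```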

  10. SALAD database: a motif-based database of protein annotations for plant comparative genomics.

    PubMed

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.

  11. A comparative cellular and molecular biology of longevity database.

    PubMed

    Stuart, Jeffrey A; Liang, Ping; Luo, Xuemei; Page, Melissa M; Gallagher, Emily J; Christoff, Casey A; Robb, Ellen L

    2013-10-01

    Discovering key cellular and molecular traits that promote longevity is a major goal of aging and longevity research. One experimental strategy is to determine which traits have been selected during the evolution of longevity in naturally long-lived animal species. This comparative approach has been applied to lifespan research for nearly four decades, yielding hundreds of datasets describing aspects of cell and molecular biology hypothesized to relate to animal longevity. Here, we introduce a Comparative Cellular and Molecular Biology of Longevity Database, available at (http://genomics.brocku.ca/ccmbl/), as a compendium of comparative cell and molecular data presented in the context of longevity. This open access database will facilitate the meta-analysis of amalgamated datasets using standardized maximum lifespan (MLSP) data (from AnAge). The first edition contains over 800 data records describing experimental measurements of cellular stress resistance, reactive oxygen species metabolism, membrane composition, protein homeostasis, and genome homeostasis as they relate to vertebrate species MLSP. The purpose of this review is to introduce the database and briefly demonstrate its use in the meta-analysis of combined datasets.
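
    Meta-analysis against standardized maximum lifespan typically reduces to correlating a cellular trait with MLSP across species. The sketch below uses a rank correlation on a handful of illustrative records; the trait values are invented, and real analyses would correct for body mass and phylogeny.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical records in the spirit of the database: one cellular trait
# measured per species, merged with maximum lifespan (MLSP, years).
records = pd.DataFrame({
    "species": ["mouse", "rat", "naked mole-rat", "dog", "human"],
    "mlsp_years": [4, 5, 31, 24, 122],
    "stress_resistance": [1.0, 1.2, 3.5, 2.0, 4.1],  # arbitrary units
})

rho, p = spearmanr(records["mlsp_years"], records["stress_resistance"])
print(f"Spearman rho={rho:.2f}, p={p:.3f}")
```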

  12. Discriminating between adaptive and carcinogenic liver hypertrophy in rat studies using logistic ridge regression analysis of toxicogenomic data: The mode of action and predictive models.

    PubMed

    Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi

    2017-03-01

    Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. Copyright © 2017 Elsevier Inc. All rights reserved.
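
    The modeling approach, a ridge-penalized (L2) logistic regression over microarray profiles with genes ranked by coefficient size, can be sketched as follows. The random data and the cutoff of 35 genes mirror the study's setup only loosely; scikit-learn's LogisticRegression with penalty="l2" stands in for the ridge estimator used by the authors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Stand-in for microarray profiles of 134 chemicals x 2000 probes
X = rng.normal(size=(134, 2000))
y = rng.integers(0, 2, size=134)  # 1 = hypertrophy-inducing, 0 = not

X = StandardScaler().fit_transform(X)
# penalty="l2" is the ridge penalty; C is the inverse regularization strength
model = LogisticRegression(penalty="l2", C=0.1, max_iter=1000).fit(X, y)

# Ridge shrinks but does not zero coefficients, so rank by magnitude
# and keep the top genes (35 were selected in the study above).
top = np.argsort(np.abs(model.coef_[0]))[::-1][:35]
print("top probe indices:", top[:10])
```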

  13. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.

  14. MIPS PlantsDB: a database framework for comparative plant genome research

    PubMed Central

    Nussbaumer, Thomas; Martis, Mihaela M.; Roessner, Stephan K.; Pfeifer, Matthias; Bader, Kai C.; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB–plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834–D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB. PMID:23203886

  15. Orthology for comparative genomics in the mouse genome database.

    PubMed

    Dolan, Mary E; Baldarelli, Richard M; Bello, Susan M; Ni, Li; McAndrews, Monica S; Bult, Carol J; Kadin, James A; Richardson, Joel E; Ringwald, Martin; Eppig, Janan T; Blake, Judith A

    2015-08-01

    The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource.

  16. MODBASE, a database of annotated comparative protein structure models

    PubMed Central

    Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej

    2002-01-01

    MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10^-4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309

  17. ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding.

    PubMed

    Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D

    2017-08-10

    The rapid generation of omics data in recent years has resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as generic, easy to use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or
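
    ODG's own graph database and query interface are not reproduced here, but the underlying idea, multi-layered annotations as a traversable graph, can be illustrated with networkx. All node names and edge labels below are invented.

```python
import networkx as nx

# Illustrative multi-layer annotation graph; ODG exposes its own graph
# database and query hooks, which this sketch does not reproduce.
g = nx.MultiDiGraph()
g.add_edge("GeneA", "GO:0006355", key="annotated_with")   # GO term
g.add_edge("GeneA", "TranscriptA1", key="has_transcript")
g.add_edge("TranscriptA1", "ProteinA1", key="translates_to")
g.add_edge("ProteinA1", "PF00001", key="has_domain")      # Pfam-style domain

# "Query": walk from a gene to every annotation reachable from it
for node in nx.descendants(g, "GeneA"):
    print("GeneA ->", node)
```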

  18. The eNanoMapper database for nanomaterial safety information.

    PubMed

    Jeliazkova, Nina; Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon

    2015-01-01

    The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user-friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the "representational state transfer" (REST) API enables building user-friendly
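
    As a rough illustration of the API-driven access pattern described above, the sketch below pulls substance records over HTTP with Python's requests library. The base URL, query parameter, and JSON field names are assumptions made for illustration; the live eNanoMapper API documentation is the authority on actual resource paths and payload layouts.

    ```python
    # A minimal sketch of querying an eNanoMapper-style REST API.
    import requests

    BASE = "https://data.enanomapper.net"  # assumed public instance

    # "search" is a plausible, not verified, parameter name.
    resp = requests.get(
        f"{BASE}/substance",
        params={"search": "TiO2"},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()

    for substance in resp.json().get("substance", []):
        # Field names below are illustrative; real payloads may differ.
        print(substance.get("name"), substance.get("ownerName"))
    ```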

  19. Risk assessment of Soulatrolide and Mammea (A/BA+A/BB) coumarins from Calophyllum brasiliense by a toxicogenomic and toxicological approach.

    PubMed

    Gomez-Verjan, J C; Estrella-Parra, E; Vazquez-Martinez, E R; Gonzalez-Sanchez, I; Guerrero-Magos, G; Mendoza-Villanueva, D; Isus, L; Alfaro, A; Cerbón-Cervantes, M; Aloy, P; Reyes-Chilpa, R

    2016-05-01

    Calophyllum brasiliense (Calophyllaceae) is a tropical rain forest tree distributed in Central and South America. It is an important source of tetracyclic dipyrano coumarins (Soulatrolide) and Mammea-type coumarins. Soulatrolide is a potent inhibitor of HIV-1 reverse transcriptase and displays activity against Mycobacterium tuberculosis. Meanwhile, Mammea A/BA and A/BB, pure or as a mixture, are highly active against several human leukemia cell lines, Trypanosoma cruzi and Leishmania amazonensis. Nevertheless, there are few studies evaluating their safety profile. In the present work we performed toxicogenomic and toxicological analyses of both types of compounds. Soulatrolide and the Mammea A/BA + A/BB mixture (2:1) were slightly toxic according to the Lorke assay classification (LD50 > 3000 mg/kg). After short-term administration (100 mg/kg daily, orally, 1 week), liver toxicogenomic analysis revealed 46 upregulated and 72 downregulated genes for the Mammea coumarins, and 665 upregulated and 1077 downregulated genes for Soulatrolide. Gene enrichment analysis identified transcripts involved in drug metabolism for both compounds. In addition, network analysis through protein-protein interactions, tissue evaluation by TUNEL assay, and histological examination revealed no tissue damage in liver, kidney or spleen after treatment. Our results indicate that both types of coumarins display a favorable safety profile, supporting their use in further preclinical studies to determine their therapeutic potential. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. Application of dynamic topic models to toxicogenomics data.

    PubMed

    Lee, Mikyung; Liu, Zhichao; Huang, Ruili; Tong, Weida

    2016-10-06

    All biological processes are inherently dynamic. Biological systems evolve transiently or sustainably across sequential time points after perturbation by environmental insults, drugs and chemicals. Investigating the temporal behavior of molecular events is important for understanding the mechanisms governing a biological system's response to, for example, drug treatment. The intrinsic complexity of time-series data requires appropriate computational algorithms for data interpretation. In this study, we propose, for the first time, the application of dynamic topic models (DTM) for analyzing time-series gene expression data. A large time-series toxicogenomics dataset was studied. It contains 3144 microarrays of gene expression data corresponding to rat livers treated with 131 compounds (most of them drugs) at two doses (control and high dose) in a repeated schedule covering four separate time points (4-, 8-, 15- and 29-day). We analyzed, with DTM, the topics (each consisting of a set of genes) and their biological interpretations over these four time points. We identified hidden patterns embedded in these time-series gene expression profiles. From the topic distributions for each compound-time condition, a number of drugs were successfully clustered by their shared modes of action, such as PPARɑ agonists and COX inhibitors. The biological meaning underlying each topic was interpreted using diverse sources of information, such as functional analysis of the pathways and therapeutic uses of the drugs. Additionally, we found that sample clusters produced by DTM are much more coherent in terms of functional categories when compared to traditional clustering algorithms. We demonstrated that DTM, a text mining technique, can be a powerful computational approach for clustering time-series gene expression profiles with a probabilistic representation of their dynamic features along sequential time frames. The method offers an alternative way for uncovering
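
    For readers unfamiliar with DTM mechanics, the sketch below shows the general shape of such an analysis using gensim's LdaSeqModel, with gene symbols playing the role of words and compound-time samples playing the role of documents. The gene lists, counts, and time slices are synthetic placeholders, not data from the study.

    ```python
    # A minimal dynamic-topic-model sketch over time-series expression data.
    from gensim.corpora import Dictionary
    from gensim.models import LdaSeqModel

    # Four time points (4-, 8-, 15-, 29-day), two samples each -> 8 "documents";
    # repeated gene symbols stand in for discretized expression counts.
    docs = [
        ["Cyp1a1", "Cyp1a1", "Gstp1"], ["Cyp1a1", "Gstp1", "Gstp1"],   # day 4
        ["Cyp1a1", "Fabp1"], ["Fabp1", "Fabp1", "Gstp1"],              # day 8
        ["Fabp1", "Acox1"], ["Acox1", "Acox1", "Fabp1"],               # day 15
        ["Acox1", "Acox1"], ["Acox1", "Cyp1a1"],                       # day 29
    ]
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]

    # time_slice gives the number of documents at each consecutive time point.
    dtm = LdaSeqModel(corpus=corpus, id2word=dictionary,
                      time_slice=[2, 2, 2, 2], num_topics=2)

    # Inspect how the gene composition of topic 0 drifts across time points.
    for t in range(4):
        print(f"time {t}:", dtm.print_topic(topic=0, time=t, top_terms=3))
    ```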

  1. Toxicogenomics and Cancer Susceptibility: Advances with Next-Generation Sequencing

    PubMed Central

    Ning, Baitang; Su, Zhenqiang; Mei, Nan; Hong, Huixiao; Deng, Helen; Shi, Leming; Fuscoe, James C.; Tolleson, William H.

    2017-01-01

    The aim of this review is to comprehensively summarize recent achievements in the field of toxicogenomics and cancer research regarding genetic-environmental interactions in carcinogenesis and the detection of genetic aberrations in cancer genomes by next-generation sequencing technology. Cancer is primarily a genetic disease in which genetic factors and environmental stimuli interact to cause genetic and epigenetic aberrations in human cells. Mutations in the germline act either as high-penetrance alleles that strongly increase the risk of cancer development, or as low-penetrance alleles that mildly change an individual's susceptibility to cancer. Somatic mutations, resulting either from DNA damage induced by exposure to environmental mutagens or from spontaneous errors in DNA replication or repair, are involved in the development or progression of the cancer. Induced or spontaneous changes in the epigenome may also drive carcinogenesis. Advances in next-generation sequencing technology provide opportunities to accurately, economically, and rapidly identify genetic variants, somatic mutations, gene expression profiles, and epigenetic alterations with single-base resolution. Whole genome sequencing, whole exome sequencing, and RNA sequencing of paired cancer and adjacent normal tissue present a comprehensive picture of the cancer genome. These new findings should benefit public health by providing insights into cancer biology and improving cancer diagnosis and therapy. PMID:24875441

  2. Does an Otolaryngology-Specific Database Have Added Value? A Comparative Feasibility Analysis.

    PubMed

    Bellmunt, Angela M; Roberts, Rhonda; Lee, Walter T; Schulz, Kris; Pynnonen, Melissa A; Crowson, Matthew G; Witsell, David; Parham, Kourosh; Langman, Alan; Vambutas, Andrea; Ryan, Sheila E; Shin, Jennifer J

    2016-07-01

    There are multiple nationally representative databases that support epidemiologic and outcomes research, and it is unknown whether an otolaryngology-specific resource would prove indispensable or superfluous. Therefore, our objective was to determine the feasibility of analyses in the National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS) databases as compared with the otolaryngology-specific Creating Healthcare Excellence through Education and Research (CHEER) database. Parallel analyses in 2 data sets. Ambulatory visits in the United States. To test a fixed hypothesis that could be directly compared between data sets, we focused on a condition with expected prevalence high enough to substantiate availability in both. This query also encompassed a broad span of diagnoses to sample the breadth of available information. Specifically, we compared an assessment of suspected risk factors for sensorineural hearing loss in subjects 0 to 21 years of age, according to a predetermined protocol. We also assessed the feasibility of 6 additional diagnostic queries among all age groups. In the NAMCS/NHAMCS data set, the number of measured observations was not sufficient to support reliable numeric conclusions (percentage standard error among risk factors: 38.6-92.1). Analysis of the CHEER database demonstrated that age, sex, meningitis, and cytomegalovirus were statistically significant factors associated with pediatric sensorineural hearing loss (P < .01). Among the 6 additional diagnostic queries assessed, NAMCS/NHAMCS usage was also infeasible; the CHEER database contained 1585 to 212,521 more observations per annum. An otolaryngology-specific database has added utility when compared with already available national ambulatory databases. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2016.

  3. Use of genomic data in risk assessment case study: I. Evaluation of the dibutyl phthalate male reproductive development toxicity data set

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Makris, Susan L., E-mail: makris.susan@epa.gov; Euling, Susan Y.; Gray, L. Earl

    2013-09-15

    A case study was conducted, using dibutyl phthalate (DBP), to explore an approach to using toxicogenomic data in risk assessment. The toxicity and toxicogenomic data sets relative to DBP-related male reproductive developmental outcomes were considered conjointly to derive information about mode and mechanism of action. In this manuscript, we describe the case study evaluation of the toxicological database for DBP, focusing on identifying the full spectrum of male reproductive developmental effects. The data were assessed to 1) evaluate low dose and low incidence findings and 2) identify male reproductive toxicity endpoints without well-established modes of action (MOAs). These efforts led to the characterization of data gaps and research needs for the toxicity and toxicogenomic studies in a risk assessment context. Further, the identification of endpoints with unexplained MOAs in the toxicity data set was useful in the subsequent evaluation of the mechanistic information that the toxicogenomic data set evaluation could provide. The extensive analysis of the toxicology data set within the MOA context provided a resource of information for DBP in attempts to hypothesize MOAs (for endpoints without a well-established MOA) and to phenotypically anchor toxicogenomic and other mechanistic data both to toxicity endpoints and to available toxicogenomic data. This case study serves as an example of the steps that can be taken to develop a toxicological data source for a risk assessment, both in general and especially for risk assessments that include toxicogenomic data.

  4. Comparative analysis of perioperative complications between a multicenter prospective cervical deformity database and the Nationwide Inpatient Sample database.

    PubMed

    Passias, Peter G; Horn, Samantha R; Jalai, Cyrus M; Poorman, Gregory; Bono, Olivia J; Ramchandran, Subaraman; Smith, Justin S; Scheer, Justin K; Sciubba, Daniel M; Hamilton, D Kojo; Mundis, Gregory; Oh, Cheongeun; Klineberg, Eric O; Lafage, Virginie; Shaffrey, Christopher I; Ames, Christopher P

    2017-11-01

    Complication rates for adult cervical deformity are poorly characterized given the complexity and heterogeneity of cases. To compare perioperative complication rates following adult cervical deformity corrective surgery between a prospective multicenter database for patients with cervical deformity (PCD) and the Nationwide Inpatient Sample (NIS). Retrospective review of prospective databases. A total of 11,501 adult patients with cervical deformity (11,379 patients from the NIS and 122 patients from the PCD database). Perioperative medical and surgical complications. The NIS was queried (2001-2013) for cervical deformity discharges for patients ≥18 years undergoing cervical fusions using International Classification of Disease, Ninth Revision (ICD-9) coding. Patients ≥18 years from the PCD database (2013-2015) were selected. Equivalent complications were identified and rates were compared. Bonferroni correction (p<.004) was used for Pearson chi-square. Binary logistic regression was used to evaluate differences in complication rates between databases. A total of 11,379 patients from the NIS database and 122 patients from the PCD database were identified. Patients from the PCD database were older (62.49 vs. 55.15, p<.001) but displayed a similar gender distribution. The intraoperative complication rate was higher in the PCD group (39.3%) than in the NIS database (9.2%, p<.001). The PCD database had an increased risk of reporting overall complications compared with the NIS (odds ratio: 2.81, confidence interval: 1.81-4.38). Only device-related complications were greater in the NIS (7.1% vs. 1.1%, p=.007). Patients from the PCD database displayed higher rates of the following complications: peripheral vascular (0.8% vs. 0.1%, p=.001), gastrointestinal (GI) (2.5% vs. 0.2%, p<.001), infection (8.2% vs. 0.5%, p<.001), dural tear (4.1% vs. 0.6%, p<.001), and dysphagia (9.8% vs. 1.9%, p<.001). Genitourinary, wound, and deep vein thrombosis (DVT) complications were similar between the two databases.
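
    The binary logistic regression step can be illustrated compactly. The sketch below, using statsmodels, reconstructs approximate complication counts from the reported rates (39.3% of 122 PCD patients vs. 9.2% of 11,379 NIS patients); note that the paper's odds ratio of 2.81 was presumably covariate-adjusted, so this crude two-group fit will not reproduce it.

    ```python
    # A minimal logistic-regression sketch on synthetic counts that echo the
    # reported complication rates in the two databases.
    import numpy as np
    import statsmodels.api as sm

    n_pcd, n_nis = 122, 11379
    complication = np.concatenate([
        np.repeat([1, 0], [48, n_pcd - 48]),      # ~39.3% in PCD
        np.repeat([1, 0], [1047, n_nis - 1047]),  # ~9.2% in NIS
    ])
    is_pcd = np.concatenate([np.ones(n_pcd), np.zeros(n_nis)])

    X = sm.add_constant(is_pcd)
    fit = sm.Logit(complication, X).fit(disp=0)
    # Crude (unadjusted) odds ratio for PCD vs. NIS.
    print("odds ratio:", np.exp(fit.params[1]))
    ```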

  5. Application of toxicogenomic profiling to evaluate effects of benzene and formaldehyde: from yeast to human

    PubMed Central

    McHale, Cliona M.; Smith, Martyn T.; Zhang, Luoping

    2014-01-01

    Genetic variation underlies a significant proportion of the individual variation in human susceptibility to toxicants. The primary current approaches to identify gene–environment (GxE) associations, genome-wide association studies (GWAS) and candidate gene association studies, require large exposed and control populations and an understanding of toxicity genes and pathways, respectively. This limits their application in the study of GxE associations for the leukemogens benzene and formaldehyde, whose toxicity has long been a focus of our research. As an alternative approach, we applied innovative in vitro functional genomics testing systems, including unbiased functional screening assays in yeast and a near-haploid human bone marrow cell line (KBM7). Through comparative genomic and computational analyses of the resulting data, we have identified human genes and pathways that may modulate susceptibility to benzene and formaldehyde. We have validated the roles of several genes in mammalian cell models. In populations occupationally exposed to low levels of benzene, we applied peripheral blood mononuclear cell transcriptomics and chromosome-wide aneuploidy studies (CWAS) in lymphocytes. In this review of the literature, we describe our comprehensive toxicogenomic approach and the potential mechanisms of toxicity and susceptibility genes identified for benzene and formaldehyde, as well as related studies conducted by other researchers. PMID:24571325

  6. Comparing surgical infections in National Surgical Quality Improvement Project and an Institutional Database.

    PubMed

    Selby, Luke V; Sjoberg, Daniel D; Cassella, Danielle; Sovel, Mindy; Weiser, Martin R; Sepkowitz, Kent; Jones, David R; Strong, Vivian E

    2015-06-15

    Surgical quality improvement requires accurate tracking and benchmarking of postoperative adverse events. We track surgical site infections (SSIs) with two systems: our in-house surgical secondary events (SSE) database and the National Surgical Quality Improvement Project (NSQIP). The SSE database, a modification of the Clavien-Dindo classification, categorizes SSIs by their anatomic site, whereas NSQIP categorizes them by their level. Our aim was to directly compare these different definitions. NSQIP and SSE database entries for all surgeries performed in 2011 and 2012 were compared. To match NSQIP definitions, and while blinded to NSQIP results, entries in the SSE database were categorized as either incisional (superficial or deep) or organ space infections. These categorizations were compared with NSQIP records; agreement was assessed with Cohen kappa. The 5028 patients in our cohort had a 6.5% SSI rate in the SSE database and a 4% rate in NSQIP, with an overall agreement of 95% (kappa = 0.48, P < 0.0001). The rates of categorized infections were similarly well matched: incisional rates of 4.1% and 2.7% for the SSE database and NSQIP, respectively, and organ space rates of 2.6% and 1.5%. Overall agreements were 96% (kappa = 0.36, P < 0.0001) and 98% (kappa = 0.55, P < 0.0001), respectively. Over 80% of cases recorded by the SSE database but not NSQIP did not meet NSQIP criteria. The SSE database is an accurate, real-time record of postoperative SSIs. Institutional databases that capture all surgical cases can be used in conjunction with NSQIP with excellent concordance. Copyright © 2015 Elsevier Inc. All rights reserved.
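
    Agreement assessment with Cohen's kappa is straightforward to reproduce on paired per-patient labels; the sketch below uses scikit-learn on invented labels purely to show the mechanics.

    ```python
    # Paired SSI calls (0 = no SSI, 1 = SSI) for the same 20 hypothetical
    # patients in each system; kappa measures chance-corrected agreement.
    from sklearn.metrics import cohen_kappa_score

    sse_db = [0]*14 + [1]*4 + [0, 1]
    nsqip  = [0]*14 + [1]*3 + [0, 0, 1]
    print("kappa:", round(cohen_kappa_score(sse_db, nsqip), 2))
    ```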

  7. The eNanoMapper database for nanomaterial safety information

    PubMed Central

    Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon

    2015-01-01

    Summary Background: The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user-friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the “representational state transfer

  8. Toxicogenomic assessment of 6-OH-BDE47 induced ...

    EPA Pesticide Factsheets

    Hydroxylated and methoxylated polybrominated diphenyl ethers (OH-/MeO-PBDEs) are analogs of PBDEs with hundreds of possible structures, and many of them can activate the aryl hydrocarbon receptor (AhR); however, in vivo evidence on the toxicity of OH-/MeO-PBDEs is still very limited. 6-OH-BDE47 is a relatively potent AhR activator and a predominant congener of OH-PBDEs detected in the environment. Here the developmental toxicity of 6-OH-BDE47 in chicken embryos was assessed using a toxicogenomic approach. Fertilized chicken eggs were dosed via in ovo administration of 0.006 to 0.474 nmol 6-OH-BDE47/g egg followed by 18 days of incubation. Significant embryo mortality (LD50 = 0.294 nmol/g egg) and an increased hepatic somatic index (HSI) were caused by 6-OH-BDE47 exposure. The functional enrichment of differentially expressed genes (DEGs) associated with oxidative phosphorylation, generation of precursor metabolites and energy, and the electron transport chain suggests that 6-OH-BDE47 exposure may disrupt embryo development by altering energy production in mitochondria. Moreover, AhR-mediated responses, including up-regulation of CYP1A4, were observed in the livers of embryos exposed to 6-OH-BDE47. Overall, this study confirmed the prediction of embryo lethality by 6-OH-BDE47, consistent with an adverse outcome pathway (AOP) linking AhR activation to embryo lethality. The results provide an example of the application of an AOP in hazard and ecological risk assessment.

  9. Big Data and Total Hip Arthroplasty: How Do Large Databases Compare?

    PubMed

    Bedard, Nicholas A; Pugely, Andrew J; McHugh, Michael A; Lux, Nathan R; Bozic, Kevin J; Callaghan, John J

    2018-01-01

    Use of large databases for orthopedic research has become extremely popular in recent years. Each database varies in the methods used to capture data and the population it represents. The purpose of this study was to evaluate how these databases differed in reported demographics, comorbidities, and postoperative complications for primary total hip arthroplasty (THA) patients. Primary THA patients were identified within the National Surgical Quality Improvement Program (NSQIP), the Nationwide Inpatient Sample (NIS), Medicare Standard Analytic Files (MED), and the Humana administrative claims database (HAC). NSQIP definitions for comorbidities and complications were matched to corresponding International Classification of Diseases, 9th Revision/Current Procedural Terminology codes to query the other databases. Demographics, comorbidities, and postoperative complications were compared. The number of patients from each database was 22,644 in HAC, 371,715 in MED, 188,779 in NIS, and 27,818 in NSQIP. Age and gender distribution were clinically similar. Overall, there was variation in the prevalence of comorbidities and rates of postoperative complications between databases. As an example, NSQIP had more than twice the prevalence of obesity as NIS, and HAC and MED had more than twice the prevalence of diabetes as NSQIP. Rates of deep infection and stroke 30 days after THA differed more than 2-fold across the databases. Among databases commonly used in orthopedic research, there is considerable variation in complication rates following THA depending upon the database used for analysis. It is important to consider these differences when critically evaluating database research. Additionally, with the advent of bundled payments, these differences must be considered in risk adjustment models. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Information Literacy Skills: Comparing and Evaluating Databases

    ERIC Educational Resources Information Center

    Grismore, Brian A.

    2012-01-01

    The purpose of this database comparison is to express the importance of teaching information literacy skills and to apply those skills to commonly used Internet-based research tools. This paper includes a comparison and evaluation of three databases (ProQuest, ERIC, and Google Scholar). It includes strengths and weaknesses of each database based…

  11. GenColors-based comparative genome databases for small eukaryotic genomes.

    PubMed

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  12. rCAD: A Novel Database Schema for the Comparative Analysis of RNA.

    PubMed

    Ozer, Stuart; Doshi, Kishore J; Xu, Weijia; Gutell, Robin R

    2011-12-31

    Beyond its direct involvement in protein synthesis with mRNA, tRNA, and rRNA, RNA is now being appreciated for its significance in the overall metabolism and regulation of the cell. Comparative analysis has been very effective in the identification and characterization of RNA molecules, including the accurate prediction of their secondary structure. We are developing an integrative scalable data management and analysis system, the RNA Comparative Analysis Database (rCAD), implemented with SQL Server to support RNA comparative analysis. The platform-agnostic database schema of rCAD captures the essential relationships between the different dimensions of information for RNA comparative analysis datasets. The rCAD implementation enables a variety of comparative analysis manipulations with multiple integrated data dimensions for advanced RNA comparative analysis workflows. In this paper, we describe details of the rCAD schema design and illustrate its usefulness with two usage scenarios.
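
    To make the schema idea concrete, here is a minimal relational sketch in SQLite (rCAD itself runs on SQL Server). The table and column names are illustrative guesses at the dimensions the abstract names (sequences, alignment columns, secondary structure), not rCAD's actual schema.

    ```python
    # A toy rCAD-style layout: sequences, alignment columns, and base pairs,
    # joined in a covariation-style query.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE sequence (
        seq_id    INTEGER PRIMARY KEY,
        accession TEXT NOT NULL,
        taxon     TEXT
    );
    CREATE TABLE alignment_column (
        aln_id  INTEGER,
        seq_id  INTEGER REFERENCES sequence(seq_id),
        col_num INTEGER,   -- column in the multiple sequence alignment
        residue TEXT,
        PRIMARY KEY (aln_id, seq_id, col_num)
    );
    CREATE TABLE base_pair (   -- secondary-structure annotation per alignment
        aln_id INTEGER,
        col_5p INTEGER,
        col_3p INTEGER,
        PRIMARY KEY (aln_id, col_5p, col_3p)
    );
    """)

    con.execute("INSERT INTO sequence VALUES (1, 'U00096_rrsA', 'E. coli')")
    con.execute("INSERT INTO alignment_column VALUES (1, 1, 27, 'G')")
    con.execute("INSERT INTO base_pair VALUES (1, 27, 37)")

    # Residue of each sequence at the 5' side of every annotated base pair.
    rows = con.execute("""
        SELECT s.accession, ac.residue, bp.col_3p
          FROM base_pair bp
          JOIN alignment_column ac
            ON ac.aln_id = bp.aln_id AND ac.col_num = bp.col_5p
          JOIN sequence s ON s.seq_id = ac.seq_id
    """).fetchall()
    print(rows)
    ```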

  13. rCAD: A Novel Database Schema for the Comparative Analysis of RNA

    PubMed Central

    Ozer, Stuart; Doshi, Kishore J.; Xu, Weijia; Gutell, Robin R.

    2013-01-01

    Beyond its direct involvement in protein synthesis with mRNA, tRNA, and rRNA, RNA is now being appreciated for its significance in the overall metabolism and regulation of the cell. Comparative analysis has been very effective in the identification and characterization of RNA molecules, including the accurate prediction of their secondary structure. We are developing an integrative scalable data management and analysis system, the RNA Comparative Analysis Database (rCAD), implemented with SQL Server to support RNA comparative analysis. The platform-agnostic database schema of rCAD captures the essential relationships between the different dimensions of information for RNA comparative analysis datasets. The rCAD implementation enables a variety of comparative analysis manipulations with multiple integrated data dimensions for advanced RNA comparative analysis workflows. In this paper, we describe details of the rCAD schema design and illustrate its usefulness with two usage scenarios. PMID:24772454

  14. Toxicogenomic effects common to triazole antifungals and conserved between rats and humans.

    PubMed

    Goetz, Amber K; Dix, David J

    2009-07-01

    The triazole antifungals myclobutanil, propiconazole and triadimefon cause varying degrees of hepatic toxicity and disrupt steroid hormone homeostasis in rodent in vivo models. To identify biological pathways consistently modulated across multiple timepoints and various study designs, gene expression profiling was conducted on rat livers from three separate studies with triazole treatment groups ranging from 6 h after a single oral gavage exposure, to prenatal to adult exposures via feed. To explore conservation of responses across species, gene expression data from the rat liver studies were compared to in vitro data from rat and human primary hepatocytes exposed to the triazoles. Toxicogenomic data on triazoles from 33 different treatment groups and 135 samples (microarrays) identified thousands of probe sets and dozens of pathways differentially expressed across time, dose, and species; many of these were common to all three triazoles, or conserved between rodents and humans. Common and conserved pathways included androgen and estrogen metabolism, xenobiotic metabolism signaling through CAR and PXR, and CYP-mediated metabolism. Differentially expressed genes included the Phase I xenobiotic, fatty acid, sterol and steroid metabolism genes Cyp2b2 and CYP2B6, Cyp3a1 and CYP3A4, and Cyp4a22 and CYP4A11; Phase II conjugation enzyme genes Ugt1a1 and UGT1A1; and Phase III ABC transporter genes Abcb1 and ABCB1. Gene expression changes caused by all three triazoles in liver and hepatocytes were concentrated in biological pathways regulating lipid, sterol and steroid homeostasis, identifying a potential common mode of action conserved between rodents and humans. Modulation of hepatic sterol and steroid metabolism is a plausible mode of action for changes in serum testosterone and adverse reproductive outcomes observed in rat studies, and may be relevant to human risk assessment.

  15. A comparative study of six European databases of medically oriented Web resources.

    PubMed

    Abad García, Francisca; González Teruel, Aurora; Bayo Calduch, Patricia; de Ramón Frias, Rosa; Castillo Blasco, Lourdes

    2005-10-01

    The paper describes six European medically oriented databases of Web resources, pertaining to five quality-controlled subject gateways, and compares their performance. The characteristics, coverage, procedure for selecting Web resources, record structure, searching possibilities, and existence of user assistance were described for each database. Performance indicators for each database were obtained by means of searches carried out using the key words "myocardial infarction." Most of the databases originated in the 1990s in an academic or library context and include all types of Web resources of an international nature. Five databases use Medical Subject Headings. The number of fields per record varies between three and nineteen. The language of the search interfaces is mostly English, and some of them allow searches in other languages. In some databases, the search can be extended to PubMed. Organizing Medical Networked Information, Catalogue et Index des Sites Médicaux Francophones, and Diseases, Disorders and Related Topics produced the best results. The usefulness of these databases as quick reference resources is clear. In addition, their lack of content overlap means that, for the user, they complement each other. Their continued survival faces three challenges: the instability of the Internet, maintenance costs, and lack of use in spite of their potential usefulness.

  16. A Pipeline for High-Throughput Concentration Response Modeling of Gene Expression for Toxicogenomics

    PubMed Central

    House, John S.; Grimm, Fabian A.; Jima, Dereje D.; Zhou, Yi-Hui; Rusyn, Ivan; Wright, Fred A.

    2017-01-01

    Cell-based assays are an attractive option to measure gene expression response to exposure, but the cost of whole-transcriptome RNA sequencing has been a barrier to the use of gene expression profiling for in vitro toxicity screening. In addition, standard RNA sequencing adds variability due to variable transcript length and amplification. Targeted probe-sequencing technologies such as TempO-Seq, with transcriptomic representation that can vary from hundreds of genes to the entire transcriptome, may reduce some components of variation. Analyses of high-throughput toxicogenomics data require renewed attention to read-calling algorithms and simplified dose–response modeling for datasets with relatively few samples. Using data from induced pluripotent stem cell-derived cardiomyocytes treated with chemicals at varying concentrations, we describe here and make available a pipeline for handling expression data generated by TempO-Seq to align reads, clean and normalize raw count data, identify differentially expressed genes, and calculate transcriptomic concentration–response points of departure. The methods are extensible to other forms of concentration–response gene-expression data, and we discuss the utility of the methods for assessing variation in susceptibility and the diseased cellular state. PMID:29163636
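
    The concentration-response modeling step can be sketched with an ordinary Hill fit. The code below, using scipy, fits a four-parameter curve to synthetic fold-change data and derives a point of departure as the concentration producing 10% of the maximal response; the model form and the 10% criterion are illustrative choices, not necessarily those of the published pipeline.

    ```python
    # A minimal per-gene concentration-response fit and point of departure.
    import numpy as np
    from scipy.optimize import curve_fit

    def hill(c, bottom, top, ec50, n):
        # Four-parameter Hill curve on concentration c.
        return bottom + (top - bottom) / (1.0 + (ec50 / np.maximum(c, 1e-12)) ** n)

    conc = np.array([0.01, 0.1, 1.0, 10.0, 100.0])   # micromolar, synthetic
    expr = np.array([1.02, 1.10, 1.65, 2.60, 2.90])  # fold change, synthetic

    popt, _ = curve_fit(hill, conc, expr, p0=[1.0, 3.0, 5.0, 1.0], maxfev=10000)
    bottom, top, ec50, n = popt

    # POD: concentration where the response covers 10% of the bottom-to-top
    # span, solved analytically from the fitted Hill parameters.
    pod = ec50 / ((top - bottom) / (0.1 * (top - bottom)) - 1.0) ** (1.0 / n)
    print(f"EC50 ~ {ec50:.2f} uM, POD(10%) ~ {pod:.2f} uM")
    ```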

  17. Entitymetrics: Measuring the Impact of Entities

    PubMed Central

    Ding, Ying; Song, Min; Han, Jia; Yu, Qi; Yan, Erjia; Lin, Lili; Chambers, Tamy

    2013-01-01

    This paper proposes entitymetrics to measure the impact of knowledge units. Entitymetrics highlight the importance of entities embedded in scientific literature for further knowledge discovery. In this paper, we use Metformin, a drug for diabetes, as an example to form an entity-entity citation network based on literature related to Metformin. We then calculate the network features and compare the centrality ranks of biological entities with results from the Comparative Toxicogenomics Database (CTD). The comparison demonstrates the usefulness of entitymetrics in detecting most of the outstanding interactions manually curated in CTD. PMID:24009660
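
    A toy version of the entity-entity network is easy to construct; the sketch below builds an invented Metformin-centered graph with networkx and ranks entities by degree centrality, the simplest of the network features such a study might compare against CTD curation. The edges are illustrative only.

    ```python
    # An invented entity-entity network centered on Metformin.
    import networkx as nx

    g = nx.Graph()
    g.add_edges_from([
        ("Metformin", "AMPK"), ("Metformin", "mTOR"),
        ("Metformin", "diabetes mellitus"), ("AMPK", "mTOR"),
        ("Metformin", "insulin"), ("insulin", "diabetes mellitus"),
    ])

    # Rank entities by degree centrality, highest first.
    for entity, score in sorted(nx.degree_centrality(g).items(),
                                key=lambda kv: -kv[1]):
        print(f"{entity:20s} {score:.2f}")
    ```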

  18. BμG@Sbase—a microbial gene expression and comparative genomic database

    PubMed Central

    Witney, Adam A.; Waldron, Denise E.; Brooks, Lucy A.; Tyler, Richard H.; Withers, Michael; Stoker, Neil G.; Wren, Brendan W.; Butcher, Philip D.; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and, while serving as a hub of a global network of microbial research groups, has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface, integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore, the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more systems-analysis-based future. PMID:21948792

  19. BμG@Sbase--a microbial gene expression and comparative genomic database.

    PubMed

    Witney, Adam A; Waldron, Denise E; Brooks, Lucy A; Tyler, Richard H; Withers, Michael; Stoker, Neil G; Wren, Brendan W; Butcher, Philip D; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and, while serving as a hub of a global network of microbial research groups, has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface, integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore, the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more systems-analysis-based future.

  20. Can different primary care databases produce comparable estimates of burden of disease: results of a study exploring venous leg ulceration.

    PubMed

    Petherick, Emily S; Pickett, Kate E; Cullum, Nicky A

    2015-08-01

    Primary care databases from the UK have been widely used to produce evidence on the epidemiology and health service usage of a wide range of conditions. To date there have been few evaluations of the comparability of estimates between different sources of these data. To estimate the comparability of two widely used primary care databases, the Health Improvement Network Database (THIN) and the General Practice Research Database (GPRD), using venous leg ulceration as an exemplar condition. Cross prospective cohort comparison. GPRD and the THIN databases using data from 1998 to 2006. A data set was extracted from both databases containing all cases of persons aged 20 years or greater with a database diagnosis of venous leg ulceration recorded in the databases for the period 1998-2006. Annual rates of incidence and prevalence of venous leg ulceration were calculated within each database, standardized to the European standard population, and compared using standardized rate ratios. Comparable estimates of venous leg ulcer incidence from the GPRD and THIN databases could be obtained using data from 2000 to 2006, and of prevalence using data from 2001 to 2006. Recent data collected by these two databases are more likely to produce comparable estimates of the burden of venous leg ulceration. These results require confirmation in other disease areas to enable researchers to have confidence in the comparability of findings from these two widely used primary care research resources. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  1. Assessment of Confounders in Comparative Effectiveness Studies From Secondary Databases.

    PubMed

    Franklin, Jessica M; Schneeweiss, Sebastian; Solomon, Daniel H

    2017-03-15

    Secondary clinical databases are an important and growing source of data for comparative effectiveness research (CER) studies. However, measurement of confounders, such as biomarker values or patient-reported health status, in secondary clinical databases may not align with the initiation of a new treatment. In many published CER analyses of registry data, investigators assessed confounders based on the first questionnaire in which the new exposure was recorded. However, it is known that adjustment for confounders measured after the start of exposure can lead to biased treatment effect estimates. In the present study, we conducted simulations to compare assessment strategies for a dynamic clinical confounder in a registry-based comparative effectiveness study of 2 therapies. As expected, we found that adjustment for the confounder value at the time of the first questionnaire after the start of exposure creates a biased estimate of the total effect of exposure choice on outcome when the confounder mediates part of the effect. However, adjustment for the prior value can also be badly biased when measured long before exposure initiation. Thus, investigators should carefully consider the timing of confounder measurements relative to exposure initiation and the rate of change in the confounder in order to choose the most relevant measure for each patient. © The Author 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
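
    The bias the authors describe can be demonstrated in a few lines of simulation. In the sketch below (numpy plus statsmodels, with arbitrary coefficients chosen for illustration), adjusting for a post-exposure confounder measurement that partly mediates the treatment effect pulls the estimate away from the true value of 1.

    ```python
    # Simulate a confounder measured before (c0) vs. after (c1) exposure start.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 50_000

    c0 = rng.normal(size=n)                  # confounder at exposure initiation
    x = (c0 + rng.normal(size=n) > 0) * 1.0  # treatment choice driven by c0
    c1 = 0.7 * c0 + 0.5 * x + rng.normal(scale=0.5, size=n)  # post-exposure C
    y = 1.0 * x + 1.0 * c0 + rng.normal(size=n)              # true effect = 1

    for label, covar in [("adjust c0 (correct)", c0), ("adjust c1 (biased)", c1)]:
        X = sm.add_constant(np.column_stack([x, covar]))
        est = sm.OLS(y, X).fit().params[1]
        print(f"{label}: effect estimate = {est:.2f}")
    ```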

  2. A Comparative Analysis Among the SRS M&M, NIS, and KID Databases for the Adolescent Idiopathic Scoliosis.

    PubMed

    Lee, Nathan J; Guzman, Javier Z; Kim, Jun; Skovrlj, Branko; Martin, Christopher T; Pugely, Andrew J; Gao, Yubo; Caridi, John M; Mendoza-Lattes, Sergio; Cho, Samuel K

    2016-11-01

    Retrospective cohort analysis. A growing number of publications have utilized the Scoliosis Research Society (SRS) Morbidity and Mortality (M&M) database, but none have compared it to other large databases. The objective of this study was to compare SRS complications with those in administrative databases. The Nationwide Inpatient Sample (NIS) and Kid's Inpatient Database (KID) captured a greater number of overall complications while the SRS M&M data provided a greater incidence of spine-related complications following adolescent idiopathic scoliosis (AIS) surgery. Chi-square was used to obtain statistical significance, with p < .05 considered significant. The SRS 2004-2007 (9,904 patients), NIS 2004-2007 (20,441 patients) and KID 2003-2006 (10,184 patients) databases were analyzed for AIS patients who underwent fusion. Comparable variables were queried in all three databases, including patient demographics, surgical variables, and complications. Patients undergoing AIS in the SRS database were slightly older (SRS 14.4 years vs. NIS 13.8 years, p < .0001; KID 13.9 years, p < .0001) and less likely to be male (SRS 18.5% vs. NIS 26.3%, p < .0001; KID 24.8%, p < .0001). Revision surgery (SRS 3.3% vs. NIS 2.4%, p < .0001; KID 0.9%, p < .0001) and osteotomy (SRS 8% vs. NIS 2.3%, p < .0001; KID 2.4%, p < .0001) were more commonly reported in the SRS database. The SRS database reported fewer overall complications (SRS 3.9% vs. NIS 7.3%, p < .0001; KID 6.6%, p < .0001). However, when respiratory complications (SRS 0.5% vs. NIS 3.7%, p < .0001; KID 4.4%, p < .0001) were excluded, medical complication rates were similar across databases. In contrast, SRS reported higher spine-specific complication rates. Mortality rates were similar between SRS versus NIS (p = .280) and SRS versus KID (p = .08) databases. There are similarities and differences between the three databases. These discrepancies are likely due to the varying data-gathering methods each organization uses to

  3. An approach for integrating toxicogenomic data in risk assessment: The dibutyl phthalate case study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Euling, Susan Y., E-mail: euling.susan@epa.gov; Thompson, Chad M.; Chiu, Weihsueh A.

    An approach for evaluating and integrating genomic data in chemical risk assessment was developed based on the lessons learned from performing a case study for the chemical dibutyl phthalate. A case study prototype approach was first developed in accordance with EPA guidance and recommendations of the scientific community. Dibutyl phthalate (DBP) was selected for the case study exercise. The scoping phase of the dibutyl phthalate case study was conducted by considering the available DBP genomic data, taken together with the entire data set, for whether they could inform various risk assessment aspects, such as toxicodynamics, toxicokinetics, and dose–response. A description of weighing the available dibutyl phthalate data set for utility in risk assessment provides an example for considering genomic data for future chemical assessments. As a result of conducting the scoping process, two questions—Do the DBP toxicogenomic data inform 1) the mechanisms or modes of action? and 2) the interspecies differences in toxicodynamics?—were selected to focus the case study exercise. Principles of the general approach include considering the genomics data in conjunction with all other data to determine their ability to inform the various qualitative and/or quantitative aspects of risk assessment, and evaluating the relationship between the available genomic and toxicity outcome data with respect to study comparability and phenotypic anchoring. Based on experience from the DBP case study, recommendations and a general approach for integrating genomic data in chemical assessment were developed to advance the broader effort to utilize 21st century data in risk assessment. - Highlights: • Performed DBP case study for integrating genomic data in risk assessment • Present approach for considering genomic data in chemical risk assessment • Present recommendations for use of genomic data in chemical risk assessment

  4. The ToxCast Pathway Database for Identifying Toxicity Signatures and Potential Modes of Action from Chemical Screening Data

    EPA Science Inventory

    The U.S. Environmental Protection Agency (EPA), through its ToxCast program, is developing predictive toxicity approaches that will use in vitro high-throughput screening (HTS), high-content screening (HCS) and toxicogenomic data to predict in vivo toxicity phenotypes. There are ...

  5. Response of human renal tubular cells to cyclosporine and sirolimus: A toxicogenomic study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pallet, Nicolas; Rabant, Marion; Xu-Dubois, Yi-Chun

    The molecular mechanisms involved in the potentially nephrotoxic response of tubular cells to immunosuppressive drugs remain poorly understood. Transcriptional profiles of human proximal tubular cells exposed to cyclosporine A (CsA), sirolimus (SRL) or their combination, were established using oligonucleotide microarrays. Hierarchical clustering of genes implicated in fibrotic processes showed a clear distinction between expression profiles with CsA and CsA + SRL treatments on the one hand and SRL treatment on the other. Functional analysis found that CsA and CsA + SRL treatments preferentially alter biological processes located at the cell membrane, such as ion transport or signal transduction, whereas SRL modifies biological processes within the nucleus and related to transcriptional activity. Genome-wide expression analysis suggested that CsA may induce an endoplasmic reticulum (ER) stress in tubular cells in vitro. Moreover we found that CsA exposure in vivo is associated with the upregulation of the ER stress marker BIP in kidney transplant biopsies. In conclusion, this toxicogenomic study highlights the molecular interaction networks that may contribute to the tubular response to CsA and SRL. These results may also offer a new working hypothesis for future research in the field of CsA nephrotoxicity. Further studies are needed to evaluate if ER stress detection in tubular cells in human biopsies can predict CsA nephrotoxicity.

  6. Information Retrieval in Telemedicine: a Comparative Study on Bibliographic Databases

    PubMed Central

    Ahmadi, Maryam; Sarabi, Roghayeh Ershad; Orak, Roohangiz Jamshidi; Bahaadinbeigy, Kambiz

    2015-01-01

    Background and Aims: The first step in each systematic review is selection of the most valid database that can provide the highest number of relevant references. This study was carried out to determine the most suitable database for information retrieval in the telemedicine field. Methods: The CINAHL, PubMed, Web of Science and Scopus databases were searched for telemedicine paired with education, cost-benefit and patient satisfaction. After analysis of the obtained results, the accuracy coefficient, sensitivity, uniqueness and overlap of the databases were calculated. Results: The studied databases differed in the number of retrieved articles. PubMed was identified as the most suitable database for retrieving information on the selected topics, with accuracy and sensitivity ratios of 50.7% and 61.4%, respectively. The uniqueness percentage of retrieved articles ranged from 38% for PubMed to 3.0% for CINAHL. The highest overlap rate (18.6%) was found between PubMed and Web of Science. Less than 1% of articles have been indexed in all searched databases. Conclusion: PubMed is suggested as the most suitable database for starting a search in telemedicine, and after PubMed, Scopus and Web of Science can retrieve about 90% of the relevant articles. PMID:26236086

  7. Information Retrieval in Telemedicine: a Comparative Study on Bibliographic Databases.

    PubMed

    Ahmadi, Maryam; Sarabi, Roghayeh Ershad; Orak, Roohangiz Jamshidi; Bahaadinbeigy, Kambiz

    2015-06-01

    The first step in each systematic review is selection of the most valid database that can provide the highest number of relevant references. This study was carried out to determine the most suitable database for information retrieval in the telemedicine field. The CINAHL, PubMed, Web of Science and Scopus databases were searched for telemedicine paired with education, cost-benefit and patient satisfaction. After analysis of the obtained results, the accuracy coefficient, sensitivity, uniqueness and overlap of the databases were calculated. The studied databases differed in the number of retrieved articles. PubMed was identified as the most suitable database for retrieving information on the selected topics, with accuracy and sensitivity ratios of 50.7% and 61.4%, respectively. The uniqueness percentage of retrieved articles ranged from 38% for PubMed to 3.0% for CINAHL. The highest overlap rate (18.6%) was found between PubMed and Web of Science. Less than 1% of articles have been indexed in all searched databases. PubMed is suggested as the most suitable database for starting a search in telemedicine, and after PubMed, Scopus and Web of Science can retrieve about 90% of the relevant articles.

  8. Incidence of Appendicitis over Time: A Comparative Analysis of an Administrative Healthcare Database and a Pathology-Proven Appendicitis Registry

    PubMed Central

    Clement, Fiona; Zimmer, Scott; Dixon, Elijah; Ball, Chad G.; Heitman, Steven J.; Swain, Mark; Ghosh, Subrata

    2016-01-01

    Importance: At the turn of the 21st century, studies evaluating the change in incidence of appendicitis over time have reported inconsistent findings. Objectives: We compared the differences in the incidence of appendicitis derived from a pathology registry versus an administrative database in order to validate coding in administrative databases and establish temporal trends in the incidence of appendicitis. Design: We conducted a population-based comparative cohort study to identify all individuals with appendicitis from 2000 to 2008. Setting & Participants: Two population-based data sources were used to identify cases of appendicitis: 1) a pathology registry (n = 8,822); and 2) a hospital discharge abstract database (n = 10,453). Intervention & Main Outcome: The administrative database was compared to the pathology registry for the following a priori analyses: 1) to calculate the positive predictive value (PPV) of administrative codes; 2) to compare the annual incidence of appendicitis; and 3) to assess differences in temporal trends. Temporal trends were assessed using a generalized linear model that assumed a Poisson distribution and reported as an annual percent change (APC) with 95% confidence intervals (CI). Analyses were stratified by perforated and non-perforated appendicitis. Results: The administrative database (PPV = 83.0%) overestimated the incidence of appendicitis (100.3 per 100,000) when compared to the pathology registry (84.2 per 100,000). Codes for perforated appendicitis were not reliable (PPV = 52.4%), leading to overestimation in the incidence of perforated appendicitis in the administrative database (34.8 per 100,000) as compared to the pathology registry (19.4 per 100,000). The incidence of appendicitis significantly increased over time in both the administrative database (APC = 2.1%; 95% CI: 1.3, 2.8) and pathology registry (APC = 4.1; 95% CI: 3.1, 5.0). Conclusion & Relevance: The administrative database overestimated the incidence of appendicitis.
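
    The trend analysis described above (a Poisson GLM reported as annual percent change) is simple to reproduce. The sketch below uses statsmodels on synthetic yearly counts with a log-population offset; the APC is the exponentiated year coefficient minus one. The counts and population are invented for illustration.

    ```python
    # Annual percent change (APC) from a Poisson GLM of yearly case counts.
    import numpy as np
    import statsmodels.api as sm

    years = np.arange(2000, 2009)
    cases = np.array([980, 1000, 1030, 1075, 1100, 1150, 1190, 1240, 1290])
    pop = np.full(years.size, 1_150_000)  # person-years, assumed constant

    X = sm.add_constant(years - years[0])
    fit = sm.GLM(cases, X, family=sm.families.Poisson(),
                 offset=np.log(pop)).fit()
    apc = 100 * (np.exp(fit.params[1]) - 1)
    print(f"APC = {apc:.1f}% per year")
    ```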

  9. Coverage and overlaps in bibliographic databases relevant to forensic medicine: a comparative analysis of MEDLINE.

    PubMed Central

    Yonker, V A; Young, K P; Beecham, S K; Horwitz, S; Cousin, K

    1990-01-01

    This study was designed to make a comparative evaluation of the performance of MEDLINE in covering serial literature. Forensic medicine was chosen because it is an interdisciplinary subject area that would test MEDLARS at the periphery of the system. The evaluation of database coverage was based upon articles included in the bibliographies of scholars in the field of forensic medicine. This method was considered appropriate for characterizing work used by researchers in this field. The results of comparing MEDLINE to other databases evoked some concerns about the selective indexing policy of MEDLINE in serving the interests of those working in forensic medicine. PMID:2403829

  10. Validity of cancer diagnosis in the National Health Insurance database compared with the linked National Cancer Registry in Taiwan.

    PubMed

    Kao, Wei-Heng; Hong, Ji-Hong; See, Lai-Chu; Yu, Huang-Ping; Hsu, Jun-Te; Chou, I-Jun; Chou, Wen-Chi; Chiou, Meng-Jiun; Wang, Chun-Chieh; Kuo, Chang-Fu

    2017-08-16

    We aimed to evaluate the validity of cancer diagnosis in the National Health Insurance (NHI) database, which has routinely collected the health information of almost the entire Taiwanese population since 1995, compared with the Taiwan National Cancer Registry (NCR). There were 26,542,445 active participants registered in the NHI database between 2001 and 2012. National Cancer Registry and NHI database records were compared for cancer diagnosis; date of cancer diagnosis; and 1, 2, and 5 year survival. In addition, the 10 leading causes of cancer deaths in Taiwan were analyzed. There were 908,986 cancer diagnoses across the NCR and the NHI database, with 782,775 (86.1%) recorded in both, 53,192 (5.9%) in the NHI database only, and 73,019 (8.0%) in the NCR only. The positive predictive value of the NHI database cancer diagnoses was 94% for all cancers; the positive predictive values for the 10 specific cancers ranged from 95% (lung cancer) to 82% (cervical cancer). The date of diagnosis in the NHI database was generally delayed by a median of 15 days (interquartile range 8-18) compared with the NCR. The 1, 2, and 5 year survival rates were 71.21%, 60.85%, and 47.44% using the NHI database and 71.18%, 60.17%, and 46.09% using NCR data. Recording of cancer diagnoses and survival estimates based on these diagnosis codes in the NHI database are generally consistent with the NCR. Studies using NHI database data must pay careful attention to eligibility and record linkage; use of both sources is recommended. Copyright © 2017 John Wiley & Sons, Ltd.

  11. Toxicogenomic analysis identifies the apoptotic pathway as the main cause of hepatotoxicity induced by tributyltin.

    PubMed

    Zhou, Mi; Feng, Mei; Fu, Ling-Ling; Ji, Lin-Dan; Zhao, Jin-Shun; Xu, Jin

    2016-11-01

    Tributyltin (TBT) is one of the most widely used organotin biocides, which has severe endocrine-disrupting effects on marine species and mammals. Given that TBT accumulates at higher levels in the liver than in any other organ, and it acts mainly as a hepatotoxic agent, it is important to clearly delineate the hepatotoxicity of TBT. However, most of the available studies on TBT have focused on observations at the cellular level, while studies at the level of genes and proteins are limited; therefore, the molecular mechanisms of TBT-induced hepatotoxicity remains largely unclear. In the present study, we applied a toxicogenomic approach to investigate the effects of TBT on gene expression in the human normal liver cell line HL7702. Gene expression profiling identified the apoptotic pathway as the major cause of hepatotoxicity induced by TBT. Flow cytometry assays confirmed that medium- and high-dose TBT treatments significantly increased the number of apoptotic cells, and more cells underwent late apoptosis in the high-dose TBT group. The genes encoding heat shock proteins (HSPs), kinases and tumor necrosis factor receptors mediated TBT-induced apoptosis. These findings revealed novel molecular mechanisms of TBT-induced hepatotoxicity, and the current microarray data may also provide clues for future studies. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Toxicogenomic analysis of N-nitrosomorpholine induced changes in rat liver: Comparison of genomic and proteomic responses and anchoring to histopathological parameters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oberemm, A., E-mail: axel.oberemm@bfr.bund.d; Ahr, H.-J.; Bannasch, P.

    2009-12-01

    A common animal model of chemical hepatocarcinogenesis was used to examine the utility of transcriptomic and proteomic data to identify early biomarkers related to chemically induced carcinogenesis. N-nitrosomorpholine, a frequently used genotoxic model carcinogen, was applied via drinking water at 120 mg/L to male Wistar rats for 7 weeks followed by an exposure-free period of 43 weeks. Seven specimens of each treatment group (untreated control and 120 mg/L N-nitrosomorpholine in drinking water) were sacrificed at nine time points during and after N-nitrosomorpholine treatment. Individual samples from the liver were prepared for histological and toxicogenomic analyses. For histological detection of preneoplastic and neoplastic tissue areas, sections were stained using antibodies against the placental form of glutathione-S-transferase (GST-P). Gene and protein expression profiles of liver tissue homogenates were analyzed using RG-U34A Affymetrix rat gene chips and two-dimensional gel electrophoresis-based proteomics, respectively. In order to compare results obtained by histopathology, transcriptomics and proteomics, GST-P-stained liver sections were evaluated morphometrically, which revealed a parallel time course of the area fraction of preneoplastic lesions and gene plus protein expression patterns. On the transcriptional level, an increase of hepatic GST-P expression was detectable as early as 3 weeks after study onset. Comparing deregulated genes and proteins, eight species were identified which showed a corresponding expression profile on both expression levels. Functional analysis suggests that these genes and corresponding proteins may be useful as biomarkers of early hepatocarcinogenesis.

  13. Discovering functional modules by topic modeling RNA-Seq based toxicogenomic data.

    PubMed

    Yu, Ke; Gong, Binsheng; Lee, Mikyung; Liu, Zhichao; Xu, Joshua; Perkins, Roger; Tong, Weida

    2014-09-15

    Toxicogenomics (TGx) endeavors to elucidate underlying molecular mechanisms by exploring gene expression profiles in response to toxic substances. Recently, RNA-Seq has increasingly been regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from the complex TGx data. Considering read counts as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to the word-by-document matrix used in text mining. Topic modeling, which aims to discover the latent structures in text corpora, is therefore well suited to exploring RNA-Seq based TGx data. In this study, topic modeling was applied to a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found that samples treated with compounds sharing the same modes of action (MoAs) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted by gene enrichment analysis with MoAs and then confirmed by compound and pathway associations mined from the literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq based gene expression profile of a topic specifically associated with the peroxisome proliferator-activated receptor (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding an accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discovering functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.
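
    As a rough illustration of the analogy described above (read counts as word counts, samples as documents), the sketch below fits an LDA topic model to a toy count matrix with scikit-learn; the matrix dimensions and topic number are hypothetical, not those of the study:

        import numpy as np
        from sklearn.decomposition import LatentDirichletAllocation

        rng = np.random.default_rng(0)
        # Toy "samples x genes" count matrix standing in for RNA-Seq read counts.
        counts = rng.poisson(lam=5.0, size=(60, 500))

        lda = LatentDirichletAllocation(n_components=10, random_state=0)
        doc_topics = lda.fit_transform(counts)   # per-sample topic proportions

        # Samples can then be clustered by topic similarity; the genes weighted
        # most heavily in a "marker" topic would go to enrichment analysis.
        top_genes = lda.components_[0].argsort()[::-1][:20]
        print(doc_topics.shape, top_genes[:5])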

  14. A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database.

    PubMed

    Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin

    2015-12-01

    Face recognition with still face images has been widely studied, while research on video-based face recognition remains relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively taking video or still images as query or target. To the best of our knowledge, few datasets and evaluation protocols have been benchmarked for all three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, for benchmarking the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method, and experimentally show that it can be used as a promising baseline method for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and that our COX Face DB is a good benchmark database for evaluation.

  15. MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.

    PubMed

    Chiapello, Hélène; Gendrault, Annie; Caron, Christophe; Blum, Jérome; Petit, Marie-Agnès; El Karoui, Meriem

    2008-11-27

    The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.

  16. Mechanism-based risk assessment strategy for drug-induced cholestasis using the transcriptional benchmark dose derived by toxicogenomics.

    PubMed

    Kawamoto, Taisuke; Ito, Yuichi; Morita, Osamu; Honda, Hiroshi

    2017-01-01

    Cholestasis is one of the major causes of drug-induced liver injury (DILI), which can result in withdrawal of approved drugs from the market. Early identification of cholestatic drugs is difficult due to the complex mechanisms involved. In order to develop a strategy for mechanism-based risk assessment of cholestatic drugs, we analyzed gene expression data obtained from the livers of rats that had been orally administered 12 known cholestatic compounds repeatedly for 28 days at three dose levels. Qualitative analyses were performed using two statistical approaches (hierarchical clustering and principal component analysis), in addition to pathway analysis. The transcriptional benchmark dose (tBMD) and the tBMD 95% lower limit (tBMDL) were used for quantitative analyses, which revealed three compound sub-groups that produced different types of differential gene expression; these groups of genes were mainly involved in inflammation, cholesterol biosynthesis, and oxidative stress. Furthermore, the tBMDL values for each test compound were in good agreement with the relevant no-observed-adverse-effect level. These results indicate that our novel strategy for drug safety evaluation using mechanism-based classification and tBMDL would facilitate the application of toxicogenomics for risk assessment of cholestatic DILI.
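
    The abstract does not specify how the tBMD values were derived; the following is only a generic sketch of the benchmark-dose idea (fit a dose-response model to a gene's expression change, then invert it at a benchmark response), with hypothetical doses and fold changes:

        import numpy as np
        from scipy.optimize import curve_fit

        def hill(dose, base, emax, ed50):
            """Simple saturating (Hill-type, n=1) dose-response model."""
            return base + emax * dose / (ed50 + dose)

        doses = np.array([0.0, 10.0, 30.0, 100.0])   # hypothetical dose levels
        expr = np.array([1.00, 1.08, 1.35, 1.80])    # hypothetical expression fold changes

        (base, emax, ed50), _ = curve_fit(hill, doses, expr, p0=[1.0, 1.0, 30.0], maxfev=10000)

        bmr = 0.10 * base                  # benchmark response: a 10% change over baseline
        bmd = bmr * ed50 / (emax - bmr)    # invert the model so that hill(bmd) == base + bmr
        print(f"tBMD ~ {bmd:.1f}")
        # A tBMDL (95% lower limit) would additionally require confidence limits,
        # e.g., from bootstrap resampling or profile likelihood.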

  17. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Comparative effectiveness analysis of anticoagulant strategies in a large observational database of percutaneous coronary interventions.

    PubMed

    Wise, Gregory R; Schwartz, Brian P; Dittoe, Nathaniel; Safar, Ammar; Sherman, Steven; Bowdy, Bruce; Hahn, Harvey S

    2012-06-01

    Percutaneous coronary intervention (PCI) is the most commonly used procedure for coronary revascularization. There are multiple adjuvant anticoagulation strategies available. In this era of cost containment, we performed a comparative effectiveness analysis of clinical outcomes and cost of the major anticoagulant strategies across all types of PCI procedures in a large observational database. A retrospective, comparative effectiveness analysis of the Premier observational database was conducted to determine the impact of anticoagulant treatment on outcomes. Multiple linear regression and logistic regression models were used to assess the association of initial antithrombotic treatment with outcomes while controlling for other factors. A total of 458,448 inpatient PCI procedures with known antithrombotic regimen from 299 hospitals between January 1, 2004 and March 31, 2008 were identified. Compared to patients treated with heparin plus glycoprotein IIb/IIIa inhibitor (GPI), bivalirudin was associated with a 41% relative risk reduction (RRR) for inpatient mortality, a 44% RRR for clinically apparent bleeding, and a 37% RRR for any transfusion. Furthermore, treatment with bivalirudin alone resulted in a cost savings of $976 per case. Similar results were seen between bivalirudin and heparin in all end-points. Combined use of both bivalirudin and GPI substantially attenuated the cost benefits demonstrated with bivalirudin alone. Bivalirudin use was associated with both improved clinical outcomes and decreased hospital costs in this large "real-world" database. To our knowledge, this study is the first to demonstrate the ideal comparative effectiveness end-point of both improved clinical outcomes with decreased costs in PCI. ©2012, Wiley Periodicals, Inc.
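
    The relative risk reductions quoted above follow the standard definition RRR = 1 − (risk in treated / risk in comparator). A minimal sketch with purely hypothetical event rates, chosen only so the arithmetic lands on the reported 41%:

        # Hypothetical inpatient mortality rates for illustration; the study
        # itself reports adjusted estimates from regression models.
        risk_bivalirudin = 0.013
        risk_heparin_gpi = 0.022

        rrr = 1 - risk_bivalirudin / risk_heparin_gpi
        print(f"RRR = {rrr:.0%}")  # ~41%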

  19. A Drosophila model for toxicogenomics: Genetic variation in susceptibility to heavy metal exposure

    PubMed Central

    Luoma, Sarah E.; St. Armour, Genevieve E.; Thakkar, Esha

    2017-01-01

    The genetic factors that give rise to variation in susceptibility to environmental toxins remain largely unexplored. Studies on genetic variation in susceptibility to environmental toxins are challenging in human populations, due to the variety of clinical symptoms and difficulty in determining which symptoms causally result from toxic exposure; uncontrolled environments, often with exposure to multiple toxicants; and difficulty in relating phenotypic effect size to toxic dose, especially when symptoms become manifest with a substantial time lag. Drosophila melanogaster is a powerful model that enables genome-wide studies for the identification of allelic variants that contribute to variation in susceptibility to environmental toxins, since the genetic background, environmental rearing conditions and toxic exposure can be precisely controlled. Here, we used extreme QTL mapping in an outbred population derived from the D. melanogaster Genetic Reference Panel to identify alleles associated with resistance to lead and/or cadmium, two ubiquitous environmental toxins that present serious health risks. We identified single nucleotide polymorphisms (SNPs) associated with variation in resistance to both heavy metals as well as SNPs associated with resistance specific to each of them. The effects of these SNPs were largely sex-specific. We applied mutational and RNAi analyses to 33 candidate genes and functionally validated 28 of them. We constructed networks of candidate genes as blueprints for orthologous networks of human genes. The latter not only provided functional contexts for known human targets of heavy metal toxicity, but also implicated novel candidate susceptibility genes. These studies validate Drosophila as a translational toxicogenomics gene discovery system. PMID:28732062
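
    The abstract does not state the test used in the extreme QTL mapping, but such designs commonly contrast allele counts between phenotypically extreme pools SNP by SNP; a generic sketch with hypothetical counts:

        from scipy.stats import chi2_contingency

        # Hypothetical allele counts at one SNP: [reference, alternate] per pool.
        resistant_pool = [180, 120]
        control_pool = [140, 160]

        chi2, p, dof, _ = chi2_contingency([resistant_pool, control_pool])
        print(f"chi2 = {chi2:.2f}, p = {p:.4f}")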

  20. Inter-Annual Variability of the Acoustic Propagation in the Mediterranean Sea Identified from a Synoptic Monthly Gridded Database as Compared with GDEM

    DTIC Science & Technology

    2016-12-01

    [Report front-matter residue; only a fragment of the abstract is recoverable: the study compares acoustic propagation computed from profiles obtained from the synoptic monthly gridded World Ocean Database (SMD-WOD) and Generalized Digital Environmental Model (GDEM) temperature data.]

  1. Quality standards for real-world research. Focus on observational database studies of comparative effectiveness.

    PubMed

    Roche, Nicolas; Reddel, Helen; Martin, Richard; Brusselle, Guy; Papi, Alberto; Thomas, Mike; Postma, Dirjke; Thomas, Vicky; Rand, Cynthia; Chisholm, Alison; Price, David

    2014-02-01

    Real-world research can use observational or clinical trial designs, in both cases putting emphasis on high external validity, to complement the classical efficacy randomized controlled trials (RCTs) with high internal validity. Real-world research is made necessary by the variety of factors that can play an important role in modulating effectiveness in real life but are often tightly controlled in RCTs, such as comorbidities and concomitant treatments, adherence, inhalation technique, access to care, strength of doctor-caregiver communication, and socio-economic and other organizational factors. Real-world studies belong to two main categories: pragmatic trials and observational studies, which can be prospective or retrospective. Focusing on comparative database observational studies, the process aimed at ensuring high-quality research can be divided into three parts: preparation of research, analyses and reporting, and discussion of results. Key points include a priori planning of data collection and analyses, identification of appropriate database(s), proper outcome definitions, study registration with commitment to publish, bias minimization through matching and adjustment processes accounting for potential confounders, and sensitivity analyses testing the robustness of results. When these conditions are met, observational database studies can reach a level of evidence sufficient to inform guidelines and clinical and regulatory decision-making.

  2. GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

    PubMed

    Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

    2006-03-31

    Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
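
    As an illustration of the kind of user-defined homolog query the abstract describes, here is a toy relational sketch; the table and column names are invented for illustration, not GenoMycDB's actual schema:

        import sqlite3

        # Hypothetical miniature of a GenoMycDB-style pairwise-alignment table.
        con = sqlite3.connect(":memory:")
        con.execute("""CREATE TABLE alignment (
            protein_a TEXT, protein_b TEXT,
            identity REAL, coverage REAL, localization_a TEXT)""")
        con.execute("INSERT INTO alignment VALUES "
                    "('Rv0001', 'MSMEG_0001', 87.5, 0.98, 'cytoplasm')")

        # A user-defined criterion: potential homologs above similarity thresholds,
        # restricted by predicted subcellular localization.
        rows = con.execute("""SELECT protein_a, protein_b FROM alignment
                              WHERE identity >= 80 AND coverage >= 0.9
                                AND localization_a = 'cytoplasm'""").fetchall()
        print(rows)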

  3. microPIR2: a comprehensive database for human–mouse comparative study of microRNA–promoter interactions

    PubMed Central

    Piriyapongsa, Jittima; Bootchai, Chaiwat; Ngamphiw, Chumpol; Tongsima, Sissades

    2014-01-01

    microRNA (miRNA)–promoter interaction resource (microPIR) is a public database containing over 15 million predicted miRNA target sites located within human promoter sequences. These predicted targets are presented along with their related genomic and experimental data, making the microPIR database the most comprehensive repository of miRNA promoter target sites. Here, we describe major updates of the microPIR database including new target predictions in the mouse genome and revised human target predictions. The updated database (microPIR2) now provides ∼80 million human and 40 million mouse predicted target sites. In addition to being a reference database, microPIR2 is a tool for comparative analysis of target sites on the promoters of human–mouse orthologous genes. In particular, this new feature was designed to identify potential miRNA–promoter interactions conserved between species that could be stronger candidates for further experimental validation. We also incorporated additional supporting information to microPIR2 such as nuclear and cytoplasmic localization of miRNAs and miRNA–disease association. Extra search features were also implemented to enable various investigations of targets of interest. Database URL: http://www4a.biotec.or.th/micropir2 PMID:25425035

  4. MIPSPlantsDB—plant database resource for integrative and comparative plant genome research

    PubMed Central

    Spannagl, Manuel; Noubibou, Octave; Haase, Dirk; Yang, Li; Gundlach, Heidrun; Hindemitt, Tobias; Klee, Kathrin; Haberer, Georg; Schoof, Heiko; Mayer, Klaus F. X.

    2007-01-01

    Genome-oriented plant research delivers rapidly increasing amounts of plant genome data. Comprehensive and structured information resources are required to structure and communicate genome and associated analytical data for model organisms as well as for crops. The increase in available plant genomic data enables powerful comparative analysis and integrative approaches. PlantsDB aims to provide data and information resources for individual plant species and, in addition, to build a platform for integrative and comparative plant genome research. PlantsDB is constituted from genome databases for Arabidopsis, Medicago, Lotus, rice, maize and tomato. Complementary data resources for cis elements, repetitive elements and extensive cross-species comparisons are implemented. The PlantsDB portal can be reached at . PMID:17202173

  5. EuCliD (European Clinical Database): a database comparing different realities.

    PubMed

    Marcelli, D; Kirchgessner, J; Amato, C; Steil, H; Mitteregger, A; Moscardò, V; Carioni, C; Orlandini, G; Gatti, E

    2001-01-01

    Quality and variability of dialysis practice are gaining more and more importance. Fresenius Medical Care (FMC), as a provider of dialysis, has a duty to continuously monitor and guarantee the quality of care delivered to patients treated in its European dialysis units. Accordingly, a new clinical database called EuCliD has been developed. It is a multilingual and fully codified database, using international standard coding tables as far as possible. EuCliD collects and handles sensitive medical patient data, fully assuring confidentiality. The infrastructure: a Domino server is installed in each country connected to EuCliD. All the centres belonging to a country are connected via modem to the country server. All the Domino servers are connected via wide area network to the headquarters server in Bad Homburg (Germany). Inside each country server, only anonymous data related to that particular country are available. The only place where all the anonymous data are available is the headquarters server. Data collection is strongly supported in each country by "key persons" with solid relationships with their respective national dialysis units. The quality of the data in EuCliD is ensured at different levels. At the end of January 2001, more than 11,000 patients treated in 135 centres located in 7 countries were already included in the system. FMC has put patient care at the centre of its activities for many years and is now able to provide transparency to the community (authorities, nephrologists, patients, ...), thus demonstrating the quality of the service.

  6. Tissue Molecular Anatomy Project (TMAP): an expression database for comparative cancer proteomics.

    PubMed

    Medjahed, Djamel; Luke, Brian T; Tontesh, Tawady S; Smythers, Gary W; Munroe, David J; Lemkin, Peter F

    2003-08-01

    By mining publicly accessible databases, we have developed a collection of tissue-specific predictive protein expression maps as a function of cancer histological state. Data analysis is applied to the differential expression of gene products in pooled libraries from the normal to the altered state(s). We wish to report the initial results of our survey across different tissues and explore the extent to which this comparative approach may help uncover panels of potential biomarkers of tumorigenesis which would warrant further examination in the laboratory.

  7. Natural Variation in Fish Transcriptomes: Comparative Analysis of the Fathead Minnow (Pimephales promelas) and Zebrafish (Danio rerio)

    EPA Science Inventory

    Fathead minnow and zebrafish are among the most intensively studied fish species in environmental toxicogenomics. To aid the assessment and interpretation of subtle transcriptomic effects from treatment conditions of interest, there needs to be a better characterization and unde...

  8. Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases.

    PubMed

    Wendling, T; Jung, K; Callahan, A; Schuler, A; Shah, N H; Gallego, B

    2018-06-03

    There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies. Copyright © 2018 John Wiley & Sons, Ltd.
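
    The estimand described above, the conditional risk difference for a binary treatment and binary outcome, can be approximated by fitting separate outcome models per treatment arm (a "T-learner"); this generic sketch on synthetic data is not the paper's exact pipeline:

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(2000, 10))       # covariates
        t = rng.integers(0, 2, size=2000)     # binary treatment assignment
        # Synthetic binary outcome whose treatment effect varies with X[:, 0].
        p = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * t * (X[:, 0] > 0))))
        y = rng.binomial(1, p)

        m1 = GradientBoostingClassifier().fit(X[t == 1], y[t == 1])
        m0 = GradientBoostingClassifier().fit(X[t == 0], y[t == 0])

        # Conditional risk difference: P(y=1 | X, t=1) - P(y=1 | X, t=0).
        crd = m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1]
        print(crd[:5])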

  9. Four Current Awareness Databases: Coverage and Currency Compared.

    ERIC Educational Resources Information Center

    Jaguszewski, Janice M.; Kempf, Jody L.

    1995-01-01

    Discusses the usability and content of the following table of contents (TOC) databases selected by science and engineering librarians at the University of Minnesota Twin Cities: Current Contents on Diskette (CCoD), CARL Uncover2, Inside Information, and Contents1st. (AEF)

  10. Comparing data mining methods on the VAERS database.

    PubMed

    Banks, David; Woo, Emily Jane; Burwen, Dale R; Perucci, Phil; Braun, M Miles; Ball, Robert

    2005-09-01

    Data mining may enhance traditional surveillance of vaccine adverse events by identifying events that are reported more commonly after administering one vaccine than other vaccines. Data mining methods find signals as the proportion of times a condition or group of conditions is reported soon after the administration of a vaccine; thus it is a relative proportion compared across vaccines, and not an absolute rate for the condition. The Vaccine Adverse Event Reporting System (VAERS) contains approximately 150,000 reports of adverse events that are possibly associated with vaccine administration. We studied four data mining techniques: empirical Bayes geometric mean (EBGM), lower-bound of the EBGM's 90% confidence interval (EB05), proportional reporting ratio (PRR), and screened PRR (SPRR). We applied these to the VAERS database and compared the agreement among methods and other performance properties, particularly focusing on the vaccine-event combinations with the highest numerical scores in the various methods. The vaccine-event combinations with the highest numerical scores varied substantially among the methods. Not all combinations representing known associations appeared in the top 100 vaccine-event pairs for all methods. The four methods differ in their ranking of vaccine-COSTART pairs. A given method may be superior in certain situations but inferior in others. This paper examines the statistical relationships among the four estimators. Determining which method is best for public health will require additional analysis that focuses on the true alarm and false alarm rates using known vaccine-event associations. Evaluating the properties of these data mining methods will help determine the value of such methods in vaccine safety surveillance. © 2005 John Wiley & Sons, Ltd.
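
    Of the four methods, the proportional reporting ratio is the simplest to state: PRR = [a/(a+b)] / [c/(c+d)], where a and b count reports of the target event and of all other events for the vaccine of interest, and c and d are the corresponding counts for all other vaccines. A minimal sketch with hypothetical counts:

        # Hypothetical 2x2 report counts for one vaccine-event pair.
        a = 25     # target event, target vaccine
        b = 975    # all other events, target vaccine
        c = 60     # target event, all other vaccines
        d = 8940   # all other events, all other vaccines

        prr = (a / (a + b)) / (c / (c + d))
        print(f"PRR = {prr:.2f}")  # 3.75; PRR > 2 is a commonly cited screening threshold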

  11. A Comparative Study of Frequent and Maximal Periodic Pattern Mining Algorithms in Spatiotemporal Databases

    NASA Astrophysics Data System (ADS)

    Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.

    2017-08-01

    Detecting regular cyclic patterns efficiently is a demanding activity for data analysts, owing to the unstructured, dynamic and enormous raw information produced from the web. Many existing approaches generate large numbers of candidate patterns when applied to huge and complex databases. In this work, two novel algorithms are proposed and a comparative examination is performed with respect to scalability and performance parameters. The first algorithm, EFPMA (Extended Regular Model Detection Algorithm), is used to find frequent sequential patterns in spatiotemporal datasets; the second, ETMA (Enhanced Tree-based Mining Algorithm), detects effective cyclic patterns using a symbolic database representation. EFPMA grows patterns from both ends (prefixes and suffixes) of detected patterns, which results in faster pattern growth because fewer levels of database projection are needed than in existing approaches such as PrefixSpan and SPADE. ETMA uses distinct notions to store and manage transaction data horizontally, namely segments, sequences and individual symbols, and exploits a partition-and-conquer method to find maximal patterns using symbolic notations. With this algorithm, cyclic patterns can be mined in full-series sequential patterns, including subsection series. ETMA reduces memory consumption and makes use of efficient symbolic operations; furthermore, it records time-series instances dynamically, in terms of character, series and section approaches, respectively. Assessing pattern extent and establishing the efficiency of the pruning and retrieval techniques on synthetic and real datasets remains an open and challenging mining problem. These techniques are useful in applications such as data streams, traffic risk analysis, medical diagnosis, DNA sequence mining and earthquake prediction. Extensive experimental results illustrate that the proposed algorithms outperform the ECLAT, STNR and MAFIA approaches in both efficiency and scalability.

  12. A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions.

    PubMed

    Davis, Allan Peter; Wiegers, Thomas C; Roberts, Phoebe M; King, Benjamin L; Lay, Jean M; Lennon-Hopkins, Kelley; Sciaky, Daniela; Johnson, Robin; Keating, Heather; Greene, Nigel; Hernandez, Robert; McConnell, Kevin J; Enayetallah, Ahmed E; Mattingly, Carolyn J

    2013-01-01

    Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88,629 articles relating over 1,200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254,173 toxicogenomic interactions (152,173 chemical-disease, 58,572 chemical-gene, 5,345 gene-disease and 38,083 phenotype interactions). All chemical-gene-disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer's text-mining process to collate the articles, and CTD's curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug-disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades' worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/

  13. Comparison of the Frontier Distributed Database Caching System to NoSQL Databases

    NASA Astrophysics Data System (ADS)

    Dykstra, Dave

    2012-12-01

    One of the main attractions of non-relational “NoSQL” databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kienhuis, Anne S., E-mail: anne.kienhuis@rivm.nl; RIKILT, Institute of Food Safety, Wageningen UR, PO Box 230, 6700 AE, Wageningen; Netherlands Toxicogenomics Centre

    Hepatic systems toxicology is the integrative analysis of toxicogenomic technologies, e.g., transcriptomics, proteomics, and metabolomics, in combination with traditional toxicology measures to improve the understanding of mechanisms of hepatotoxic action. Hepatic toxicology studies that have employed toxicogenomic technologies to date have already provided a proof of principle for the value of hepatic systems toxicology in hazard identification. In the present review, acetaminophen is used as a model compound to discuss the application of toxicogenomics in hepatic systems toxicology for its potential role in the risk assessment process, to progress from hazard identification towards hazard characterization. The toxicogenomics-based parallelogram is used to identify current achievements and limitations of acetaminophen toxicogenomic in vivo and in vitro studies for in vitro-to-in vivo and interspecies comparisons, with the ultimate aim to extrapolate animal studies to humans in vivo. This article provides a model for comparison of more species and more in vitro models enhancing the robustness of common toxicogenomic responses and their relevance to human risk assessment. To progress to quantitative dose-response analysis needed for hazard characterization, in hepatic systems toxicology studies, generation of toxicogenomic data of multiple doses/concentrations and time points is required. Newly developed bioinformatics tools for quantitative analysis of toxicogenomic data can aid in the elucidation of dose-responsive effects. The challenge herein is to assess which toxicogenomic responses are relevant for induction of the apical effect and whether perturbations are sufficient for the induction of downstream events, eventually causing toxicity.

  15. Comparison of the Frontier Distributed Database Caching System to NoSQL Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dykstra, Dave

    One of the main attractions of non-relational NoSQL databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.

  16. The intelligent database machine

    NASA Technical Reports Server (NTRS)

    Yancey, K. E.

    1985-01-01

    The IDM database was compared with the Oracle database to determine whether the IDM 500 would better serve the needs of the MSFC database management system than Oracle. The two were compared and the performance of the IDM was studied. Indications are given of which implementations work best on each database. The choice is left to the database administrator.

  17. MoccaDB - an integrative database for functional, comparative and diversity studies in the Rubiaceae family

    PubMed Central

    Plechakova, Olga; Tranchant-Dubreuil, Christine; Benedet, Fabrice; Couderc, Marie; Tinaut, Alexandra; Viader, Véronique; De Block, Petra; Hamon, Perla; Campa, Claudine; de Kochko, Alexandre; Hamon, Serge; Poncet, Valérie

    2009-01-01

    Background In the past few years, functional genomics information has been rapidly accumulating on Rubiaceae species and especially on those belonging to the Coffea genus (coffee trees). An increasing number of expressed sequence tag (EST) data and EST- or genomic-derived microsatellite markers have been generated, together with Conserved Ortholog Set (COS) markers. This considerably facilitates comparative genomics or map-based genetic studies through the common use of orthologous loci across different species. Similar genomic information is available for e.g. tomato or potato, members of the Solanaceae family. Since both Rubiaceae and Solanaceae belong to the Euasterids I (lamiids) integration of information on genetic markers would be possible and lead to more efficient analyses and discovery of key loci involved in important traits such as fruit development, quality, and maturation, or adaptation. Our goal was to develop a comprehensive web data source for integrated information on validated orthologous markers in Rubiaceae. Description MoccaDB is an online MySQL-PHP driven relational database that houses annotated and/or mapped microsatellite markers in Rubiaceae. In its current release, the database stores 638 markers that have been defined on 259 ESTs and 379 genomic sequences. Marker information was retrieved from 11 published works, and completed with original data on 132 microsatellite markers validated in our laboratory. DNA sequences were derived from three Coffea species/hybrids. Microsatellite markers were checked for similarity, in vitro tested for cross-amplification and diversity/polymorphism status in up to 38 Rubiaceae species belonging to the Cinchonoideae and Rubioideae subfamilies. Functional annotation was provided and some markers associated with described metabolic pathways were also integrated. Users can search the database for marker, sequence, map or diversity information through multi-option query forms. The retrieved data can be

  18. Transcriptional Responses Reveal Similarities Between Preclinical Rat Liver Testing Systems.

    PubMed

    Liu, Zhichao; Delavan, Brian; Roberts, Ruth; Tong, Weida

    2018-01-01

    Toxicogenomics (TGx) is an important tool to gain an enhanced understanding of toxicity at the molecular level. Previously, we developed a pair ranking (PRank) method to assess in vitro to in vivo extrapolation (IVIVE) using toxicogenomic datasets from the Open Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System (TG-GATEs) database. With this method, we investigated three important questions that were not addressed in our previous study: (1) is a 1-day in vivo short-term assay able to replace the 28-day standard and expensive toxicological assay? (2) are some biological processes more conserved across different preclinical testing systems than others? and (3) do these preclinical testing systems have similar resolution in differentiating drugs by their therapeutic uses? For question 1, a high similarity was noted (PRank score = 0.90), indicating the potential utility of shorter term in vivo studies to predict outcomes in longer term and more expensive in vivo model systems. There was a moderate similarity between rat primary hepatocytes and in vivo repeat-dose studies (PRank score = 0.71) but a low similarity (PRank score = 0.56) between rat primary hepatocytes and in vivo single-dose studies. To address question 2, we limited the analysis to gene sets relevant to specific toxicogenomic pathways and found that pathways such as lipid metabolism were consistently over-represented in all three assay systems. For question 3, all three preclinical assay systems could distinguish compounds from different therapeutic categories. This suggests that any noted differences between assay systems were biological process-dependent and, furthermore, that all three systems have utility in assessing drug responses within a certain drug class. In conclusion, this comparison of three commonly used rat TGx systems provides useful information on the utility and application of TGx assays.
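
    The abstract does not define how the PRank score is computed; as a rough analogue of its spirit (rank drug-pair similarities within each testing system, then measure concordance of those orderings across systems), one might write the following, on synthetic data rather than TG-GATEs:

        import numpy as np
        from scipy.spatial.distance import pdist
        from scipy.stats import spearmanr

        rng = np.random.default_rng(0)
        # Hypothetical expression profiles for the same 20 drugs in two assay systems.
        sys_a = rng.normal(size=(20, 100))
        sys_b = sys_a + rng.normal(scale=0.5, size=(20, 100))  # a correlated second system

        # Drug-pair dissimilarities within each system, then rank concordance across systems.
        pairs_a = pdist(sys_a, metric="correlation")
        pairs_b = pdist(sys_b, metric="correlation")
        rho, _ = spearmanr(pairs_a, pairs_b)
        print(f"pairwise-order concordance = {rho:.2f}")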

  19. Genomic Models of Short-Term Exposure Accurately Predict Long-Term Chemical Carcinogenicity and Identify Putative Mechanisms of Action

    PubMed Central

    Gusenleitner, Daniel; Auerbach, Scott S.; Melia, Tisha; Gómez, Harold F.; Sherr, David H.; Monti, Stefano

    2014-01-01

    Background Despite an overall decrease in incidence of and mortality from cancer, about 40% of Americans will be diagnosed with the disease in their lifetime, and around 20% will die of it. Current approaches to test carcinogenic chemicals adopt the 2-year rodent bioassay, which is costly and time-consuming. As a result, fewer than 2% of the chemicals on the market have actually been tested. However, evidence accumulated to date suggests that gene expression profiles from model organisms exposed to chemical compounds reflect underlying mechanisms of action, and that these toxicogenomic models could be used in the prediction of chemical carcinogenicity. Results In this study, we used a rat-based microarray dataset from the NTP DrugMatrix Database to test the ability of toxicogenomics to model carcinogenicity. We analyzed 1,221 gene-expression profiles obtained from rats treated with 127 well-characterized compounds, including genotoxic and non-genotoxic carcinogens. We built a classifier that predicts a chemical's carcinogenic potential with an AUC of 0.78, and validated it on an independent dataset from the Japanese Toxicogenomics Project consisting of 2,065 profiles from 72 compounds. Finally, we identified differentially expressed genes associated with chemical carcinogenesis, and developed novel data-driven approaches for the molecular characterization of the response to chemical stressors. Conclusion Here, we validate a toxicogenomic approach to predict carcinogenicity and provide strong evidence that, with a larger set of compounds, we should be able to improve the sensitivity and specificity of the predictions. We found that the prediction of carcinogenicity is tissue-dependent and that the results also confirm and expand upon previous studies implicating DNA damage, the peroxisome proliferator-activated receptor, the aryl hydrocarbon receptor, and regenerative pathology in the response to carcinogen exposure. PMID:25058030
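
    As a schematic of the classification setup described above (expression profiles in, carcinogenicity label out, performance summarized by AUC), the following sketch uses scikit-learn on synthetic data; with random labels it yields an AUC near 0.5, unlike the study's reported 0.78 on real profiles:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import cross_val_predict

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 2000))   # stand-in gene-expression profiles
        y = rng.integers(0, 2, size=300)   # carcinogen (1) vs non-carcinogen (0) labels

        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        probs = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
        print(f"AUC = {roc_auc_score(y, probs):.2f}")  # ~0.5 here, since labels are random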

  20. Scopus database: a review.

    PubMed

    Burnham, Judy F

    2006-03-08

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is all-inclusive; rather, the two complement each other. If a library can only afford one, the choice must be based on institutional needs.

  1. Scopus database: a review

    PubMed Central

    Burnham, Judy F

    2006-01-01

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is all-inclusive; rather, the two complement each other. If a library can only afford one, the choice must be based on institutional needs. PMID:16522216

  2. "Mr. Database" : Jim Gray and the History of Database Technologies.

    PubMed

    Hanwahr, Nils C

    2017-12-01

    Although the widespread use of the term "Big Data" is comparatively recent, it invokes a phenomenon in the development of database technology with distinct historical contexts. The database engineer Jim Gray, known as "Mr. Database" in Silicon Valley before his disappearance at sea in 2007, was involved in many of the crucial developments since the 1970s that constitute the foundation of exceedingly large and distributed databases. Jim Gray was involved in the development of relational database systems based on the concepts of Edgar F. Codd at IBM in the 1970s before he went on to develop principles of Transaction Processing that enable the parallel and highly distributed performance of databases today. He was also involved in creating forums for discourse between academia and industry, which influenced industry performance standards as well as database research agendas. As a co-founder of the San Francisco branch of Microsoft Research, Gray increasingly turned toward scientific applications of database technologies, e.g., leading the TerraServer project, an online database of satellite images. Inspired by Vannevar Bush's idea of the memex, Gray laid out his vision of a Personal Memex as well as a World Memex, eventually postulating a new era of data-based scientific discovery termed "Fourth Paradigm Science". This article gives an overview of Gray's contributions to the development of database technology as well as his research agendas and shows that central notions of Big Data have been occupying database engineers for much longer than the actual term has been in use.

  3. PGSB PlantsDB: updates to the database framework for comparative plant genome research.

    PubMed

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C; Martis, Mihaela M; Seidel, Michael; Kugler, Karl G; Gundlach, Heidrun; Mayer, Klaus F X

    2016-01-04

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    PubMed Central

    2011-01-01

    Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel cDNA sequencing has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of the unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a

  5. A CTD–Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug–disease and drug–phenotype interactions

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; Roberts, Phoebe M.; King, Benjamin L.; Lay, Jean M.; Lennon-Hopkins, Kelley; Sciaky, Daniela; Johnson, Robin; Keating, Heather; Greene, Nigel; Hernandez, Robert; McConnell, Kevin J.; Enayetallah, Ahmed E.; Mattingly, Carolyn J.

    2013-01-01

    Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88,629 articles relating over 1,200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254,173 toxicogenomic interactions (152,173 chemical–disease, 58,572 chemical–gene, 5,345 gene–disease and 38,083 phenotype interactions). All chemical–gene–disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer’s text-mining process to collate the articles, and CTD’s curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug–disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades’ worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/ PMID:24288140

  6. Human cell toxicogenomic analysis links reactive oxygen species to the toxicity of monohaloacetic acid drinking water disinfection byproducts

    PubMed Central

    Pals, Justin; Attene-Ramos, Matias S.; Xia, Menghang; Wagner, Elizabeth D.; Plewa, Michael J.

    2014-01-01

    Chronic exposure to drinking water disinfection byproducts has been linked to adverse health risks. The monohaloacetic acids (monoHAAs) are generated as byproducts during the disinfection of drinking water and are cytotoxic, genotoxic, mutagenic, and teratogenic. Iodoacetic acid toxicity was mitigated by antioxidants, suggesting the involvement of oxidative stress. Other monoHAAs may share a similar mode of action. Each monoHAA generated a significant concentration-response increase in the expression of a β-lactamase reporter under the control of the Antioxidant Response Element (ARE). The monoHAAs generated oxidative stress with a rank order of IAA > BAA >> CAA; this rank order was observed with other toxicological endpoints. Toxicogenomic analysis was conducted with a non-transformed human intestinal epithelial cell line (FHs 74 Int). Exposure to the monoHAAs altered the transcription levels of multiple oxidative stress responsive genes, indicating that each exposure generated oxidative stress. The transcriptome profiles showed an increase in TXNRD1 and SRXN1, suggesting peroxiredoxin proteins had been oxidized during monoHAA exposures. Three sources of reactive oxygen species were identified, the hypohalous acid generating peroxidase enzymes LPO and MPO, NADPH-dependent oxidase NOX5, and PTGS2 (COX-2) mediated arachidonic acid metabolism. Each monoHAA exposure caused an increase in COX-2 mRNA levels. These data provide a functional association between monoHAA exposure and adverse health outcomes such as oxidative stress, inflammation, and cancer. PMID:24050308

  7. Transparency, usability, and reproducibility: Guiding principles for improving comparative databases using primates as examples.

    PubMed

    Borries, Carola; Sandel, Aaron A; Koenig, Andreas; Fernandez-Duque, Eduardo; Kamilar, Jason M; Amoroso, Caroline R; Barton, Robert A; Bray, Joel; Di Fiore, Anthony; Gilby, Ian C; Gordon, Adam D; Mundry, Roger; Port, Markus; Powell, Lauren E; Pusey, Anne E; Spriggs, Amanda; Nunn, Charles L

    2016-09-01

    Recent decades have seen rapid development of new analytical methods to investigate patterns of interspecific variation. Yet these cutting-edge statistical analyses often rely on data of questionable origin, varying accuracy, and weak comparability, which seem to have reduced the reproducibility of studies. It is time to improve the transparency of comparative data while also making these improved data more widely available. We, the authors, met to discuss how transparency, usability, and reproducibility of comparative data can best be achieved. We propose four guiding principles: 1) data identification with explicit operational definitions and complete descriptions of methods; 2) inclusion of metadata that capture key characteristics of the data, such as sample size, geographic coordinates, and nutrient availability (for example, captive versus wild animals); 3) documentation of the original reference for each datum; and 4) facilitation of effective interactions with the data via user friendly and transparent interfaces. We urge reviewers, editors, publishers, database developers and users, funding agencies, researchers publishing their primary data, and those performing comparative analyses to embrace these standards to increase the transparency, usability, and reproducibility of comparative studies. © 2016 Wiley Periodicals, Inc.

  8. Comparative study on the customization of natural language interfaces to databases.

    PubMed

    Pazos R, Rodolfo A; Aguirre L, Marco A; González B, Juan J; Martínez F, José A; Pérez O, Joaquín; Verástegui O, Andrés A

    2016-01-01

    In recent decades the popularity of natural language interfaces to databases (NLIDBs) has increased, because in many cases information obtained from them is used for making important business decisions. Unfortunately, the complexity of their customization by database administrators makes them difficult to use. In order for an NLIDB to obtain a high percentage of correctly translated queries, it must be correctly customized for the database to be queried. In most cases the performance reported in the NLIDB literature is the highest possible, i.e., the performance obtained when the interfaces were customized by the implementers. For end users, however, the more relevant figure is the performance the interface yields when customized by someone other than the implementers. Unfortunately, very few articles report NLIDB performance when the NLIDBs are not customized by the implementers. This article presents a semantically-enriched data dictionary (which permits solving many of the problems that occur when translating from natural language to SQL) and an experiment in which two groups of undergraduate students customized our NLIDB and English Language Frontend (ELF), considered one of the best commercial NLIDBs available. The experimental results show that, when customized by the first group, our NLIDB correctly answered 44.69 % of queries versus 11.83 % for ELF on the ATIS database; when customized by the second group, our NLIDB attained 77.05 % versus 13.48 % for ELF. The performance attained by our NLIDB when customized by ourselves was 90 %.

  9. Clinical Prediction Models for Cardiovascular Disease: Tufts Predictive Analytics and Comparative Effectiveness Clinical Prediction Model Database.

    PubMed

    Wessler, Benjamin S; Lai Yh, Lana; Kramer, Whitney; Cangelosi, Michael; Raman, Gowri; Lutz, Jennifer S; Kent, David M

    2015-07-01

    Clinical prediction models (CPMs) estimate the probability of clinical outcomes and hold the potential to improve decision making and individualize care. For patients with cardiovascular disease, there are numerous CPMs available, although the extent of this literature is not well described. We conducted a systematic review for articles containing CPMs for cardiovascular disease published between January 1990 and May 2012. Cardiovascular disease includes coronary heart disease, heart failure, arrhythmias, stroke, venous thromboembolism, and peripheral vascular disease. We created a novel database and characterized CPMs based on the stage of development, population under study, performance, covariates, and predicted outcomes. There are 796 models included in this database. The number of CPMs published each year is increasing steadily over time. Seven hundred seventeen (90%) are de novo CPMs, 21 (3%) are CPM recalibrations, and 58 (7%) are CPM adaptations. This database contains CPMs for 31 index conditions, including 215 CPMs for patients with coronary artery disease, 168 CPMs for population samples, and 79 models for patients with heart failure. There are 77 distinct index/outcome pairings. Of the de novo models in this database, 450 (63%) report a c-statistic and 259 (36%) report some information on calibration. There is an abundance of CPMs available for a wide assortment of cardiovascular disease conditions, with substantial redundancy in the literature. The comparative performance of these models, the consistency of effects and risk estimates across models, and the actual and potential clinical impact of this body of literature are poorly understood. © 2015 American Heart Association, Inc.
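
    The c-statistic reported for these models is equivalent to the area under the ROC curve, so a model's discrimination can be reproduced from observed outcomes and predicted risks. A minimal sketch using scikit-learn follows; the labels and risks are invented illustrative values, not records from this database:

        # Compute a c-statistic (ROC AUC) for a clinical prediction model.
        # y_true and y_risk are illustrative placeholders, not study data.
        from sklearn.metrics import roc_auc_score

        y_true = [0, 0, 1, 0, 1, 1, 0, 1]                    # observed outcomes
        y_risk = [0.1, 0.3, 0.4, 0.2, 0.8, 0.7, 0.35, 0.6]   # predicted risks

        print(f"c-statistic = {roc_auc_score(y_true, y_risk):.2f}")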

  10. Comparison of the NCI open database with seven large chemical structural databases.

    PubMed

    Voigt, J H; Bienfait, B; Wang, S; Nicklaus, M C

    2001-01-01

    Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D-version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than the others do with each other. In particular, the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap among each other. Still, no database is completely a subset of any other, and each appears to have its own niche and thus "raison d'être". The NCI database has by far the highest number of compounds that are unique to it. Approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.
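
    Overlap statistics of this kind rest on canonicalizing each structure so that identical compounds compare equal across databases. A minimal sketch of the idea follows, using RDKit (a toolkit this 2001 study did not use) and made-up SMILES strings:

        # Canonicalize SMILES and intersect sets to measure database overlap.
        # The two "databases" here are tiny illustrative stand-ins.
        from rdkit import Chem

        db_a = ["CCO", "c1ccccc1", "CC(=O)O"]
        db_b = ["OCC", "CC(=O)O", "CCN"]      # "OCC" is ethanol, same as "CCO"

        canon_a = {Chem.CanonSmiles(s) for s in db_a}
        canon_b = {Chem.CanonSmiles(s) for s in db_b}

        print("overlap:", len(canon_a & canon_b))        # compounds in both
        print("unique to A:", len(canon_a - canon_b))    # compounds only in A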

  11. Signal transduction disturbance related to hepatocarcinogenesis in mouse by prolonged exposure to Nanjing drinking water.

    PubMed

    Zhang, Rui; Sun, Jie; Zhang, Yan; Cheng, Shupei; Zhang, Xiaowei

    2013-09-01

    Toxicogenomic approaches were used to investigate the potential hepatocarcinogenic effects on mice of oral exposure to Nanjing drinking water (NJDW). Changes in the hepatic transcriptome of 3-week-old male mice (Mus musculus) were monitored and dissected after oral exposure to NJDW for 90 days. No preneoplastic or neoplastic lesions were observed in the hepatic tissue by the end of NJDW exposure. However, a total of 746 genes were changed transcriptionally. Thirty-one percent of the differentially expressed genes (DEGs) were associated with the functional categories of cell cycle regulation, adhesion, growth, apoptosis, and signal transduction, which are closely implicated in tumorigenesis and progression. Interrogation of the Kyoto Encyclopedia of Genes and Genomes revealed that 43 DEGs mapped to several crucial signaling pathways implicated in the pathogenesis of hepatocellular carcinoma (HCC). In the signal transduction network constructed with the Genes2Networks software, Egfr, Akt1, Atf2, Ctnnb1, Hras, Mapk1, Smad2, and Ccnd1 were hubs. Direct gene-disease relationships obtained from the Comparative Toxicogenomics Database and the scientific literature revealed that these hubs have direct mechanistic or biomarker relationships with hepatocellular preneoplastic lesions or hepatocarcinogenesis. Therefore, prolonged intake of NJDW without any indoor water treatment strategy might predispose mice to HCC. Furthermore, Egfr, Akt1, Ctnnb1, Hras, Mapk1, Smad2, and Ccnd1 were identified as promising biomarkers of the potential combined hepatocarcinogenicity.
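
    Hub identification of the kind reported here amounts to ranking nodes by connectivity in the reconstructed network. A minimal sketch using networkx and a toy edge list (not the study's actual network) follows:

        # Flag hub genes by degree in a small toy signaling network.
        import networkx as nx

        edges = [("Egfr", "Hras"), ("Hras", "Mapk1"), ("Mapk1", "Ccnd1"),
                 ("Egfr", "Akt1"), ("Akt1", "Ctnnb1"), ("Mapk1", "Atf2"),
                 ("Smad2", "Ccnd1"), ("Egfr", "Mapk1")]
        g = nx.Graph(edges)

        hubs = [n for n, d in g.degree() if d >= 3]   # simple degree cutoff
        print(sorted(hubs))                           # ['Egfr', 'Mapk1']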

  12. Comparing features sets for content-based image retrieval in a medical-case database

    NASA Astrophysics Data System (ADS)

    Muller, Henning; Rosset, Antoine; Vallee, Jean-Paul; Geissbuhler, Antoine

    2004-04-01

    Content-based image retrieval systems (CBIRSs) have frequently been proposed for use in medical image databases and PACS. Still, only a few systems have been developed and used in a real clinical environment. It rather seems that medical professionals define their needs and computer scientists develop systems based on data sets they receive, with little or no interaction between the two groups. A first study on the diagnostic use of medical image retrieval also shows an improvement in diagnostics when using CBIRSs, which underlines the potential importance of this technique. This article explains the use of an open source image retrieval system (GIFT - GNU Image Finding Tool) for the retrieval of medical images in the medical case database system CasImage that is used in daily clinical routine in the university hospitals of Geneva. Although the base system of GIFT shows an unsatisfactory performance, even small changes in the feature space significantly improve the retrieval results. The performance of variations in feature space with respect to color (gray level) quantizations and changes in texture analysis (Gabor filters) is compared. Whereas stock photography relies mainly on colors for retrieval, medical images need a large number of gray levels for successful retrieval, especially when executing feedback queries. The results also show that too fine a granularity in the gray levels lowers the retrieval quality, especially with single-image queries. For the evaluation of the retrieval performance, a subset of 3752 images is taken from the entire case database of more than 40,000 images. Ground truth was generated by a user who defined the expected query result of a perfect system by selecting images relevant to a given query image. The results show that a smaller number of gray levels (32 - 64) leads to a better retrieval performance, especially when using relevance feedback. The use of more scales and directions for the Gabor filters in the
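
    The gray-level quantization varied in this study can be sketched in a few lines: reduce an 8-bit image to N gray levels and build the histogram used as a retrieval feature. This is a hypothetical illustration with random pixel data, not the GIFT implementation:

        # Quantize an 8-bit image to fewer gray levels and build a
        # histogram feature, as in gray-level-based retrieval.
        import numpy as np

        def quantize(image_u8, levels):
            return (image_u8.astype(np.uint16) * levels // 256).astype(np.uint8)

        img = np.random.default_rng(0).integers(0, 256, (64, 64), dtype=np.uint8)
        img32 = quantize(img, 32)                          # 32 gray levels
        hist = np.bincount(img32.ravel(), minlength=32)    # retrieval feature
        print(hist[:8])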

  13. Landslide databases to compare regional repair and mitigation strategies of transportation infrastructure

    NASA Astrophysics Data System (ADS)

    Wohlers, Annika; Damm, Bodo

    2017-04-01

    Regional data for the Central German Uplands are extracted from the German landslide database in order to understand the complex interactions between landslide risks and public risk awareness with respect to transportation infrastructure. Most information within the database is gathered by means of archive studies from inventories of emergency agencies, state, press and web archives, company and department records, as well as scientific and (geo)technical literature. The information includes land use practices and repair and mitigation measures, with the resulting costs for the German road network as well as for railroad and waterway networks. It therefore contains valuable information on historical and current landslide impacts and elements at risk, and provides an overview of spatiotemporal changes in social exposure and vulnerability to landslide hazards over the last 120 years. On a regional scale, the recorded infrastructure damages and consequent repair or mitigation measures were categorized and classified according to relevant landslide types, processes and types of infrastructure. In a further step, the data on recent landslides are compared with historical and modern repair and mitigation measures and are correlated with socioeconomic concepts. As a result, it is possible to identify complex interactions between landslide hazard, risk perception, and damage impact, including time lags and intensity thresholds. The data reveal distinct approaches to repairing or mitigating landslides on different types of transportation infrastructure, which are linked not only to higher construction efforts (e.g. embankments on railroads and channels) but also to changing levels of economic losses and risk perception. In addition, a shift over time from low-cost prevention measures such as the removal of loose rock and vegetation, rock blasting, and catch barriers towards expensive mitigation measures such as catch fences, soil anchoring and rock nailing can be noticed.

  14. ExDom: an integrated database for comparative analysis of the exon–intron structures of protein domains in eukaryotes

    PubMed Central

    Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan

    2009-01-01

    We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624

  15. Human cell toxicogenomic analysis linking reactive oxygen species to the toxicity of monohaloacetic acid drinking water disinfection byproducts.

    PubMed

    Pals, Justin; Attene-Ramos, Matias S; Xia, Menghang; Wagner, Elizabeth D; Plewa, Michael J

    2013-01-01

    Chronic exposure to drinking water disinfection byproducts has been linked to adverse health risks. The monohaloacetic acids (monoHAAs) are generated as byproducts during the disinfection of drinking water and are cytotoxic, genotoxic, mutagenic, and teratogenic. Iodoacetic acid toxicity was mitigated by antioxidants, suggesting the involvement of oxidative stress. Other monoHAAs may share a similar mode of action. Each monoHAA generated a significant concentration-response increase in the expression of a β-lactamase reporter under the control of the antioxidant response element (ARE). The monoHAAs generated oxidative stress with a rank order of iodoacetic acid (IAA) > bromoacetic acid (BAA) ≫ chloroacetic acid (CAA); this rank order was observed with other toxicological end points. Toxicogenomic analysis was conducted with a nontransformed human intestinal epithelial cell line (FHs 74 Int). Exposure to the monoHAAs altered the transcription levels of multiple oxidative stress responsive genes, indicating that each exposure generated oxidative stress. The transcriptome profiles showed an increase in thioredoxin reductase 1 (TXNRD1) and sulfiredoxin (SRXN1), suggesting peroxiredoxin proteins had been oxidized during monoHAA exposures. Three possible sources of reactive oxygen species were identified: the hypohalous acid-generating peroxidase enzymes lactoperoxidase (LPO) and myeloperoxidase (MPO), nicotinamide adenine dinucleotide phosphate (NADPH)-dependent oxidase 5 (NOX5), and PTGS2 (COX-2)-mediated arachidonic acid metabolism. Each monoHAA exposure caused an increase in COX-2 mRNA levels. These data provide a functional association between monoHAA exposure and adverse health outcomes such as oxidative stress, inflammation, and cancer.

  16. Comparative Toxicogenomic Responses to the Flame Retardant mITP in Developing Zebrafish.

    PubMed

    Haggard, Derik E; Das, Siba R; Tanguay, Robert L

    2017-02-20

    Monosubstituted isopropylated triaryl phosphate (mITP) is a major component of Firemaster 550, an additive flame retardant mixture commonly used in polyurethane foams. Developmental toxicity studies in zebrafish established mITP as the most toxic component of FM 550, which causes pericardial edema and heart looping failure. Mechanistic studies showed that mITP is an aryl hydrocarbon receptor (AhR) ligand; however, the cardiotoxic effects of mITP were independent of the AhR. We performed comparative whole genome transcriptomics in wild-type and ahr2^hu3335 zebrafish, which lack functional ahr2, to identify transcriptional signatures causally involved in the mechanism of mITP-induced cardiotoxicity. Regardless of ahr2 status, mITP exposure resulted in decreased expression of transcripts related to the synthesis of all-trans-retinoic acid and a host of Hox genes. Clustered gene ontology enrichment analysis showed unique enrichment in biological processes related to xenobiotic metabolism and response to external stimuli in wild-type samples. Transcript enrichments overlapping both genotypes involved the retinoid metabolic process and sensory/visual perception biological processes. Examination of the gene-gene interaction network of the differentially expressed transcripts in both genetic backgrounds demonstrated a strong AhR interaction network specific to wild-type samples, with overlapping genes regulated by retinoic acid receptors (RARs). A transcriptome analysis of control ahr2-null zebrafish identified potential cross-talk among AhR, Nrf2, and Hif1α. Collectively, we confirmed that mITP is an AhR ligand and present evidence in support of our hypothesis that mITP's developmental cardiotoxic effects are mediated by inhibition at the RAR level.

  17. Description of two waterborne disease outbreaks in France: a comparative study with data from cohort studies and from health administrative databases.

    PubMed

    Mouly, D; Van Cauteren, D; Vincent, N; Vaissiere, E; Beaudeau, P; Ducrot, C; Gallay, A

    2016-02-01

    Waterborne disease outbreaks (WBDO) of acute gastrointestinal illness (AGI) are a public health concern in France. Their occurrence is probably underestimated due to the lack of a specific surveillance system. The French health insurance database provides an interesting opportunity to improve the detection of these events. A specific algorithm to identify AGI cases from drug payment reimbursement data in the health insurance database has been previously developed. The purpose of our comparative study was to retrospectively assess the ability of the health insurance data to describe WBDO. Data from the health insurance database were compared with data from cohort studies conducted during two WBDO in 2010 and 2012. The temporal distribution of cases, the day of the peak and the duration of the epidemic, as measured using the health insurance data, were similar to the data from one of the two cohort studies. However, the health insurance data accounted for 54 cases, compared with the estimated 252 cases in the cohort study. The accuracy of using health insurance data to describe WBDO depends on the medical consultation rate in the impacted population. Because not everyone affected consults a physician, data analysis underestimates the total number of AGI cases. However, this data source can be considered for the development of a WBDO detection system in France, given its ability to describe an epidemic signal.
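
    The epidemic-curve comparison described above reduces to deriving a peak day, a duration and a case total from each daily time series. A minimal sketch with invented counts (not the study's data) follows; note how the smaller source understates the total even when peak and duration agree:

        # Compare epidemic descriptors from two sources of daily AGI counts.
        import numpy as np

        curves = {"insurance": np.array([1, 2, 9, 25, 40, 22, 8, 3, 1]),
                  "cohort":    np.array([3, 6, 30, 90, 180, 110, 40, 12, 4])}

        for name, c in curves.items():
            duration = int(np.count_nonzero(c > 0.1 * c.max()))
            print(name, "peak day:", int(np.argmax(c)),
                  "duration:", duration, "total cases:", int(c.sum()))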

  18. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices.

    PubMed

    Dececchi, T Alex; Mabee, Paula M; Blackburn, David C

    2016-01-01

    Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications ('monographs') and those used in phylogenetic analyses ('matrices'). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life.

  19. A development and integration of database code-system with a compilation of comparator, k0 and absolute methods for INAA using microsoft access

    NASA Astrophysics Data System (ADS)

    Hoh, Siew Sin; Rapie, Nurul Nadiah; Lim, Edwin Suh Wen; Tan, Chun Yuan; Yavar, Alireza; Sarmani, Sukiman; Majid, Amran Ab.; Khoo, Kok Siong

    2013-05-01

    Instrumental Neutron Activation Analysis (INAA) is often used to determine and calculate the elemental concentrations of a sample at The National University of Malaysia (UKM), typically in the Nuclear Science Programme, Faculty of Science and Technology. The objective of this study was to develop a database code-system based on Microsoft Access 2010 which could help INAA users to choose either the comparator method, the k0-method or the absolute method for calculating the elemental concentrations of a sample. This study also integrated k0data, Com-INAA, k0Concent, k0-Westcott and Abs-INAA to execute and complete the ECC-UKM database code-system. After the integration, a study was conducted to test the effectiveness of the ECC-UKM database code-system by comparing the concentrations between the experiments and the code-systems. 'Triple Bare Monitor' Zr-Au and Cr-Mo-Au were used in the k0Concent, k0-Westcott and Abs-INAA code-systems as monitors to determine the thermal to epithermal neutron flux ratio (f). Calculations involved in determining the concentration were the net peak area (Np), measurement time (tm), irradiation time (tirr), k-factor (k), thermal to epithermal neutron flux ratio (f), epithermal neutron flux distribution parameter (α) and detection efficiency (ɛp). For the Com-INAA code-system, the certified reference material IAEA-375 Soil was used to calculate the concentrations of elements in a sample. Other CRMs and SRMs were also used in this database code-system. Later, a verification process to examine the effectiveness of the Abs-INAA code-system was carried out by comparing sample concentrations between the code-system and the experiment. The concentration values computed by the ECC-UKM database code-system agreed with the experimental values with good accuracy.
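
    As a rough illustration of the comparator (relative) method listed above, the sample concentration follows from the ratio of decay-corrected specific activities of sample and standard. The sketch below is deliberately simplified (it omits the f, α and detection efficiency corrections that a full code-system such as ECC-UKM applies), and all numbers are invented:

        # Simplified comparator-method INAA: concentration from the ratio
        # of decay-corrected specific activities. Illustrative values only.
        import math

        def specific_activity(net_peak_area, mass_g, t_decay_s, half_life_s):
            decay = math.exp(-math.log(2) / half_life_s * t_decay_s)
            return net_peak_area / (mass_g * decay)

        c_standard = 120.0                                   # mg/kg, known
        a_x = specific_activity(5.2e4, 0.100, 3600, 9.4e4)   # sample
        a_s = specific_activity(4.8e4, 0.095, 3600, 9.4e4)   # standard
        print(f"sample concentration = {c_standard * a_x / a_s:.1f} mg/kg")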

  20. Tibetan Magmatism Database

    NASA Astrophysics Data System (ADS)

    Chapman, James B.; Kapp, Paul

    2017-11-01

    A database containing previously published geochronologic, geochemical, and isotopic data on Mesozoic to Quaternary igneous rocks in the Himalayan-Tibetan orogenic system is presented. The database is intended to serve as a repository for new and existing igneous rock data and is publicly accessible through a web-based platform that includes an interactive map and data table interface with search, filtering, and download options. To illustrate the utility of the database, the age, location, and εHf(t) composition of magmatism from the central Gangdese batholith in the southern Lhasa terrane are compared. The data identify three high-flux events, which peak at 93, 50, and 15 Ma. They are characterized by inboard arc migration and a temporal and spatial shift to more evolved isotopic compositions.

  1. Comparative analysis of hierarchical triangulated irregular networks to represent 3D elevation in terrain databases

    NASA Astrophysics Data System (ADS)

    Abdelguerfi, Mahdi; Wynne, Chris; Cooper, Edgar; Ladner, Roy V.; Shaw, Kevin B.

    1997-08-01

    Three-dimensional terrain representation plays an important role in a number of terrain database applications. Hierarchical triangulated irregular networks (TINs) provide a variable-resolution terrain representation that is based on a nested triangulation of the terrain. This paper compares and analyzes existing hierarchical triangulation techniques. The comparative analysis takes into account how aesthetically appealing and accurate the resulting terrain representation is. Parameters such as adjacency, slivers, and streaks are used to provide a measure of how aesthetically appealing the terrain representation is. Slivers occur when the triangulation produces thin and slivery triangles. Streaks appear when too many triangulations are done at a given vertex. Simple mathematical expressions are derived for these parameters, thereby providing a fairer and more easily reproduced comparison. In addition to meeting the adjacency requirement, an aesthetically pleasing hierarchical TIN generation algorithm is expected to reduce both slivers and streaks while maintaining accuracy. A comparative analysis of a number of existing approaches shows that a variant of a method originally proposed by Scarlatos exhibits better overall performance.
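
    Sliver detection of the kind quantified in this paper can be illustrated with one common triangle quality measure, q = 4*sqrt(3)*area / (a^2 + b^2 + c^2), which equals 1 for an equilateral triangle and approaches 0 for slivers. This is a generic metric chosen for illustration, not necessarily the expression the paper derives:

        # Score triangle quality; values near 0 indicate slivers.
        import math

        def triangle_quality(p1, p2, p3):
            a, b, c = math.dist(p1, p2), math.dist(p2, p3), math.dist(p3, p1)
            s = (a + b + c) / 2
            area = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))
            return 4 * math.sqrt(3) * area / (a * a + b * b + c * c)

        print(triangle_quality((0, 0), (1, 0), (0.5, math.sqrt(3) / 2)))  # ~1.0
        print(triangle_quality((0, 0), (1, 0), (0.5, 0.01)))              # sliver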

  2. Connection Map for Compounds (CMC): A Server for Combinatorial Drug Toxicity and Efficacy Analysis.

    PubMed

    Liu, Lei; Tsompana, Maria; Wang, Yong; Wu, Dingfeng; Zhu, Lixin; Zhu, Ruixin

    2016-09-26

    Drug discovery and development is a costly and time-consuming process with a high risk of failure, resulting primarily from a drug's associated clinical safety and efficacy potential. Identifying and eliminating inapt candidate drugs as early as possible is an effective way to reduce unnecessary costs, but limited analytical tools are currently available for this purpose. Recent growth in the areas of toxicogenomics and pharmacogenomics has provided a vast amount of drug expression microarray data. Web servers such as CMap and LTMap have used this information to evaluate drug toxicity and mechanisms of action independently; however, their wider applicability has been limited by the lack of a combinatorial drug-safety type of analysis. Using available genome-wide drug transcriptional expression profiles, we developed the first web server for combinatorial evaluation of toxicity and efficacy of candidate drugs, named "Connection Map for Compounds" (CMC). Using CMC, researchers can initially compare their query drug gene signatures with prebuilt gene profiles generated from two large-scale toxicogenomics databases, and subsequently perform a drug efficacy analysis for identification of known mechanisms of drug action or generation of new predictions. CMC provides a novel approach for drug repositioning and early evaluation in drug discovery with its unique combination of toxicity and efficacy analyses, extensibility of data and algorithms, and customization of reference gene profiles. CMC can be freely accessed at http://cadd.tongji.edu.cn/webserver/CMCbp.jsp.
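
    The signature-matching step described above can be sketched as scoring a query drug signature against prebuilt reference profiles. The toy example below uses cosine similarity over invented log-fold-change vectors; CMap-style servers typically use rank-based enrichment statistics rather than this exact score:

        # Score a query gene signature against reference expression profiles.
        import numpy as np

        refs = {"toxic_profile":   np.array([2.1, -1.3, 0.4, 1.8]),
                "control_profile": np.array([0.1, 0.2, -0.1, 0.0])}
        query = np.array([1.9, -1.0, 0.2, 1.5])

        def cosine(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

        scores = {name: cosine(query, ref) for name, ref in refs.items()}
        print(max(scores, key=scores.get))   # best-matching reference profile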

  3. Enhancing navigation in biomedical databases by community voting and database-driven text classification

    PubMed Central

    Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph

    2009-01-01

    Background: The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results: Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion: Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled
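
    The classifier described above (bagged decision trees with class-probability output) maps directly onto standard library components. A minimal sketch follows, with synthetic stand-ins for the abstract-derived features and community-voted labels used in the real system:

        # Bagged decision trees with probability estimates, as could drive
        # a classification-confidence heat map. Features/labels are synthetic.
        import numpy as np
        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.random((200, 20))                    # placeholder text features
        y = (X[:, 0] + X[:, 1] > 1.0).astype(int)    # placeholder labels

        clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50).fit(X, y)
        print(np.round(clf.predict_proba(X[:5]), 2))  # class probability estimates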

  4. CycADS: an annotation database system to ease the development and update of BioCyc databases

    PubMed Central

    Vellozo, Augusto F.; Véron, Amélie S.; Baa-Puyoulet, Patrice; Huerta-Cepas, Jaime; Cottret, Ludovic; Febvay, Gérard; Calevro, Federica; Rahbé, Yvan; Douglas, Angela E.; Gabaldón, Toni; Sagot, Marie-France; Charles, Hubert; Colella, Stefano

    2011-01-01

    In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including for example: KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum) whose genome was recently sequenced. The AcypiCyc database webpage also includes, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Thanks to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms. Database URL: http

  5. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices

    PubMed Central

    Dececchi, T. Alex; Mabee, Paula M.; Blackburn, David C.

    2016-01-01

    Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications (‘monographs’) and those used in phylogenetic analyses (‘matrices’). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life. PMID:27191170

  6. Atomic Spectroscopic Databases at NIST

    NASA Technical Reports Server (NTRS)

    Reader, J.; Kramida, A. E.; Ralchenko, Yu.

    2006-01-01

    We describe recent work at NIST to develop and maintain databases for spectra, transition probabilities, and energy levels of atoms that are astrophysically important. Our programs to critically compile these data as well as to develop a new database to compare plasma calculations for atoms that are not in local thermodynamic equilibrium are also summarized.

  7. Comparative toxicogenomic analysis of oral Cr(VI) exposure effects in rat and mouse small intestinal epithelia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kopec, Anna K.; Thompson, Chad M.; Kim, Suntae

    2012-07-15

    Continuous exposure to high concentrations of hexavalent chromium [Cr(VI)] in drinking water results in intestinal tumors in mice but not rats. Concentration-dependent gene expression effects were evaluated in female F344 rat duodenal and jejunal epithelia following 7 and 90 days of exposure to 0.3–520 mg/L (as sodium dichromate dihydrate, SDD) in drinking water. Whole-genome microarrays identified 3269 and 1815 duodenal, and 4557 and 1534 jejunal differentially expressed genes at 8 and 91 days, respectively, with significant overlaps between the intestinal segments. Functional annotation identified gene expression changes associated with oxidative stress, cell cycle, cell death, and immune response that were consistent with reported changes in redox status and histopathology. Comparative analysis with B6C3F1 mouse data from a similarly designed study identified 2790 differentially expressed rat orthologs in the duodenum compared to 5013 mouse orthologs at day 8, and only 1504 rat and 3484 mouse orthologs at day 91. Automated dose–response modeling resulted in similar median EC50s in the rodent duodenal and jejunal mucosae. Comparative examination of differentially expressed genes also identified divergently regulated orthologs. Comparable numbers of differentially expressed genes were observed at equivalent Cr concentrations (μg Cr/g duodenum). However, mice accumulated higher Cr levels than rats at ≥ 170 mg/L SDD, resulting in a ~2-fold increase in the number of differentially expressed genes. These qualitative and quantitative differences in differential gene expression, which correlate with differences in tissue dose, likely contribute to the disparate intestinal tumor outcomes. Highlights: Cr(VI) elicits dose-dependent changes in gene expression in rat intestine; Cr(VI) elicits less differential gene expression in rats compared to mice; Cr(VI) gene expression changes can be phenotypically anchored to intestinal changes.
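
    The automated dose-response modeling mentioned above can be approximated by fitting a Hill curve to expression responses and reading off the EC50. The points below are invented for illustration and are not the study's measurements:

        # Fit a Hill model to dose-response data and extract the EC50.
        import numpy as np
        from scipy.optimize import curve_fit

        def hill(dose, bottom, top, ec50, n):
            return bottom + (top - bottom) * dose**n / (ec50**n + dose**n)

        dose = np.array([0.3, 4.0, 14.0, 60.0, 170.0, 520.0])   # mg/L SDD (toy)
        resp = np.array([1.0, 1.2, 1.9, 3.1, 3.6, 3.8])         # fold change (toy)

        popt, _ = curve_fit(hill, dose, resp, p0=[1, 4, 30, 1], maxfev=10000)
        print(f"EC50 ~ {popt[2]:.1f} mg/L")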

  8. Normative Databases for Imaging Instrumentation.

    PubMed

    Realini, Tony; Zangwill, Linda M; Flanagan, John G; Garway-Heath, David; Patella, Vincent M; Johnson, Chris A; Artes, Paul H; Gaddie, Ian B; Fingeret, Murray

    2015-08-01

    To describe the process by which imaging devices undergo reference database development and regulatory clearance. The limitations and potential improvements of reference (normative) data sets for ophthalmic imaging devices will be discussed. A symposium was held in July 2013 in which a series of speakers discussed issues related to the development of reference databases for imaging devices. Automated imaging has become widely accepted and used in glaucoma management. The ability of such instruments to discriminate healthy from glaucomatous optic nerves, and to detect glaucomatous progression over time is limited by the quality of reference databases associated with the available commercial devices. In the absence of standardized rules governing the development of reference databases, each manufacturer's database differs in size, eligibility criteria, and ethnic make-up, among other key features. The process for development of imaging reference databases may be improved by standardizing eligibility requirements and data collection protocols. Such standardization may also improve the degree to which results may be compared between commercial instruments.

  9. Content based information retrieval in forensic image databases.

    PubMed

    Geradts, Zeno; Bijhold, Jurrien

    2002-03-01

    This paper gives an overview of the various available image databases and ways of searching these databases on image contents. Developments in research groups on searching image databases are evaluated and compared with the forensic databases that exist. Forensic image databases of fingerprints, faces, shoeprints, handwriting, cartridge cases, drug tablets, and tool marks are described. The developments in these fields appear to be valuable for forensic databases, especially the MPEG-7 framework, which standardizes searching in image databases. In the future, the combination of these databases (including DNA databases) and the possibilities to combine them can result in stronger forensic evidence.

  10. The Chicago Thoracic Oncology Database Consortium: A Multisite Database Initiative.

    PubMed

    Won, Brian; Carey, George B; Tan, Yi-Hung Carol; Bokhary, Ujala; Itkonen, Michelle; Szeto, Kyle; Wallace, James; Campbell, Nicholas; Hensing, Thomas; Salgia, Ravi

    2016-03-16

    An increasing amount of clinical data is available to biomedical researchers, but specifically designed database and informatics infrastructures are needed to handle this data effectively. Multiple research groups should be able to pool and share this data in an efficient manner. The Chicago Thoracic Oncology Database Consortium (CTODC) was created to standardize data collection and facilitate the pooling and sharing of data at institutions throughout Chicago and across the world. We assessed the CTODC by conducting a proof of principle investigation on lung cancer patients who took erlotinib. This study does not look into epidermal growth factor receptor (EGFR) mutations and tyrosine kinase inhibitors, but rather it discusses the development and utilization of the database involved.  We have implemented the Thoracic Oncology Program Database Project (TOPDP) Microsoft Access, the Thoracic Oncology Research Program (TORP) Velos, and the TORP REDCap databases for translational research efforts. Standard operating procedures (SOPs) were created to document the construction and proper utilization of these databases. These SOPs have been made available freely to other institutions that have implemented their own databases patterned on these SOPs. A cohort of 373 lung cancer patients who took erlotinib was identified. The EGFR mutation statuses of patients were analyzed. Out of the 70 patients that were tested, 55 had mutations while 15 did not. In terms of overall survival and duration of treatment, the cohort demonstrated that EGFR-mutated patients had a longer duration of erlotinib treatment and longer overall survival compared to their EGFR wild-type counterparts who received erlotinib. The investigation successfully yielded data from all institutions of the CTODC. While the investigation identified challenges, such as the difficulty of data transfer and potential duplication of patient data, these issues can be resolved with greater cross-communication between

  11. Online drug databases: a new method to assess and compare inclusion of clinically relevant information.

    PubMed

    Silva, Cristina; Fresco, Paula; Monteiro, Joaquim; Rama, Ana Cristina Ribeiro

    2013-08-01

    Evidence-Based Practice requires health care decisions to be based on the best available evidence. The model "Information Mastery" proposes that clinicians should use sources of information that have previously evaluated relevance and validity, provided at the point of care. Drug databases (DB) allow easy and fast access to information and have the benefit of more frequent content updates. Relevant information, in the context of drug therapy, is that which supports safe and effective use of medicines. Accordingly, the European Guideline on the Summary of Product Characteristics (EG-SmPC) was used as a standard to evaluate the inclusion of relevant information contents in DB. To develop and test a method to evaluate the relevancy of DB contents, by assessing the inclusion of information items deemed relevant for effective and safe drug use. Hierarchical organisation and selection of the principles defined in the EG-SmPC; definition of criteria to assess inclusion of selected information items; creation of a categorisation and quantification system that allows score calculation; calculation of relative differences (RD) of scores for comparison with an "ideal" database, defined as the one that achieves the best quantification possible for each of the information items; pilot test on a sample of 9 drug databases, using 10 drugs frequently associated in the literature with morbidity-mortality and also widely consumed in Portugal. Main outcome measure: calculation of individual and global scores for clinically relevant information items of drug monographs in databases, using the categorisation and quantification system created. A) Method development: selection of sections, subsections, relevant information items and corresponding requisites; a system to categorise and quantify their inclusion; the score and RD calculation procedure. B) Pilot test: calculated scores for the 9 databases; globally, all databases evaluated differed significantly from the "ideal" database; some DB performed

  12. Toxicogenomics: the challenges and opportunities to identify biomarkers, signatures and thresholds to support mode-of-action.

    PubMed

    Currie, Richard A

    2012-08-15

    Toxicogenomics (TGx) can be defined as the application of "omics" techniques to toxicology and risk assessment. By identifying molecular changes associated with toxicity, TGx data might assist hazard identification and investigation of causes. Early technical challenges were evaluated and addressed by consortia (e.g. ILSI/HESI and the Microarray Quality Control (MAQC) consortium), which demonstrated that TGx gave reliable and reproducible information. The MAQC also produced "best practice on signature generation" after conducting an extensive evaluation of different methods on common datasets. Two findings of note were the need for methods that control batch variability, and that the predictive ability of a signature changes in concert with the variability of the endpoint. The key challenge remaining is data interpretation, because TGx can identify molecular changes that are causal, associated with or incidental to toxicity. Application of Bradford Hill's tests for causation, which are used to build mode of action (MOA) arguments, can produce reasonable hypotheses linking altered pathways to phenotypic changes. However, challenges in interpretation still remain: are all pathway changes equal, and which are most important and plausibly linked to toxicity? Therefore, the expert judgement of the toxicologist is still needed. There are theoretical reasons why consistent alterations across a metabolic pathway are important, but similar changes in signalling pathways may not alter information flow. At the molecular level, thresholds may be due to the inherent properties of the regulatory network, for example switch-like behaviours arising from some network motifs (e.g. positive feedback) in the perturbed pathway leading to the toxicity. The application of systems biology methods to TGx data can generate hypotheses that explain why a threshold response exists. However, are we adequately trained to make these judgments? There is a need for collaborative efforts between regulators, industry and
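
    The switch-like threshold behaviour attributed above to positive-feedback motifs can be demonstrated with a toy one-variable model: the steady state of dx/dt = s + v*x^n/(K^n + x^n) - d*x jumps abruptly as the stimulus s crosses a threshold. All parameters below are illustrative, not drawn from any particular pathway:

        # Steady states of a positive-feedback motif: a small rise in
        # stimulus produces an abrupt, switch-like jump in the response.
        def steady_state(s, v=2.0, K=1.0, n=4, d=1.0, x=0.0, dt=0.01, steps=20000):
            for _ in range(steps):                 # simple Euler integration
                x += dt * (s + v * x**n / (K**n + x**n) - d * x)
            return x

        for s in [0.0, 0.2, 0.4, 0.6, 0.8]:
            print(f"stimulus {s:.1f} -> steady state {steady_state(s):.2f}")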

  13. Analysis of Lunar Highland Regolith Samples From Apollo 16 Drive Core 64001/2 and Lunar Regolith Simulants - an Expanding Comparative Database

    NASA Technical Reports Server (NTRS)

    Schrader, Christian M.; Rickman, Doug; Stoeser, Douglas; Wentworth, Susan; McKay, Dave S.; Botha, Pieter; Butcher, Alan R.; Horsch, Hanna E.; Benedictus, Aukje; Gottlieb, Paul

    2008-01-01

    This slide presentation reviews the work to analyze the lunar highland regolith samples that came from the Apollo 16 core sample 64001/2 and simulants of lunar regolith, and build a comparative database. The work is part of a larger effort to compile an internally consistent database on lunar regolith (Apollo Samples) and lunar regolith simulants. This is in support of a future lunar outpost. The work is to characterize existing lunar regolith and simulants in terms of particle type, particle size distribution, particle shape distribution, bulk density, and other compositional characteristics, and to evaluate the regolith simulants by the same properties in comparison to the Apollo sample lunar regolith.

  14. MitoAge: a database for comparative analysis of mitochondrial DNA, with a special focus on animal longevity.

    PubMed

    Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E

    2016-01-04

    Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
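
    The correlative analyses MitoAge supports boil down to relating an mtDNA compositional feature to lifespan records across species. A minimal sketch with invented values (not MitoAge records) follows:

        # Correlate an mtDNA feature with maximum lifespan across species.
        import numpy as np
        from scipy.stats import spearmanr

        gc_content = np.array([38.2, 41.5, 39.0, 44.1, 36.8])   # % GC (toy)
        max_lifespan = np.array([22, 60, 30, 122, 15])          # years (toy)

        rho, p = spearmanr(gc_content, max_lifespan)
        print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")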

  15. Evaluation of Database Coverage: A Comparison of Two Methodologies.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1982-01-01

    Describes experiment which compared two techniques used for evaluating and comparing database coverage of a subject area, e.g., "bibliography" and "subject profile." Differences in time, cost, and results achieved are compared by applying techniques to field of volcanology using two databases, Geological Reference File and GeoArchive. Twenty…

  16. The EMBL nucleotide sequence database

    PubMed Central

    Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann

    2001-01-01

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039

  17. Normative Databases for Imaging Instrumentation

    PubMed Central

    Realini, Tony; Zangwill, Linda; Flanagan, John; Garway-Heath, David; Patella, Vincent Michael; Johnson, Chris; Artes, Paul; Ben Gaddie, I.; Fingeret, Murray

    2015-01-01

    Purpose: To describe the process by which imaging devices undergo reference database development and regulatory clearance. The limitations and potential improvements of reference (normative) data sets for ophthalmic imaging devices will be discussed. Methods: A symposium was held in July 2013 in which a series of speakers discussed issues related to the development of reference databases for imaging devices. Results: Automated imaging has become widely accepted and used in glaucoma management. The ability of such instruments to discriminate healthy from glaucomatous optic nerves, and to detect glaucomatous progression over time is limited by the quality of reference databases associated with the available commercial devices. In the absence of standardized rules governing the development of reference databases, each manufacturer’s database differs in size, eligibility criteria, and ethnic make-up, among other key features. Conclusions: The process for development of imaging reference databases may be improved by standardizing eligibility requirements and data collection protocols. Such standardization may also improve the degree to which results may be compared between commercial instruments. PMID:25265003

  18. US FDA and USA EPA Voluntary Submission of Genomic Data Guidance: Current and Future Use of Genomics in Decision Making

    EPA Science Inventory

    Appropriate utilization of data from toxicogenomic studies is an ongoing concern of the regulated industries and the agencies charged with assessing safety or risk. An area of current interest is the potential of toxicogenomics to enhance our ability to develop higher or high-...

  19. Digital Dental X-ray Database for Caries Screening

    NASA Astrophysics Data System (ADS)

    Rad, Abdolvahab Ehsani; Rahim, Mohd Shafry Mohd; Rehman, Amjad; Saba, Tanzila

    2016-06-01

    A standard database is an essential requirement for comparing the performance of image analysis techniques, and the main issue in dental image analysis is the lack of such an available database, which this paper provides. Periapical dental X-ray images, suitable for analysis and approved by many dental experts, were collected. This type of dental radiograph imaging is common and inexpensive, and is normally used for dental disease diagnosis and abnormality detection. The database contains 120 periapical X-ray images covering the upper and lower jaws. This digital dental database was constructed to provide a common source with which researchers can apply, compare and improve image analysis techniques.

  20. Cross-Border Use of Food Databases: Equivalence of US and Australian Databases for Macronutrients

    PubMed Central

    Summer, Suzanne S.; Ollberding, Nicholas J.; Guy, Trish; Setchell, Kenneth D. R.; Brown, Nadine; Kalkwarf, Heidi J.

    2013-01-01

    When estimating dietary intake across multiple countries, the lack of a single comprehensive dietary database may lead researchers to modify one database to analyze intakes for all participants. This approach may yield results different from those using the country-specific database and introduce measurement error. We examined whether nutrient intakes of Australians calculated with a modified US database would be similar to those calculated with an Australian database. We analyzed 3-day food records of 68 Australian adults using the US-based Nutrition Data System for Research, modified to reflect food items consumed in Australia. Modification entailed identifying a substitute food whose energy and macronutrient content were within 10% of those of the Australian food, or adding a new food to the database. Paired Wilcoxon signed rank tests were used to compare differences in nutrient intakes estimated by both databases, and Pearson and intraclass correlation coefficients measured degree of association and agreement between intake estimates for individuals. Median intakes of energy, carbohydrate, protein, and fiber differed by <5% at the group level. Larger discrepancies were seen for fat (11%; P<0.0001) and most micronutrients. Despite strong correlations, nutrient intakes differed by >10% for an appreciable percentage of participants (35% for energy to 69% for total fat). Adding country-specific food items to an existing database resulted in similar overall macronutrient intake estimates but was insufficient for estimating individual intakes. When analyzing nutrient intakes in multinational studies, greater standardization and modification of databases may be required to more accurately estimate intake of individuals. PMID:23871108
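
    The substitution rule described above (accept a US food as a stand-in only if energy and each macronutrient fall within 10% of the Australian food) is easy to make concrete. The food records below are invented for illustration:

        # Accept a substitute food only if every field is within 10%.
        def within_10_percent(aus, us, keys=("energy", "carb", "protein", "fat")):
            return all(abs(us[k] - aus[k]) <= 0.10 * aus[k] for k in keys)

        aus = {"energy": 250, "carb": 30.0, "protein": 8.0, "fat": 10.0}
        us  = {"energy": 240, "carb": 31.5, "protein": 7.6, "fat": 10.8}
        print(within_10_percent(aus, us))   # True: all fields within 10%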

  1. The Chicago Thoracic Oncology Database Consortium: A Multisite Database Initiative

    PubMed Central

    Carey, George B; Tan, Yi-Hung Carol; Bokhary, Ujala; Itkonen, Michelle; Szeto, Kyle; Wallace, James; Campbell, Nicholas; Hensing, Thomas; Salgia, Ravi

    2016-01-01

    Objective: An increasing amount of clinical data is available to biomedical researchers, but specifically designed database and informatics infrastructures are needed to handle these data effectively, and multiple research groups should be able to pool and share them efficiently. The Chicago Thoracic Oncology Database Consortium (CTODC) was created to standardize data collection and facilitate the pooling and sharing of data at institutions throughout Chicago and across the world. We assessed the CTODC by conducting a proof-of-principle investigation on lung cancer patients who took erlotinib. This study does not examine epidermal growth factor receptor (EGFR) mutations and tyrosine kinase inhibitors as such; rather, it discusses the development and utilization of the database involved. Methods: We implemented the Thoracic Oncology Program Database Project (TOPDP) Microsoft Access database, the Thoracic Oncology Research Program (TORP) Velos database, and the TORP REDCap database for translational research efforts. Standard operating procedures (SOPs) were created to document the construction and proper utilization of these databases and have been made freely available to other institutions wishing to implement databases patterned on them. Results: A cohort of 373 lung cancer patients who took erlotinib was identified, and their EGFR mutation statuses were analyzed. Of the 70 patients tested, 55 had mutations while 15 did not. In terms of overall survival and duration of treatment, EGFR-mutated patients had a longer duration of erlotinib treatment and longer overall survival than their EGFR wild-type counterparts who received erlotinib. Discussion: The investigation successfully yielded data from all institutions of the CTODC. While the investigation identified challenges, such as the difficulty of data transfer and potential duplication of patient data, these issues can be resolved

  2. Learning Curve Assessment of Robot-Assisted Radical Prostatectomy Compared with Open-Surgery Controls from the Premier Perspective Database

    PubMed Central

    Kreaden, Usha S.; Gabbert, Jessica; Thomas, Raju

    2014-01-01

    Abstract Introduction: The primary aims of this study were to assess the learning curve effect of robot-assisted radical prostatectomy (RARP) in a large administrative database consisting of multiple U.S. hospitals and surgeons, and to compare the results of RARP with open radical prostatectomy (ORP) from the same settings. Materials and Methods: The patient population of study was from the Premier Perspective Database (Premier, Inc., Charlotte, NC) and consisted of 71,312 radical prostatectomies performed at more than 300 U.S. hospitals by up to 3739 surgeons by open or robotic techniques from 2004 to 2010. The key endpoints were surgery time, inpatient length of stay, and overall complications. We compared open versus robotic, results by year of procedures, results by case volume of specific surgeons, and results of open surgery in hospitals with and without a robotic system. Results: The mean surgery time was longer for RARP (4.4 hours, standard deviation [SD] 1.7) compared with ORP (3.4 hours, SD 1.5) in the same hospitals (p<0.0001). Inpatient stay was shorter for RARP (2.2 days, SD 1.9) compared with ORP (3.2 days, SD 2.7) in the same hospitals (p<0.0001). The overall complications were less for RARP (10.6%) compared with ORP (15.8%) in the same hospitals, as were transfusion rates. ORP results in hospitals without a robot were not better than ORP with a robot, and pretreatment co-morbidity profiles were similar in all cohorts. Trending of results by year of procedure showed no differences in the three cohorts, but trending of RARP results by surgeon experience showed improvements in surgery time, hospital stay, conversion rates, and complication rates. Conclusions: During the initial 7 years of RARP development, outcomes showed decreased hospital stay, complications, and transfusion rates. Learning curve trends for RARP were evident for these endpoints when grouped by surgeon experience, but not by year of surgery. PMID:24350787

  3. LoopX: A Graphical User Interface-Based Database for Comprehensive Analysis and Comparative Evaluation of Loops from Protein Structures.

    PubMed

    Kadumuri, Rajashekar Varma; Vadrevu, Ramakrishna

    2017-10-01

    Due to their crucial role in function, folding, and stability, protein loops are being targeted for grafting/designing to create novel or alter existing functionality and to improve stability and foldability. To facilitate thorough analysis and effective search options for extracting and comparing loops for sequence and structural compatibility, we developed LoopX, a comprehensively compiled library of sequence and conformational features of ∼700,000 loops from protein structures. The database, equipped with a graphical user interface, offers diverse query tools and search algorithms, with various rendering options to visualize the sequence- and structural-level information along with hydrogen bonding patterns and backbone φ, ψ dihedral angles of both the target and candidate loops. Two new features, (i) conservation of the polar/nonpolar environment and (ii) conservation of sequence and conformation of specific residues within the loops, have also been incorporated in the search and retrieval of compatible loops for a chosen target loop. Thus, the LoopX server not only serves as a database and visualization tool for sequence and structural analysis of protein loops but also aids in extracting and comparing candidate loops for a given target loop based on user-defined search options.
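
    One of the conformational compatibility checks described, matching backbone φ, ψ dihedrals between a target loop and a candidate loop, can be sketched as a mean circular distance over residue pairs. A minimal sketch; the angles below are invented and the scoring is a stand-in for LoopX's actual search criteria:

```python
# Toy dihedral-compatibility score for a candidate loop against a target
# loop: mean circular distance over backbone (phi, psi) pairs, degrees.
import math

target =    [(-60, -45), (-70, 140), (55, 45)]   # (phi, psi) per residue
candidate = [(-58, -40), (-75, 150), (60, 40)]

def circ_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180) % 360 - 180)

score = math.fsum(
    circ_diff(p1, p2) + circ_diff(s1, s2)
    for (p1, s1), (p2, s2) in zip(target, candidate)
) / (2 * len(target))
print(f"mean dihedral deviation = {score:.1f} deg")  # lower = more compatible
```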

  4. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.

    PubMed

    Carver, Tim; Berriman, Matthew; Tivey, Adrian; Patel, Chinmay; Böhme, Ulrike; Barrell, Barclay G; Parkhill, Julian; Rajandream, Marie-Adèle

    2008-12-01

    Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences. Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text. Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/

  5. Signal Detection of Imipenem Compared to Other Drugs from Korea Adverse Event Reporting System Database

    PubMed Central

    Park, Kyounghoon; Soukavong, Mick; Kim, Jungmee; Kwon, Kyoung-eun; Jin, Xue-mei; Lee, Joongyub; Yang, Bo Ram

    2017-01-01

    Purpose To detect signals of adverse drug events after imipenem treatment using the Korea Institute of Drug Safety & Risk Management-Korea adverse event reporting system database (KIDS-KD). Materials and Methods We performed data mining using KIDS-KD, which was constructed from spontaneously reported adverse event (AE) reports between December 1988 and June 2014. We detected signals by calculating the proportional reporting ratio, reporting odds ratio, and information component for imipenem, and defined a signal as any AE that satisfied all three indices. The signals were compared with the drug labels of nine countries. Results There were 807,582 spontaneous AE reports in the KIDS-KD. Among those, 192,510 were antibiotic-related AEs, and 3,382 reports were associated with imipenem. The most common imipenem-associated AE was drug eruption, reported 353 times. We calculated signals by comparing imipenem with all other antibiotics and with all other drugs; 58 and 53 signals, respectively, satisfied all three indices. We compared the drug labelling information of nine countries, including the USA, the UK, Japan, Italy, Switzerland, Germany, France, Canada, and South Korea, and discovered that the following signals were currently not included in drug labels: hypokalemia, cardiac arrest, cardiac failure, Parkinson's syndrome, myocardial infarction, and prostate enlargement. Hypokalemia was an additional signal compared with all other antibiotics, and the other signals were not different compared with all other antibiotics and all other drugs. Conclusion We detected new signals that were not listed on the drug labels of nine countries. However, further pharmacoepidemiologic research is needed to evaluate the causality of these signals. PMID:28332362

  6. Signal Detection of Imipenem Compared to Other Drugs from Korea Adverse Event Reporting System Database.

    PubMed

    Park, Kyounghoon; Soukavong, Mick; Kim, Jungmee; Kwon, Kyoung Eun; Jin, Xue Mei; Lee, Joongyub; Yang, Bo Ram; Park, Byung Joo

    2017-05-01

    To detect signals of adverse drug events after imipenem treatment using the Korea Institute of Drug Safety & Risk Management-Korea adverse event reporting system database (KIDS-KD). We performed data mining using KIDS-KD, which was constructed from spontaneously reported adverse event (AE) reports between December 1988 and June 2014. We detected signals by calculating the proportional reporting ratio, reporting odds ratio, and information component for imipenem, and defined a signal as any AE that satisfied all three indices. The signals were compared with the drug labels of nine countries. There were 807,582 spontaneous AE reports in the KIDS-KD. Among those, 192,510 were antibiotic-related AEs, and 3,382 reports were associated with imipenem. The most common imipenem-associated AE was drug eruption, reported 353 times. We calculated signals by comparing imipenem with all other antibiotics and with all other drugs; 58 and 53 signals, respectively, satisfied all three indices. We compared the drug labelling information of nine countries, including the USA, the UK, Japan, Italy, Switzerland, Germany, France, Canada, and South Korea, and discovered that the following signals were currently not included in drug labels: hypokalemia, cardiac arrest, cardiac failure, Parkinson's syndrome, myocardial infarction, and prostate enlargement. Hypokalemia was an additional signal compared with all other antibiotics, and the other signals were not different compared with all other antibiotics and all other drugs. We detected new signals that were not listed on the drug labels of nine countries. However, further pharmacoepidemiologic research is needed to evaluate the causality of these signals. © Copyright: Yonsei University College of Medicine 2017
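
    The three disproportionality indices named in these two records have standard closed forms over a 2x2 contingency table of reports. A minimal sketch; a and b are loosely based on the imipenem counts quoted above, c and d are invented, and the information component is the crude, non-Bayesian form rather than the shrinkage estimate used in practice:

```python
# Disproportionality metrics for spontaneous-report signal detection:
# proportional reporting ratio (PRR), reporting odds ratio (ROR), and a
# simplified information component (IC). Counts are illustrative only.
import math

def signal_metrics(a, b, c, d):
    """a: target drug + target AE, b: target drug + other AEs,
       c: other drugs + target AE, d: other drugs + other AEs."""
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))           # proportional reporting ratio
    ror = (a / b) / (c / d)                       # reporting odds ratio
    ic = math.log2(a * n / ((a + b) * (a + c)))   # crude observed/expected IC
    return prr, ror, ic

prr, ror, ic = signal_metrics(a=353, b=3029, c=12000, d=792200)
print(f"PRR={prr:.2f} ROR={ror:.2f} IC={ic:.2f}")
```

    A signal in the study's sense would require all three indices to exceed their conventional thresholds simultaneously.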

  7. Toxicogenomic outcomes predictive of forestomach carcinogenesis following exposure to benzo(a)pyrene: Relevance to human cancer risk

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Labib, Sarah, E-mail: Sarah.Labib@hc-sc.gc.ca; Guo, Charles H., E-mail: Charles.Guo@hc-sc.gc.ca; Williams, Andrew, E-mail: Andrew.Williams@hc-sc.gc.ca

    2013-12-01

    Forestomach tumors are observed in mice exposed to environmental carcinogens. However, the relevance of these data to humans is controversial because humans lack a forestomach. We hypothesize that an understanding of early molecular changes after exposure to a carcinogen in the forestomach will provide mode-of-action information to evaluate the applicability of forestomach cancers to human cancer risk assessment. In the present study we exposed mice to benzo(a)pyrene (BaP), an environmental carcinogen commonly associated with tumors of the rodent forestomach. Toxicogenomic tools were used to profile the gene expression response in the forestomach. Adult Muta™Mouse males were orally exposed to 25, 50, and 75 mg BaP/kg body weight/day for 28 consecutive days. The forestomach was collected three days post-exposure. DNA microarrays, real-time RT-qPCR arrays, and protein analyses were employed to characterize responses in the forestomach. Microarray results showed altered expression of 414 genes across all treatment groups (± 1.5 fold; false discovery rate adjusted P ≤ 0.05). Significant downregulation of genes associated with phase II xenobiotic metabolism and increased expression of genes implicated in antigen processing and presentation, immune response, chemotaxis, and keratinocyte differentiation were observed in treated groups in a dose-dependent manner. A systematic comparison of the differentially expressed genes in the forestomach from the present study to differentially expressed genes identified in human diseases, including human gastrointestinal tract cancers, using the NextBio Human Disease Atlas showed significant commonalities between the two models. Our results provide molecular evidence supporting the use of the mouse forestomach model to evaluate chemically induced gastrointestinal carcinogenesis in humans. - Highlights: • Benzo(a)pyrene-mediated transcriptomic response in the forestomach was examined. • The immunoproteasome subunits and MHC

  8. Analysis of commercial and public bioactivity databases.

    PubMed

    Tiikkainen, Pekka; Franke, Lutz

    2012-02-27

    Activity data for small molecules are invaluable in chemoinformatics. Various bioactivity databases exist containing detailed information of target proteins and quantitative binding data for small molecules extracted from journals and patents. In the current work, we have merged several public and commercial bioactivity databases into one bioactivity metabase. The molecular presentation, target information, and activity data of the vendor databases were standardized. The main motivation of the work was to create a single relational database which allows fast and simple data retrieval by in-house scientists. Second, we wanted to know the amount of overlap between databases by commercial and public vendors to see whether the former contain data complementing the latter. Third, we quantified the degree of inconsistency between data sources by comparing data points derived from the same scientific article cited by more than one vendor. We found that each data source contains unique data which is due to different scientific articles cited by the vendors. When comparing data derived from the same article we found that inconsistencies between the vendors are common. In conclusion, using databases of different vendors is still useful since the data overlap is not complete. It should be noted that this can be partially explained by the inconsistencies and errors in the source data.
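
    The third analysis described above, flagging inconsistent data points that different vendors extracted from the same article, can be sketched as a group-and-compare pass over standardized records. A minimal sketch; the records, identifiers, and tolerance threshold are invented for illustration:

```python
# Vendor-consistency check: group activity records by source citation,
# compound, and target, then flag measurements that disagree beyond a
# tolerance. All records below are made up.
from collections import defaultdict

records = [  # (vendor, article_doi, compound, target, pIC50)
    ("vendorA", "10.1021/xx1", "CHEMBL25",  "COX-1", 4.5),
    ("vendorB", "10.1021/xx1", "CHEMBL25",  "COX-1", 4.5),
    ("vendorA", "10.1021/xx2", "CHEMBL521", "COX-2", 6.1),
    ("vendorB", "10.1021/xx2", "CHEMBL521", "COX-2", 7.0),  # inconsistent
]

by_key = defaultdict(list)
for vendor, doi, cpd, tgt, val in records:
    by_key[(doi, cpd, tgt)].append((vendor, val))

for key, vals in by_key.items():
    spread = max(v for _, v in vals) - min(v for _, v in vals)
    if len(vals) > 1 and spread > 0.3:    # tolerance threshold (log units)
        print("inconsistent:", key, vals)
```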

  9. JICST Factual Database JICST DNA Database

    NASA Astrophysics Data System (ADS)

    Shirokizawa, Yoshiko; Abe, Atsushi

    The Japan Information Center of Science and Technology (JICST) started its online DNA database service in October 1988. The database is composed of the EMBL Nucleotide Sequence Library and the Genetic Sequence Data Bank. The authors outline the database system, its data items, and its search commands, and present examples of retrieval sessions.

  10. Creating databases for biological information: an introduction.

    PubMed

    Stein, Lincoln

    2013-06-01

    The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, relational databases, and NoSQL databases. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system. Copyright 2013 by John Wiley & Sons, Inc.
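
    As a small illustration of why the unit recommends graduating from flat files to a database engine, here is a minimal sketch using Python's built-in sqlite3 module; the strain table and its columns are invented for the example:

```python
# A relational store replaces ad hoc flat-file parsing with declarative
# queries. Uses only the standard library; schema is illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE strain (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    insertion_site TEXT,
    phenotype TEXT)""")
con.executemany(
    "INSERT INTO strain (name, insertion_site, phenotype) VALUES (?, ?, ?)",
    [("mut-101", "chr2:1204556", "slow growth"),
     ("mut-102", "chr5:88130", "wild type")])

# A one-line question that would need a custom script over flat files:
for row in con.execute(
        "SELECT name, phenotype FROM strain WHERE insertion_site LIKE 'chr2%'"):
    print(row)
```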

  11. Improved orthologous databases to ease protozoan targets inference.

    PubMed

    Kotowski, Nelson; Jardim, Rodrigo; Dávila, Alberto M R

    2015-09-29

    Homology inference helps in identifying similarities, as well as differences, among organisms, which provides better insight into how closely related one might be to another. In addition, comparative genomics pipelines are widely adopted tools designed using different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid protozoan target identification, one of the many tasks which benefit from comparative genomics tools. Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, supported by an HMM-based reciprocal best hits approach. Our methodology allows OrthoSearch to confront two orthologous databases and to generate an improved new one, which can then be used to infer potential protozoan targets through a similarity analysis against the human genome. The protein sequences of the Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) KEGG Orthology (KO). That allowed us to create two new orthologous databases, "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB", with 16,938 and 27,701 orthologous groups, respectively. These new orthologous databases were used for a regular OrthoSearch run. By confronting the "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB" databases and the protozoan species, we were able to detect the following totals of orthologous groups and coverage (the ratio of inferred orthologous groups to the species' total number of proteins): Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %). Using our HMM-based methodology and the largest created orthologous database, it was possible to infer 13
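
    The reciprocal best hits criterion underlying the pipeline is compact enough to sketch directly: a pair is called orthologous only when each gene is the other's top-scoring hit in both search directions. Gene names and scores below are invented; a real run would use HMM or alignment bit scores:

```python
# Reciprocal best hits (RBH): A->B best hit must equal B->A best hit.
hits_ab = {"geneA1": [("geneB1", 250), ("geneB2", 90)],
           "geneA2": [("geneB2", 180)]}
hits_ba = {"geneB1": [("geneA1", 245)],
           "geneB2": [("geneA1", 95), ("geneA2", 175)]}

def best(hits):
    """Top-scoring subject for each query."""
    return {q: max(hs, key=lambda h: h[1])[0] for q, hs in hits.items()}

best_ab, best_ba = best(hits_ab), best(hits_ba)
orthologs = [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]
print(orthologs)   # [('geneA1', 'geneB1'), ('geneA2', 'geneB2')]
```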

  12. Trends in Outcomes and Hospitalization Charges of Infant Botulism in the United States: A Comparative Analysis Between Kids' Inpatient Database and National Inpatient Sample.

    PubMed

    Opila, Tamara; George, Asha; El-Ghanem, Mohammad; Souayah, Nizar

    2017-02-01

    New therapeutic strategies, including immune globulin intravenous, have emerged in the past two decades for the management of botulism. However, their impact on outcomes and hospitalization charges among infants (aged ≤1 year) with botulism in the United States is unknown. We analyzed the Kids' Inpatient Database (KID) and the National Inpatient Sample (NIS) for in-hospital outcomes and charges for infant botulism cases from 1997 to 2009. Demographics, discharge status, mortality, length of stay, and hospitalization charges were reported from the two databases and compared. Between 1997 and 2009, 504 infant hospitalizations were captured in KID and 340 hospitalizations in NIS for comparable years. A significant decrease was observed in mean length of stay for KID (P < 0.01); a similar decrease was observed for the NIS. The majority of patients were discharged to home. Despite an initial decrease after 1997, an increasing trend was observed for KID/NIS mean hospital charges from 2000 to 2009 (from $57,659/$56,309 to $143,171/$106,378; P < 0.001/P < 0.001). A linear increasing trend was evident when examining mean daily hospitalization charges for both databases. In a subgroup analysis of the KID database, the youngest patients with infantile botulism (≤1.9 months) displayed the highest average number of procedures during their hospitalization (P < .001) and the highest rate of mechanical ventilation (P < .001), compared with their older counterparts. Infant botulism cases have demonstrated a significant increase in hospitalization charges over the years despite reduced length of stay. Additionally, there were significantly higher daily adjusted hospital charges and an increased rate of routine discharges for immune globulin intravenous-treated patients. More controlled studies are needed to define the criteria for cost-effective use of intravenous immune globulin in the population with infant botulism. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Building An Integrated Neurodegenerative Disease Database At An Academic Health Center

    PubMed Central

    Xie, Sharon X.; Baek, Young; Grossman, Murray; Arnold, Steven E.; Karlawish, Jason; Siderowf, Andrew; Hurtig, Howard; Elman, Lauren; McCluskey, Leo; Van Deerlin, Vivianna; Lee, Virginia M.-Y.; Trojanowski, John Q.

    2010-01-01

    Background It is becoming increasingly important to study common and distinct etiologies, clinical and pathological features, and mechanisms related to neurodegenerative diseases such as Alzheimer's disease (AD), Parkinson's disease (PD), amyotrophic lateral sclerosis (ALS), and frontotemporal lobar degeneration (FTLD). These comparative studies rely on powerful database tools to quickly generate data sets which match diverse and complementary criteria set by the studies. Methods In this paper, we present a novel Integrated NeuroDegenerative Disease (INDD) database developed at the University of Pennsylvania (Penn) through a consortium of Penn investigators. Since these investigators work on AD, PD, ALS and FTLD, this allowed us to achieve the goal of developing an INDD database for these major neurodegenerative disorders. We used Microsoft SQL Server as the platform with built-in "backwards" functionality to provide Access as a front-end client to interface with the database. We used PHP Hypertext Preprocessor to create the "front-end" web interface and then integrated the individual neurodegenerative disease databases using a master lookup table. We also present methods of data entry, database security, database backups, and database audit trails for this INDD database. Results We compare the results of a biomarker study using the INDD database to those obtained using an alternative approach of querying individual databases separately. Conclusions We have demonstrated that the Penn INDD database has the ability to query multiple database tables from a single console with high accuracy and reliability. The INDD database provides a powerful tool for generating data sets in comparative studies across several neurodegenerative diseases. PMID:21784346

  14. An Introduction to Database Structure and Database Machines.

    ERIC Educational Resources Information Center

    Detweiler, Karen

    1984-01-01

    Enumerates principal management objectives of database management systems (data independence, quality, security, multiuser access, central control) and criteria for comparison (response time, size, flexibility, other features). Conventional database management systems, relational databases, and database machines used for backend processing are…

  15. Six Online Periodical Databases: A Librarian's View.

    ERIC Educational Resources Information Center

    Willems, Harry

    1999-01-01

    Compares the following World Wide Web-based periodical databases, focusing on their usefulness in K-12 school libraries: EBSCO, Electric Library, Facts on File, SIRS, Wilson, and UMI. Search interfaces, display options, help screens, printing, home access, copyright restrictions, database administration, and making a decision are discussed. A…

  16. Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

    PubMed Central

    Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

    2003-01-01

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p < 10^-9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets. [The sequence data from this study have been submitted to the dbEST division of GenBank.] PMID:12618375

  17. A database for the analysis of immunity genes in Drosophila: PADMA database.

    PubMed

    Lee, Mark J; Mondal, Ariful; Small, Chiyedza; Paddibhatla, Indira; Kawaguchi, Akira; Govind, Shubha

    2011-01-01

    While microarray experiments generate voluminous data, discerning trends that support an existing or alternative paradigm is challenging. To synergize hypothesis building and testing, we designed the Pathogen Associated Drosophila MicroArray (PADMA) database for easy retrieval and comparison of microarray results from immunity-related experiments (www.padmadatabase.org). PADMA also allows biologists to upload their own microarray results and compare them with datasets housed within PADMA. We tested PADMA using a preliminary dataset from Ganaspis xanthopoda-infected fly larvae, and uncovered unexpected trends in gene expression, reshaping our hypothesis. Thus, the PADMA database will be a useful resource for fly researchers to evaluate, revise, and refine hypotheses.

  18. BloodChIP: a database of comparative genome-wide transcription factor binding profiles in human blood cells.

    PubMed

    Chacon, Diego; Beck, Dominik; Perera, Dilmi; Wong, Jason W H; Pimanda, John E

    2014-01-01

    The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, there is a need for these data to be placed in context and easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF-gene and protein-protein networks and to evaluate the association of genes with cellular processes and tissue expression.

  19. Recent updates and developments to plant genome size databases

    PubMed Central

    Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377

  20. Privacy-preserving search for chemical compound databases.

    PubMed

    Shimizu, Kana; Nuida, Koji; Arai, Hiromi; Mitsunari, Shigeo; Attrapadung, Nuttapong; Hamada, Michiaki; Tsuda, Koji; Hirokawa, Takatsugu; Sakuma, Jun; Hanaoka, Goichiro; Asai, Kiyoshi

    2015-01-01

    Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information.
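
    The additive-homomorphic property on which the protocol rests is easy to demonstrate with a toy Paillier cryptosystem: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can aggregate encrypted fingerprint counts without decrypting them. A minimal sketch with tiny, insecure demo primes (the paper's actual scheme and parameters are not reproduced here; requires Python 3.9+):

```python
# Toy Paillier cryptosystem illustrating additive homomorphism.
# Tiny primes chosen for readability -- NOT cryptographically secure.
import math, random

p, q = 2003, 2011                      # demo primes only
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                   # valid because g = n + 1

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:         # r must be in Z_n*
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Multiplying ciphertexts adds the underlying plaintexts.
c = (enc(17) * enc(25)) % n2
assert dec(c) == 42
print(dec(c))
```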

  1. Privacy-preserving search for chemical compound databases

    PubMed Central

    2015-01-01

    Background Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. Results In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. Conclusion We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information. PMID:26678650

  2. Database Support for Research in Public Administration

    ERIC Educational Resources Information Center

    Tucker, James Cory

    2005-01-01

    This study examines the extent to which databases support student and faculty research in the area of public administration. A list of journals in public administration, public policy, political science, public budgeting and finance, and other related areas was compared to the journal content list of six business databases. These databases…

  3. The Sequenced Angiosperm Genomes and Genome Databases.

    PubMed

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide essential resources for human life, such as food, energy, oxygen, and materials, and they have also promoted the evolution of humans, animals, and the planet Earth. Despite numerous advances in genome sequencing and reporting, no review covers all of the released angiosperm genomes and the genome databases available for data sharing. Building on the rapid advances and innovations in database construction over the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases: databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. Genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a timely updated comprehensive database is more powerful for addressing major scientific questions at the genome scale. Considering the low coverage of flowering plants in any available database, we propose the construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology.

  4. The Sequenced Angiosperm Genomes and Genome Databases

    PubMed Central

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide essential resources for human life, such as food, energy, oxygen, and materials, and they have also promoted the evolution of humans, animals, and the planet Earth. Despite numerous advances in genome sequencing and reporting, no review covers all of the released angiosperm genomes and the genome databases available for data sharing. Building on the rapid advances and innovations in database construction over the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases: databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. Genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a timely updated comprehensive database is more powerful for addressing major scientific questions at the genome scale. Considering the low coverage of flowering plants in any available database, we propose the construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology. PMID:29706973

  5. Building an integrated neurodegenerative disease database at an academic health center.

    PubMed

    Xie, Sharon X; Baek, Young; Grossman, Murray; Arnold, Steven E; Karlawish, Jason; Siderowf, Andrew; Hurtig, Howard; Elman, Lauren; McCluskey, Leo; Van Deerlin, Vivianna; Lee, Virginia M-Y; Trojanowski, John Q

    2011-07-01

    It is becoming increasingly important to study common and distinct etiologies, clinical and pathological features, and mechanisms related to neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, and frontotemporal lobar degeneration. These comparative studies rely on powerful database tools to quickly generate data sets that match diverse and complementary criteria set by the studies. In this article, we present a novel integrated neurodegenerative disease (INDD) database, which was developed at the University of Pennsylvania (Penn) with the help of a consortium of Penn investigators. Because these investigators work on Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, and frontotemporal lobar degeneration, this allowed us to achieve the goal of developing an INDD database for these major neurodegenerative disorders. We used Microsoft SQL Server as the platform, with built-in "backwards" functionality to provide Access as a front-end client to interface with the database. We used PHP Hypertext Preprocessor to create the "front-end" web interface and then used a master lookup table to integrate the individual neurodegenerative disease databases. We also present methods of data entry, database security, database backups, and database audit trails for this INDD database. Using the INDD database, we compared the results of a biomarker study with those obtained using an alternative approach of querying individual databases separately. We have demonstrated that the Penn INDD database has the ability to query multiple database tables from a single console with high accuracy and reliability. The INDD database provides a powerful tool for generating data sets in comparative studies on several neurodegenerative diseases. Copyright © 2011 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.

  6. Creating databases for biological information: an introduction.

    PubMed

    Stein, Lincoln

    2002-08-01

    The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, and relational databases, as well as ACeDB. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system.

  7. openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections.

    PubMed

    Rudd, Stephen

    2005-01-01

    The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.

  8. Weathering Database Technology

    ERIC Educational Resources Information Center

    Snyder, Robert

    2005-01-01

    Collecting weather data is a traditional part of a meteorology unit at the middle level. However, making connections between the data and weather conditions can be a challenge. One way to make these connections clearer is to enter the data into a database. This allows students to quickly compare different fields of data and recognize which…

  9. Academic Journal Embargoes and Full Text Databases.

    ERIC Educational Resources Information Center

    Brooks, Sam

    2003-01-01

    Documents the reasons for embargoes of academic journals in full text databases (i.e., publisher-imposed delays on the availability of full text content) and provides insight regarding common misconceptions. Tables present data on selected journals covering a cross-section of subjects and publishers and comparing two full text business databases.…

  10. Comparing Pattern Recognition Feature Sets for Sorting Triples in the FIRST Database

    NASA Astrophysics Data System (ADS)

    Proctor, D. D.

    2006-07-01

    Pattern recognition techniques have been used with increasing success for coping with the tremendous amounts of data being generated by automated surveys. Usually this process involves the construction of training sets: typical examples of data with known classifications. Given a feature set, along with the training set, statistical methods can be employed to generate a classifier, which is then applied to process the remaining data. Feature set selection, however, is still an issue. This paper presents techniques developed for accommodating data for which a substantive portion of the training set cannot be classified unambiguously, a typical case for low-resolution data. Significance tests on the sort-ordered, sample-size-normalized vote distribution of an ensemble of decision trees are introduced as a method of evaluating the relative quality of feature sets. The technique is applied to comparing feature sets for sorting a particular radio galaxy morphology, bent-doubles, from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) database. Also examined are alternative functional forms for feature sets. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The technique may also be applied to situations for which, although accurate classifications are available, the feature set is clearly inadequate, yet it is desired nonetheless to make the best use of the available information.
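
    The ensemble-vote idea is simple to prototype: train many decision trees on bootstrap samples and inspect the sort-ordered fraction of trees voting for each object instead of a single hard label. A minimal sketch with synthetic features standing in for the radio-morphology feature sets (assumes scikit-learn >= 1.2 for the `estimator` keyword):

```python
# Ensemble of decision trees; the vote fraction per object is the raw
# material for the significance tests described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_train, y_train, X_new = X[:1500], y[:1500], X[1500:]

ensemble = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=6),
    n_estimators=101, random_state=0).fit(X_train, y_train)

# Fraction of trees voting class 1 ("bent-double") for each unseen object.
votes = np.mean([t.predict(X_new) for t in ensemble.estimators_], axis=0)
print(np.sort(votes)[:10])   # sort-ordered vote distribution, low end
```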

  11. Toxicogenomics in the 3T3-L1 cell line, a new approach for screening of obesogenic compounds.

    PubMed

    Pereira-Fernandes, Anna; Vanparys, Caroline; Vergauwen, Lucia; Knapen, Dries; Jorens, Philippe Germaines; Blust, Ronny

    2014-08-01

    The obesogen hypothesis states that together with an energy imbalance between calories consumed and calories expended, exposure to environmental compounds early in life or throughout lifetime might have an influence on obesity development. In this work, we propose a new approach for obesogen screening, i.e., the use of transcriptomics in the 3T3-L1 pre-adipocyte cell line. Based on the data from a previous study of our group using a lipid accumulation based adipocyte differentiation assay, several human-relevant obesogenic compounds were selected: reference obesogens (Rosiglitazone, Tributyltin), test obesogens (Butylbenzyl phthalate, butylparaben, propylparaben, Bisphenol A), and non-obesogens (Ethylene Brassylate, Bis (2-ethylhexyl)phthalate). The high stability and reproducibility of the 3T3-L1 gene transcription patterns over different experiments and cell batches is demonstrated by this study. Obesogens and non-obesogen gene transcription profiles were clearly distinguished using hierarchical clustering. Furthermore, a gradual distinction corresponding to differences in induction of lipid accumulation could be made between test and reference obesogens based on transcription patterns, indicating the potential use of this strategy for classification of obesogens. Marker genes that are able to distinguish between non, test, and reference obesogens were identified. Well-known genes involved in adipocyte differentiation as well as genes with unknown functions were selected, implying a potential adipocyte-related function of the latter. Cell-physiological lipid accumulation was well estimated based on transcription levels of the marker genes, indicating the biological relevance of omics data. In conclusion, this study shows the high relevance and reproducibility of this 3T3-L1 based in vitro toxicogenomics tool for classification of obesogens and biomarker discovery. Although the results presented here are promising, further confirmation of the predictive value of
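
    The clustering step that separates obesogen from non-obesogen transcription profiles can be sketched in a few lines. A minimal sketch assuming a compounds-by-genes expression matrix; the profiles below are random stand-ins, not 3T3-L1 measurements:

```python
# Hierarchical clustering of compound transcription profiles, checking
# that obesogen-like profiles group apart from non-obesogens.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
compounds = ["rosiglitazone", "tributyltin", "BPA", "butylparaben",
             "ethylene brassylate", "DEHP"]
profiles = rng.normal(size=(6, 500))   # compounds x marker genes (toy data)
profiles[:4] += 1.5                    # simulated obesogen-like shift

z = linkage(profiles, method="average", metric="correlation")
labels = fcluster(z, t=2, criterion="maxclust")
print(dict(zip(compounds, labels)))    # two clusters: obesogens vs others
```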

  12. Searching the Cambridge Structural Database for polymorphs.

    PubMed

    van de Streek, Jacco; Motherwell, Sam

    2005-10-01

    In order to identify all pairs of polymorphs in the Cambridge Structural Database (CSD), a method was devised to automatically compare two crystal structures. The comparison is based on simulated powder diffraction patterns, but with special provisions to deal with differences in unit-cell volumes caused by temperature or pressure. Among the 325,000 crystal structures in the Cambridge Structural Database, 35,000 pairs of crystal structures of the same chemical compound were identified and compared. A total of 7300 pairs of polymorphs were identified, of which 154 previously were unknown.
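
    The core comparison can be sketched as scoring two simulated intensity curves with a shift-tolerant overlap; the Gaussian peak model and the index-shift search below are simplified stand-ins for the paper's actual pattern simulation and its provisions for cell-volume differences:

```python
# Compare two simulated powder diffraction patterns: build intensity
# curves from peak lists, then take the best normalised dot product over
# small 2-theta shifts. Peak positions/intensities are invented.
import numpy as np

two_theta = np.linspace(5, 50, 2000)

def pattern(peaks):
    """Sum of Gaussian peaks given (position, intensity) pairs."""
    return sum(i * np.exp(-0.5 * ((two_theta - p) / 0.1) ** 2)
               for p, i in peaks)

a = pattern([(10.2, 1.0), (14.8, 0.60), (21.3, 0.8)])
b = pattern([(10.3, 1.0), (14.9, 0.55), (21.5, 0.8)])  # slightly shifted

def similarity(x, y, max_shift=20):
    """Best cosine similarity over +/- max_shift index shifts."""
    best = 0.0
    for s in range(-max_shift, max_shift + 1):
        xs = np.roll(x, s)
        best = max(best, np.dot(xs, y) /
                   (np.linalg.norm(xs) * np.linalg.norm(y)))
    return best

print(f"similarity = {similarity(a, b):.3f}")  # near 1: likely same form
```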

  13. Database of Novel and Emerging Adsorbent Materials

    National Institute of Standards and Technology Data Gateway

    SRD 205 NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials (Web, free access)   The NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials is a free, web-based catalog of adsorbent materials and measured adsorption properties of numerous materials obtained from article entries from the scientific literature. Search fields for the database include adsorbent material, adsorbate gas, experimental conditions (pressure, temperature), and bibliographic information (author, title, journal), and results from queries are provided as a list of articles matching the search parameters. The database also contains adsorption isotherms digitized from the cataloged articles, which can be compared visually online in the web application or exported for offline analysis.

  14. Pathway Analysis Revealed Potential Diverse Health Impacts of Flavonoids that Bind Estrogen Receptors

    PubMed Central

    Ye, Hao; Ng, Hui Wen; Sakkiah, Sugunadevi; Ge, Weigong; Perkins, Roger; Tong, Weida; Hong, Huixiao

    2016-01-01

    Flavonoids are frequently used as dietary supplements in the absence of research evidence regarding health benefits or toxicity. Furthermore, ingested doses could far exceed those received from diet in the course of normal living. Some flavonoids exhibit binding to estrogen receptors (ERs), with consequential vigilance by regulatory authorities at the U.S. EPA and FDA. Regulatory authorities must consider both beneficial claims and potential adverse effects, warranting the increases in research that has spanned almost two decades. Here, we report pathway enrichment of 14 targets from the Comparative Toxicogenomics Database (CTD) and the Herbal Ingredients' Targets (HIT) database for 22 flavonoids that bind ERs. The selected flavonoids are confirmed ER binders from our earlier studies and were found to be mainly involved in three types of biological processes: ER regulation, estrogen metabolism and synthesis, and apoptosis. Beyond cancers, we conjecture that the flavonoids may affect several diseases via apoptosis pathways; diseases such as amyotrophic lateral sclerosis, viral myocarditis and non-alcoholic fatty liver disease could be implicated. More generally, apoptosis processes may be importantly evolved biological functions of flavonoids that bind ERs, and high-dose ingestion of those flavonoids could adversely disrupt the cellular apoptosis process. PMID:27023590

  15. Disease model curation improvements at Mouse Genome Informatics

    PubMed Central

    Bello, Susan M.; Richardson, Joel E.; Davis, Allan P.; Wiegers, Thomas C.; Mattingly, Carolyn J.; Dolan, Mary E.; Smith, Cynthia L.; Blake, Judith A.; Eppig, Janan T.

    2012-01-01

    Optimal curation of human diseases requires an ontology or structured vocabulary that contains terms familiar to end users, is robust enough to support multiple levels of annotation granularity, is limited to disease terms and is stable enough to avoid extensive reannotation following updates. At Mouse Genome Informatics (MGI), we currently use disease terms from Online Mendelian Inheritance in Man (OMIM) to curate mouse models of human disease. While OMIM provides highly detailed disease records that are familiar to many in the medical community, it lacks the structure to support multilevel annotation. To improve disease annotation at MGI, we evaluated the merged Medical Subject Headings (MeSH) and OMIM disease vocabulary created by the Comparative Toxicogenomics Database (CTD) project. Overlaying MeSH onto OMIM provides hierarchical access to broad disease terms, a feature missing from OMIM. We created an extended version of the vocabulary to meet the genetic disease-specific curation needs at MGI. Here we describe our evaluation of the CTD application, the extensions made by MGI and discuss the strengths and weaknesses of this approach. Database URL: http://www.informatics.jax.org/ PMID:22434831

  16. Toward unification of taxonomy databases in a distributed computer environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi

    1994-12-31

    All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries which aid biological research by computer. The taxonomy databases are, however, not consistently unified in a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful for comparing many research results and for investigating future research directions from existing results. In particular, it will be useful in comparing relationships between phylogenetic trees inferred from molecular data and those constructed from morphological data. The goal of the present study is to unify the existent taxonomy databases and eliminate inconsistencies (errors) that are present in them. Inconsistencies occur particularly in the restructuring of the existent taxonomy databases, since classification rules for constructing the taxonomy have rapidly changed with biological advancements. A repair system is needed to remove inconsistencies in each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases in a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.

  17. An incremental database access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, Nicholas; Sellis, Timos

    1994-01-01

    We investigated a number of design and performance issues of interoperable database management systems (DBMS's). The major results of our investigation were obtained in the areas of client-server database architectures for heterogeneous DBMS's, incremental computation models, buffer management techniques, and query optimization. We finished a prototype of an advanced client-server workstation-based DBMS which allows access to multiple heterogeneous commercial DBMS's. Experiments and simulations were then run to compare its performance with the standard client-server architectures. The focus of this research was on adaptive optimization methods for heterogeneous database systems. Adaptive buffer management accounts for the random and object-oriented access methods for which no known characterization of the access patterns exists. Adaptive query optimization means that value distributions and selectivities, which play the most significant role in query plan evaluation, are continuously refined to reflect the actual values as opposed to static ones that are computed off-line. Query feedback is a concept that was first introduced to the literature by our group. We employed query feedback both for adaptive buffer management and for computing value distributions and selectivities. For adaptive buffer management, we use the page faults of prior executions to achieve more 'informed' management decisions. For the estimation of the distributions of the selectivities, we use curve-fitting techniques, such as least squares and splines, for regressing on these values.

  18. Time Trends of Period Prevalence Rates of Patients with Inhaled Long-Acting Beta-2-Agonists-Containing Prescriptions: A European Comparative Database Study

    PubMed Central

    Rottenkolber, Marietta; Voogd, Eef; van Dijk, Liset; Primatesta, Paola; Becker, Claudia; Schlienger, Raymond; de Groot, Mark C. H.; Alvarez, Yolanda; Durand, Julie; Slattery, Jim; Afonso, Ana; Requena, Gema; Gil, Miguel; Alvarez, Arturo; Hesse, Ulrik; Gerlach, Roman; Hasford, Joerg; Fischer, Rainald; Klungel, Olaf H.; Schmiedl, Sven

    2015-01-01

    Background Inhaled, long-acting beta-2-adrenoceptor agonists (LABA) have well-established roles in asthma and/or COPD treatment. Drug utilisation patterns for LABA have been described, but few studies have directly compared LABA use in different countries. We aimed to compare the prevalence of LABA-containing prescriptions in five European countries using a standardised methodology. Methods A common study protocol was applied to seven European healthcare record databases (Denmark, Germany, Spain, the Netherlands (2), and the UK (2)) to calculate crude and age- and sex-standardised annual period prevalence rates (PPRs) of LABA-containing prescriptions from 2002–2009. Annual PPRs were stratified by sex, age, and indication (asthma, COPD, asthma and COPD). Results From 2002–2009, age- and sex-standardised PPRs of patients with LABA-containing medications increased in all databases (58.2%–185.1%). Highest PPRs were found in men ≥ 80 years old and women 70–79 years old. Regarding the three indications, the highest age- and sex-standardised PPRs in all databases were found in patients with “asthma and COPD” but with large inter-country variation. In those with asthma or COPD, lower PPRs and smaller inter-country variations were found. For all three indications, PPRs for LABA-containing prescriptions increased with age. Conclusions Using a standardised protocol that allowed direct inter-country comparisons, we found highest rates of LABA-containing prescriptions in elderly patients and distinct differences in the increased utilisation of LABA-containing prescriptions within the study period throughout the five European countries. PMID:25706152
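
    Direct age- and sex-standardisation of a period prevalence rate (PPR) reduces to weighting stratum-specific rates by a common standard population. A toy worked example with invented strata and counts (not taken from any of the seven databases):

```python
# Crude vs directly standardised period prevalence rate (PPR).
# Each stratum: (label, patients with a LABA prescription, person-years,
# weight of that stratum in the standard population; weights sum to 1).
strata = [
    ("F 50-59", 1200, 150000, 0.30),
    ("F 60-69", 1800, 120000, 0.25),
    ("M 50-59",  900, 140000, 0.25),
    ("M 60-69", 1500, 100000, 0.20),
]

crude = sum(c for _, c, _, _ in strata) / sum(n for _, _, n, _ in strata)
standardised = sum(w * c / n for _, c, n, w in strata)
print(f"crude PPR = {crude:.4f}, standardised PPR = {standardised:.4f}")
```

    Standardising to one reference population is what makes the annual PPRs directly comparable across databases with different age and sex structures.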

  19. ocsESTdb: a database of oil crop seed EST sequences for comparative analysis and investigation of a global metabolic network and oil accumulation metabolism.

    PubMed

    Ke, Tao; Yu, Jingyin; Dong, Caihua; Mao, Han; Hua, Wei; Liu, Shengyi

    2015-01-21

    Oil crop seeds are important sources of fatty acids (FAs) for human and animal nutrition. Despite their importance, an essential bioinformatics resource on gene transcription in oil crops from a comparative perspective has been lacking. In this study, we developed ocsESTdb, the first database of expressed sequence tag (EST) information on the seeds of four large-scale oil crops, with an emphasis on global metabolic networks, oil accumulation metabolism and the unigenes involved in them. A total of 248,522 ESTs and 106,835 unigenes were collected from cDNA libraries of rapeseed (Brassica napus), soybean (Glycine max), sesame (Sesamum indicum) and peanut (Arachis hypogaea). These unigenes were annotated by sequence similarity searches against databases including TAIR, the NR protein database, Gene Ontology, COG, Swiss-Prot, TrEMBL and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Five genome-scale metabolic networks, containing different numbers of metabolites and gene-enzyme reaction-association entries, were constructed and analysed using the Cytoscape and yEd programs. Details of unigene entries, deduced amino acid sequences and putative annotations are available from our database to browse, search and download. Intuitive and graphical representations of EST/unigene sequences, functional annotations, metabolic pathways and metabolic networks are also available. ocsESTdb will be updated regularly and can be freely accessed at http://ocri-genomics.org/ocsESTdb/ . It may serve as a valuable and unique resource for comparative analysis of acyl lipid synthesis and metabolism in oilseed plants, and may provide vital insights into improving the oil content of seeds in oil crop species by transcriptional reconstruction of the metabolic network.

  20. PIGD: a database for intronless genes in the Poaceae.

    PubMed

    Yan, Hanwei; Jiang, Cuiping; Li, Xiaoyu; Sheng, Lei; Dong, Qing; Peng, Xiaojian; Li, Qian; Zhao, Yang; Jiang, Haiyang; Cheng, Beijiu

    2014-10-01

    Intronless genes are a feature of prokaryotes; however, they are widespread and unequally distributed among eukaryotes and represent an important resource for studying the evolution of gene architecture. Although many databases on exons and introns exist, there has been no cohesive resource that collects intronless genes in plants into a single database. In this study, we present the Poaceae Intronless Genes Database (PIGD), a user-friendly web interface to explore information on intronless genes from different plants. Five Poaceae species, Sorghum bicolor, Zea mays, Setaria italica, Panicum virgatum and Brachypodium distachyon, are included in the current release of PIGD. Gene annotations and sequence data were collected and integrated from different databases. The primary focus of this study was to provide gene descriptions and gene product records. In addition, functional annotations, subcellular localization predictions and taxonomic distributions are reported. PIGD allows users to readily browse, search and download data. BLAST and comparative analyses are also provided through this online database, which is available at http://pigd.ahau.edu.cn/. PIGD provides a solid platform for the collection, integration and analysis of intronless genes in the Poaceae. As such, this database will be useful for subsequent bio-computational analyses in comparative genomics and evolutionary studies.

  1. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  2. The reference ballistic imaging database revisited.

    PubMed

    De Ceuster, Jan; Dujardin, Sylvain

    2015-03-01

    A reference ballistic image database (RBID) contains images of cartridge cases fired in firearms that are in circulation: a ballistic fingerprint database. The performance of an RBID was investigated a decade ago by De Kinder et al. using IBIS(®) Heritage™ technology, and the results of that study were published in this journal, issue 214. Since then, technologies have evolved significantly and novel apparatus have become available on the market. The current research article investigates the efficiency of another automated ballistic imaging system, Evofinder(®), using the same database as De Kinder et al. The results demonstrate a significant increase in correlation efficiency: 38% of all matches were in first position of the Evofinder(®) correlation list, compared with IBIS(®) Heritage™, where only 19% were in first position. Average correlation times are comparable to those of the IBIS(®) Heritage™ system. While Evofinder(®) demonstrates a specific improvement in mutually correlating different ammunition brands, the markings still depend strongly on the ammunition, and this continues to influence the correlation results because the markings may vary considerably. As a consequence, a substantial share of potential hits (36%) still ranked far down the correlation lists (positions 31 and lower). The large database was used to examine the probability of finding a match as a function of how much of the correlation list is verified. As an example, the RBID study on Evofinder(®) demonstrates that to find at least 90% of all potential matches, at least 43% of the items in the database need to be compared on screen, and this separately for breech face markings and firing pin impressions. These results, although a clear improvement over the original RBID study, indicate that the implementation of such a database should still not be considered at present. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
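
    The 90%/43% figure reported above is a point on a cumulative recall curve over correlation-list positions; a minimal sketch with invented match ranks:

      import numpy as np

      def list_fraction_for_recall(match_positions, list_length, target=0.90):
          """Smallest fraction of the correlation list that must be verified
          on screen to recover at least `target` of all known matches."""
          positions = np.sort(np.asarray(match_positions))
          needed = int(np.ceil(target * len(positions)))
          cutoff_rank = positions[needed - 1]   # rank of the last needed match
          return cutoff_rank / list_length

      # Synthetic ranks of true matches in a 1000-item correlation list.
      ranks = [1, 1, 2, 5, 14, 40, 120, 310, 430, 980]
      print(f"verify {list_fraction_for_recall(ranks, 1000):.0%} of the list")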

  3. Risks associated with clinical databases.

    PubMed

    Eleazar, P Y

    1991-11-01

    Providers who continuously evaluate themselves, and who keep examining who they are and where they are going, will succeed. Conscientious providers know that countless other agencies have them under the microscope and that they must work to stay ahead by assessing their actions through their clinical database. "Medical care value purchasing" is what every employer and payor is looking for, and providers need to find ways to illustrate cost in relation to quality. The basics of data security and protection should be in place so that attention can be devoted to the bigger picture. Knowledge of the risk associated with individual hospital databases, as well as the risk associated with comparative databases, is critical. The hospital-level clinical database is the hub of the wheel: if the risk there can be minimized, the data headed for various investigative sites will carry less inherent risk. When it is truly recognized and accepted that all financial decisions are based on the clinical data generated at the site of care, data integrity will become a strategic advantage for the industry. Clinical database goals will, over time, minimize risk at all levels. As this occurs, variation in treatment will be explained artfully.

  4. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
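
    The orphan-activity question described above becomes an ordinary SQL join once the sources share one warehouse schema; a toy sketch in SQLite with invented table names (not the actual BioWarehouse schema):

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
      CREATE TABLE enzyme_activity (ec TEXT, name TEXT);        -- e.g. from ENZYME
      CREATE TABLE protein_sequence (ec TEXT, accession TEXT);  -- e.g. from UniProt
      INSERT INTO enzyme_activity VALUES ('1.1.1.1','alcohol dehydrogenase'),
                                         ('9.9.9.9','orphan activity');
      INSERT INTO protein_sequence VALUES ('1.1.1.1','P00330');
      """)

      # Which characterized activities have no sequence in any loaded database?
      orphans = con.execute("""
          SELECT a.ec, a.name FROM enzyme_activity a
          LEFT JOIN protein_sequence s ON s.ec = a.ec
          WHERE s.ec IS NULL
      """).fetchall()
      print(orphans)   # [('9.9.9.9', 'orphan activity')]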

  5. The Israeli National Genetic database: a 10-year experience.

    PubMed

    Zlotogora, Joël; Patrinos, George P

    2017-03-16

    The Israeli National and Ethnic Mutation database ( http://server.goldenhelix.org/israeli ) was launched in September 2006 on the ETHNOS software to include clinically relevant genomic variants reported among Jewish and Arab Israeli patients. In 2016, the database was reviewed and corrected according to ClinVar ( https://www.ncbi.nlm.nih.gov/clinvar ) and ExAC ( http://exac.broadinstitute.org ) database entries. The present article summarizes some key aspects from the development and continuous update of the database over a 10-year period, which could serve as a paradigm of successful database curation for other similar resources. In September 2016, there were 2444 entries in the database, 890 among Jews, 1376 among Israeli Arabs, and 178 entries among Palestinian Arabs, corresponding to an ~4× data content increase compared to when originally launched. While the Israeli Arab population is much smaller than the Jewish population, the number of pathogenic variants causing recessive disorders reported in the database is higher among Arabs (934) than among Jews (648). Nevertheless, the number of pathogenic variants classified as founder mutations in the database is smaller among Arabs (175) than among Jews (192). In 2016, the entire database content was compared to that of other databases such as ClinVar and ExAC. We show that a significant difference in the percentage of pathogenic variants from the Israeli genetic database that were present in ExAC was observed between the Jewish population (31.8%) and the Israeli Arab population (20.6%). The Israeli genetic database was launched in 2006 on the ETHNOS software and is available online ever since. It allows querying the database according to the disorder and the ethnicity; however, many other features are not available, in particular the possibility to search according to the name of the gene. In addition, due to the technical limitations of the previous ETHNOS software, new features and data are not included in the

  6. Addition of a breeding database in the Genome Database for Rosaceae.

    PubMed

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will

  7. A Database Practicum for Teaching Database Administration and Software Development at Regis University

    ERIC Educational Resources Information Center

    Mason, Robert T.

    2013-01-01

    This research paper compares a database practicum at the Regis University College for Professional Studies (CPS) with technology oriented practicums at other universities. Successful andragogy for technology courses can motivate students to develop a genuine interest in the subject, share their knowledge with peers and can inspire students to…

  8. Multi-Database Searching in the Behavioral Sciences--Part I: Basic Techniques and Core Databases.

    ERIC Educational Resources Information Center

    Angier, Jennifer J.; Epstein, Barbara A.

    1980-01-01

    Outlines practical searching techniques in seven core behavioral science databases accessing psychological literature: Psychological Abstracts, Social Science Citation Index, Biosis, Medline, Excerpta Medica, Sociological Abstracts, ERIC. Use of individual files is discussed and their relative strengths/weaknesses are compared. Appended is a list…

  9. Integrating Variances into an Analytical Database

    NASA Technical Reports Server (NTRS)

    Sanchez, Carlos

    2010-01-01

    For this project, I enrolled in numerous SATERN courses that taught the basics of database programming, including Basic Access 2007 Forms, Introduction to Database Systems, Overview of Database Design, and others. My main job was to create an analytical database that can handle many stored forms and is easy to interpret and organize. Additionally, I helped improve an existing database and populate it with information. These databases were designed to be used with data from Safety Variances and DCR forms. The research consisted of analyzing the database and comparing the data to find out which entries were repeated the most. If an entry was repeated several times in the database, that would mean the rule or requirement targeted by that variance had already been bypassed many times, so the requirement may not really be needed and should instead be changed to allow the variance's conditions permanently. The project was not restricted to the design and development of the database system; it also involved exporting the data from the database to a different format (e.g., Excel or Word) so it could be analyzed in a simpler fashion. Thanks to the change in format, the data was organized in a spreadsheet that made it possible to sort the data by categories or types and helped speed up searches. Once my work with the database was done, the records of variances could be arranged so that they were displayed in numerical order, or one could search for a specific document targeted by the variances and restrict the search to include only variances that modified a specific requirement. A great part of my learning came from SATERN, NASA's resource for education. Thanks to the SATERN online courses I took over the summer, I was able to learn many new things about computers and databases and to go more in depth into topics I already knew about.
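
    The repeated-entry analysis described here is, at its core, frequency counting; a minimal sketch with invented requirement identifiers:

      from collections import Counter

      # One record per variance, keyed by the requirement it bypasses (invented data).
      variances = ["NPR-8715.3", "NPR-8715.3", "NPD-8700.1", "NPR-8715.3", "NPD-8700.1"]

      counts = Counter(variances)
      for requirement, n in counts.most_common():
          print(f"{requirement}: bypassed {n} times")
      # Requirements with high counts are candidates for permanent revision.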

  10. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for

  11. The Brain Database: A Multimedia Neuroscience Database for Research and Teaching

    PubMed Central

    Wertheim, Steven L.

    1989-01-01

    The Brain Database is an information tool designed to aid in the integration of clinical and research results in neuroanatomy and regional biochemistry. It can handle a wide range of data types including natural images, 2- and 3-dimensional graphics, video, numeric data and text. It is organized around three main entities: structures, substances and processes. The database will support a wide variety of graphical interfaces; two sample interfaces have been made. This tool is intended to serve as one component of a system that would allow neuroscientists and clinicians 1) to represent clinical and experimental data within a common framework, 2) to compare results precisely between experiments and among laboratories, 3) to use computing tools as an aid in collaborative work, and 4) to contribute to a shared and accessible body of knowledge about the nervous system.

  12. Comparative Evaluation of Registration Algorithms in Different Brain Databases With Varying Difficulty: Results and Insights

    PubMed Central

    Akbari, Hamed; Bilello, Michel; Da, Xiao; Davatzikos, Christos

    2015-01-01

    Evaluating various algorithms for the inter-subject registration of brain magnetic resonance images (MRI) is a necessary topic receiving growing attention. Existing studies evaluated image registration algorithms in specific tasks or using specific databases (e.g., only for skull-stripped images, only for single-site images, etc.). Consequently, the choice of registration algorithms seems task- and usage/parameter-dependent. Nevertheless, recent large-scale, often multi-institutional imaging-related studies create the need and raise the question whether some registration algorithms can 1) apply generally to various tasks/databases posing various challenges; 2) perform consistently well; and, while doing so, 3) require minimal or ideally no parameter tuning. In seeking answers to this question, we evaluated 12 general-purpose registration algorithms for their generality, accuracy and robustness. We fixed their parameters at values suggested by algorithm developers as reported in the literature. We tested the algorithms in 7 databases/tasks, which present one or more of 4 commonly-encountered challenges: 1) inter-subject anatomical variability in skull-stripped images; 2) intensity inhomogeneity, noise and large structural differences in raw images; 3) imaging protocol and field-of-view (FOV) differences in multi-site data; and 4) missing correspondences in pathology-bearing images. In total, 7,562 registrations were performed. Registration accuracies were measured by (multi-)expert-annotated landmarks or regions of interest (ROIs). To ensure reproducibility, we used public software tools, public databases (whenever possible), and we fully disclose the parameter settings. We show evaluation results, and discuss the performances in light of the algorithms' similarity metrics, transformation models and optimization strategies. We also discuss future directions for algorithm development and evaluations. PMID:24951685

  13. Mammography status using patient self-reports and computerized radiology database.

    PubMed

    Thompson, B; Taylor, V; Goldberg, H; Mullen, M

    1999-10-01

    This study sought to compare the self-reported mammography use of low-income women attending an inner-city public hospital with a computerized hospital radiology database used to track mammography. A survey of all age-eligible women using the hospital's internal medicine clinic was conducted, and responses were matched with the radiology database. We examined concordance between the two data sources. Concordance between self-report and the database was high (82%) for "ever had a mammogram at the hospital," but low (58%) when comparing the self-reported last mammogram with the information contained in the database. Disagreements existed between self-reports and the database. Because we sought to ensure that women would know exactly what a mammogram entailed by including a picture of a woman having a mammogram, it is possible that the women's responses were accurate, raising concerns that discrepancies might be present in the database. Physicians and staff must ensure that they understand the full history of a woman's experience with mammography before recommending for or against the procedure.
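
    Concordance here is percent agreement between paired sources; a minimal sketch with invented records:

      # Paired observations: (self-report, radiology database), True = mammogram recorded.
      pairs = [(True, True), (True, False), (False, False), (True, True), (False, True)]

      agreement = sum(a == b for a, b in pairs) / len(pairs)
      print(f"concordance: {agreement:.0%}")   # fraction of women where sources agree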

  14. Database constraints applied to metabolic pathway reconstruction tools.

    PubMed

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.

  15. Database Constraints Applied to Metabolic Pathway Reconstruction Tools

    PubMed Central

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745

  16. Using the Proteomics Identifications Database (PRIDE).

    PubMed

    Martens, Lennart; Jones, Phil; Côté, Richard

    2008-03-01

    The Proteomics Identifications Database (PRIDE) is a public data repository designed to store, disseminate, and analyze mass spectrometry based proteomics datasets. The PRIDE database can accommodate any level of detailed metadata about the submitted results, which can be queried, explored, viewed, or downloaded via the PRIDE Web interface. The PRIDE database also provides a simple, yet powerful, access control mechanism that fully supports confidential peer-reviewing of data related to a manuscript, ensuring that these results remain invisible to the general public while allowing referees and journal editors anonymized access to the data. This unit describes in detail the functionality that PRIDE provides with regards to searching, viewing, and comparing the available data, as well as different options for submitting data to PRIDE.

  17. Functional toxicogenomic assessment of triclosan in human ...

    EPA Pesticide Factsheets

    Thousands of chemicals for which limited toxicological data are available are used and then detected in humans and the environment. Rapid and cost-effective approaches for assessing the toxicological properties of chemicals are needed. We used CRISPR-Cas9 functional genomic screening to identify potential molecular mechanism of a widely used antimicrobial triclosan (TCS) in HepG2 cells. Resistant genes (whose knockout gives potential resistance) at IC50 (50% Inhibition concentration of cell viability) were significantly enriched in adherens junction pathway, MAPK signaling pathway and PPAR signaling pathway, suggesting a potential molecular mechanism in TCS induced cytotoxicity. Evaluation of top-ranked resistant genes, FTO (encoding an mRNA demethylase) and MAP2K3 (a MAP kinase kinase family gene), revealed that their loss conferred resistance to TCS. In contrast, sensitive genes (whose knockout enhances potential sensitivity) at IC10 and IC20 were specifically enriched in pathways involved with immune responses, which was concordant with the transcriptomic profiling of TCS at concentrations database. Overall, CRISPR-Cas9 functional genomic screening offers an alternative approach for chem

  18. Database Access Systems.

    ERIC Educational Resources Information Center

    Dalrymple, Prudence W.; Roderer, Nancy K.

    1994-01-01

    Highlights the changes that have occurred from 1987-93 in database access systems. Topics addressed include types of databases, including CD-ROMs; enduser interface; database selection; database access management, including library instruction and use of primary literature; economic issues; database users; the search process; and improving…

  19. Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens

    PubMed Central

    Thomas, Reuben; Phuong, Jimmy; McHale, Cliona M.; Zhang, Luoping

    2012-01-01

    We have applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens could be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens that were associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using data on gene/protein targets available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we analyzed for enrichment of all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling pathway, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters comprising 1 to 3 chemicals that did not correlate with known mechanism of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, based on the pathway data, were unable to distinguish the 29 leukemogens from 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of distinguishing a random leukemogen/non-leukemogen pair from each other. PMID:22851955
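
    Pathway enrichment of the kind used here is commonly assessed with a hypergeometric test; a minimal sketch with invented gene counts (the study's exact statistics are not reproduced):

      from scipy.stats import hypergeom

      # Universe of N annotated genes; K belong to the pathway; a chemical's
      # target set has n genes, k of which fall in the pathway.
      N, K, n, k = 20000, 150, 300, 12

      # P(X >= k): probability of seeing at least k pathway genes by chance.
      p_value = hypergeom.sf(k - 1, N, K, n)
      print(f"enrichment p-value: {p_value:.2e}")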

  20. Database Performance Monitoring for the Photovoltaic Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klise, Katherine A.

    The Database Performance Monitoring (DPM) software (copyright in process) is being developed at Sandia National Laboratories to perform quality control analysis on time series data. The software loads time-indexed databases (currently csv format), performs a series of quality control tests defined by the user, and creates reports which include summary statistics, tables, and graphics. DPM can be set up to run on an automated schedule defined by the user; for example, the software can be run once per day to analyze data collected on the previous day. HTML formatted reports can be sent via email or hosted on a website. To compare performance of several databases, summary statistics and graphics can be gathered in a dashboard view which links to detailed reporting information for each database. The software can be customized for specific applications.
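
    The described workflow (load a time-indexed CSV, run user-defined tests, summarize) can be sketched with pandas; the file name, columns, and thresholds below are illustrative, not the DPM interface:

      import pandas as pd

      # Load a time-indexed CSV of PV measurements (file and columns are invented).
      df = pd.read_csv("pv_data.csv", index_col=0, parse_dates=True)

      report = {
          "rows": len(df),
          "missing values": int(df.isna().sum().sum()),
          # Range test: flag physically implausible irradiance readings.
          "irradiance out of range": int((~df["irradiance"].between(0, 1500)).sum()),
          # Timestamp test: gaps larger than the expected 1-minute cadence.
          "timestamp gaps": int((df.index.to_series().diff() > pd.Timedelta("1min")).sum()),
      }
      print(pd.Series(report))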

  1. SAMMD: Staphylococcus aureus microarray meta-database.

    PubMed

    Nagarajan, Vijayaraj; Elasri, Mohamed O

    2007-10-02

    Staphylococcus aureus is an important human pathogen, causing a wide variety of diseases ranging from superficial skin infections to severe life-threatening infections. S. aureus is one of the leading causes of nosocomial infections. Its ability to resist multiple antibiotics poses a growing public health problem. In order to understand the mechanism of pathogenesis of S. aureus, several global expression profiles have been developed. These transcriptional profiles included regulatory mutants of S. aureus and growth of the wild type under different growth conditions. The abundance of these profiles has generated a large amount of data without a uniform annotation system to comprehensively examine them. We report the development of the Staphylococcus aureus Microarray meta-database (SAMMD), which includes data from all the published transcriptional profiles. SAMMD is a web-accessible database that helps users perform a variety of analyses against and within the existing transcriptional profiles. SAMMD is a relational database that uses MySQL as the back end and PHP/JavaScript/DHTML as the front end. The database is normalized and consists of five tables, which hold information about gene annotations, regulated gene lists, experimental details, references, and other details. SAMMD data is collected from peer-reviewed published articles. Data extraction and conversion were done using Perl scripts, while data entry was done through the phpMyAdmin tool. The database is accessible via a web interface that contains several features such as simple search by ORF ID, gene name, or gene product name, advanced search using gene lists, comparing among datasets, browsing, downloading, statistics, and help. The database is licensed under the General Public License (GPL). SAMMD is hosted and available at http://www.bioinformatics.org/sammd/. Currently there are over 9500 entries for regulated genes, from 67 microarray experiments. SAMMD will help staphylococcal scientists to analyze their
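
    The "comparing among datasets" feature amounts to set operations over regulated-gene lists; a toy sketch in SQLite with a simplified single-table schema (not the actual five SAMMD tables):

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
      CREATE TABLE regulated_gene (experiment_id INTEGER, orf_id TEXT, direction TEXT);
      INSERT INTO regulated_gene VALUES
        (1,'SA0252','up'), (1,'SA1007','up'), (1,'SA2093','down'),
        (2,'SA1007','up'), (2,'SA2093','up');
      """)

      # Genes up-regulated in both experiments 1 and 2.
      shared = con.execute("""
          SELECT orf_id FROM regulated_gene WHERE experiment_id=1 AND direction='up'
          INTERSECT
          SELECT orf_id FROM regulated_gene WHERE experiment_id=2 AND direction='up'
      """).fetchall()
      print(shared)   # [('SA1007',)]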

  2. SAMMD: Staphylococcus aureus Microarray Meta-Database

    PubMed Central

    Nagarajan, Vijayaraj; Elasri, Mohamed O

    2007-01-01

    Background Staphylococcus aureus is an important human pathogen, causing a wide variety of diseases ranging from superficial skin infections to severe life-threatening infections. S. aureus is one of the leading causes of nosocomial infections. Its ability to resist multiple antibiotics poses a growing public health problem. In order to understand the mechanism of pathogenesis of S. aureus, several global expression profiles have been developed. These transcriptional profiles included regulatory mutants of S. aureus and growth of the wild type under different growth conditions. The abundance of these profiles has generated a large amount of data without a uniform annotation system to comprehensively examine them. We report the development of the Staphylococcus aureus Microarray meta-database (SAMMD), which includes data from all the published transcriptional profiles. SAMMD is a web-accessible database that helps users perform a variety of analyses against and within the existing transcriptional profiles. Description SAMMD is a relational database that uses MySQL as the back end and PHP/JavaScript/DHTML as the front end. The database is normalized and consists of five tables, which hold information about gene annotations, regulated gene lists, experimental details, references, and other details. SAMMD data is collected from peer-reviewed published articles. Data extraction and conversion were done using Perl scripts, while data entry was done through the phpMyAdmin tool. The database is accessible via a web interface that contains several features such as simple search by ORF ID, gene name, or gene product name, advanced search using gene lists, comparing among datasets, browsing, downloading, statistics, and help. The database is licensed under the General Public License (GPL). Conclusion SAMMD is hosted and available at http://www.bioinformatics.org/sammd/. Currently there are over 9500 entries for regulated genes, from 67 microarray experiments. SAMMD will help staphylococcal scientists to analyze their

  3. Pathogen Research Databases

    Science.gov Websites

    The Hepatitis C Virus (HCV) database project is funded by the Division of Microbiology and Infectious Diseases of the National Institute of Allergy and Infectious Diseases (NIAID). The HCV database project started as a spin-off from the HIV database project. There are two databases for HCV, a sequence database

  4. Gramene database: navigating plant comparative genomics resources

    USDA-ARS?s Scientific Manuscript database

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...

  5. SmallSat Database

    NASA Technical Reports Server (NTRS)

    Petropulos, Dolores; Bittner, David; Murawski, Robert; Golden, Bert

    2015-01-01

    The SmallSat has unrealized potential in both private industry and the federal government. Currently over 70 companies, 50 universities and 17 governmental agencies are involved in SmallSat research and development. In 1994, the U.S. Army Missile and Defense mapped the moon using smallSat imagery. Since then, smart phones have introduced this imagery to the people of the world as diverse industries watched this trend. The deployment cost of smallSats is also greatly reduced compared to traditional satellites because multiple units can be deployed in a single mission. Imaging payloads have become more sophisticated, smaller and lighter. In addition, the growth of small technology obtained from private industries has led to more widespread use of smallSats. This includes greater revisit rates in imagery, significantly lower costs, the ability to update technology more frequently and decreased vulnerability to enemy attacks. The popularity of smallSats shows a changing mentality in this fast-paced world of tomorrow. What impact has this created on the NASA communication networks, now and in future years? In this project, we are developing the SmallSat Relational Database, which can support a simulation of smallSats within the NASA SCaN Compatibility Environment for Networks and Integrated Communications (SCENIC) Modeling and Simulation Lab. The NASA Space Communications and Navigation (SCaN) Program can use this modeling to project required network support needs in the next 10 to 15 years. The SmallSat Relational Database could model smallSats just as the other SCaN databases model the more traditional larger satellites, with a few exceptions. One is that the SmallSat database is designed to be built-to-order. The SmallSat database holds various hardware configurations that can be used to model a smallSat. It will require significant effort to develop as the research material can only be populated by hand to obtain the unique data

  6. Aptamer Database

    PubMed Central

    Lee, Jennifer F.; Hesselberth, Jay R.; Meyers, Lauren Ancel; Ellington, Andrew D.

    2004-01-01

    The aptamer database is designed to contain comprehensive sequence information on aptamers and unnatural ribozymes that have been generated by in vitro selection methods. Such data are not normally collected in ‘natural’ sequence databases, such as GenBank. Besides serving as a storehouse of sequences that may have diagnostic or therapeutic utility, the database serves as a valuable resource for theoretical biologists who describe and explore fitness landscapes. The database is updated monthly and is publicly available at http://aptamer.icmb.utexas.edu/. PMID:14681367

  7. Solving Relational Database Problems with ORDBMS in an Advanced Database Course

    ERIC Educational Resources Information Center

    Wang, Ming

    2011-01-01

    This paper introduces how to use the object-relational database management system (ORDBMS) to solve relational database (RDB) problems in an advanced database course. The purpose of the paper is to provide a guideline for database instructors who desire to incorporate the ORDB technology in their traditional database courses. The paper presents…

  8. Generalized Database Management System Support for Numeric Database Environments.

    ERIC Educational Resources Information Center

    Dominick, Wayne D.; Weathers, Peggy G.

    1982-01-01

    This overview of potential for utilizing database management systems (DBMS) within numeric database environments highlights: (1) major features, functions, and characteristics of DBMS; (2) applicability to numeric database environment needs and user needs; (3) current applications of DBMS technology; and (4) research-oriented and…

  9. Addition of a breeding database in the Genome Database for Rosaceae

    PubMed Central

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will

  10. A database for spectral image quality

    NASA Astrophysics Data System (ADS)

    Le Moan, Steven; George, Sony; Pedersen, Marius; Blahová, Jana; Hardeberg, Jon Yngve

    2015-01-01

    We introduce a new image database dedicated to multi-/hyperspectral image quality assessment. A total of nine scenes representing pseudo-flat surfaces of different materials (textile, wood, skin, ...) were captured by means of a 160-band hyperspectral system with a spectral range between 410 and 1000 nm. Five spectral distortions were designed, applied to the spectral images and subsequently compared in a psychometric experiment, in order to provide a basis for applications such as the evaluation of spectral image difference measures. The database can be downloaded freely from http://www.colourlab.no/cid.
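
    A basic spectral image difference measure of the kind such a database helps evaluate is per-pixel spectral RMSE; a minimal sketch with random arrays standing in for 160-band captures:

      import numpy as np

      def spectral_rmse(img_a, img_b):
          """Mean over pixels of the root-mean-square difference across bands.
          Both images have shape (height, width, bands)."""
          diff = img_a - img_b
          per_pixel = np.sqrt((diff ** 2).mean(axis=2))
          return float(per_pixel.mean())

      rng = np.random.default_rng(0)
      reference = rng.random((64, 64, 160))        # stands in for a 160-band capture
      distorted = reference + rng.normal(0, 0.01, reference.shape)
      print(spectral_rmse(reference, distorted))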

  11. National Administrative Databases in Adult Spinal Deformity Surgery: A Cautionary Tale.

    PubMed

    Buckland, Aaron J; Poorman, Gregory; Freitag, Robert; Jalai, Cyrus; Klineberg, Eric O; Kelly, Michael; Passias, Peter G

    2017-08-15

    Comparison between national administrative databases and a prospective multicenter physician-managed database. This study aims to assess the applicability of National Administrative Databases (NADs) in adult spinal deformity (ASD). Our hypothesis is that NADs do not include patients comparable to those in a physician-managed database (PMD) for surgical outcomes in adult spinal deformity. NADs such as the National Inpatient Sample (NIS) and the National Surgical Quality Improvement Program (NSQIP) yield large numbers of publications owing to ease of data access and the lack of an IRB approval requirement. These databases utilize billing codes, not clinical inclusion criteria, and have not been validated against PMDs in ASD surgery. The NIS was searched for years 2002 to 2012 and NSQIP for years 2006 to 2013 using validated spinal deformity diagnostic codes. Procedural codes (ICD-9 and CPT) were then applied to each database. A multicenter PMD including years 2008 to 2015 was used for comparison. Databases were assessed for levels fused, osteotomies, decompressed levels, and invasiveness. Database comparisons of surgical details were made for all patients, and also for patients with ≥5-level spinal fusions. A total of 37,368 NIS, 1291 NSQIP, and 737 PMD patients were identified. NADs showed an increased use of deformity billing codes over the study period (NIS doubled, NSQIP increased 68-fold, P < 0.001), but ASD volume remained stable in the PMD. Surgical invasiveness, levels fused and use of 3-column osteotomy (3-CO) were significantly lower for all patients in the NIS (11.4-13.7) and NSQIP (6.4-12.7) databases compared with the PMD (27.5-32.3). When limited to patients with ≥5 levels fused, invasiveness, levels fused, and use of 3-CO remained significantly higher in the PMD compared with the NADs (P < 0.001). The national databases NIS and NSQIP do not capture the same patient population as is captured in PMDs in ASD. Physicians should remain cautious in interpreting conclusions drawn from these databases

  12. CHRONIS: an animal chromosome image database.

    PubMed

    Toyabe, Shin-Ichi; Akazawa, Kouhei; Fukushi, Daisuke; Fukui, Kiichi; Ushiki, Tatsuo

    2005-01-01

    We have constructed a database system named CHRONIS (CHROmosome and Nano-Information System) to collect images of animal chromosomes and related nanotechnological information. CHRONIS enables rapid sharing of information on chromosome research among cell biologists and researchers in other fields via the Internet. CHRONIS is also intended to serve as a liaison tool for researchers who work in different centers. The image database contains more than 3,000 color microscopic images, including karyotypic images obtained from more than 1,000 species of animals. Researchers can browse the contents of the database using a usual World Wide Web interface in the following URL: http://chromosome.med.niigata-u.ac.jp/chronis/servlet/chronisservlet. The system enables users to input new images into the database, to locate images of interest by keyword searches, and to display the images with detailed information. CHRONIS has a wide range of applications, such as searching for appropriate probes for fluorescent in situ hybridization, comparing various kinds of microscopic images of a single species, and finding researchers working in the same field of interest.

  13. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  14. FIREMON Database

    Treesearch

    John F. Caratti

    2006-01-01

    The FIREMON database software allows users to enter data, store, analyze, and summarize plot data, photos, and related documents. The FIREMON database software consists of a Java application and a Microsoft® Access database. The Java application provides the user interface with FIREMON data through data entry forms, data summary reports, and other data management tools...

  15. A prototypic small molecule database for bronchoalveolar lavage-based metabolomics

    NASA Astrophysics Data System (ADS)

    Walmsley, Scott; Cruickshank-Quinn, Charmion; Quinn, Kevin; Zhang, Xing; Petrache, Irina; Bowler, Russell P.; Reisdorph, Richard; Reisdorph, Nichole

    2018-04-01

    The analysis of bronchoalveolar lavage fluid (BALF) using mass spectrometry-based metabolomics can provide insight into lung diseases, such as asthma. However, the important step of compound identification is hindered by the lack of a small molecule database that is specific for BALF. Here we describe prototypic, small molecule databases derived from human BALF samples (n=117). Human BALF was extracted into lipid and aqueous fractions and analyzed using liquid chromatography mass spectrometry. Following filtering to reduce contaminants and artifacts, the resulting BALF databases (BALF-DBs) contain 11,736 lipid and 658 aqueous compounds. Over 10% of these were found in 100% of samples. Testing the BALF-DBs using nested test sets produced a 99% match rate for lipids and 47% match rate for aqueous molecules. Searching an independent dataset resulted in 45% matching to the lipid BALF-DB compared to <25% when general databases are searched. The BALF-DBs are available for download from MetaboLights. Overall, the BALF-DBs can reduce false positives and improve confidence in compound identification compared to when general databases are used.
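
    Compound identification against a database like the BALF-DBs is, in essence, a mass-tolerance lookup; a minimal sketch with invented m/z values and an arbitrary ppm window:

      def match_mz(measured, database_mz, ppm_tol=10.0):
          """Return database masses within ppm_tol of the measured m/z."""
          return [mz for mz in database_mz
                  if abs(mz - measured) / measured * 1e6 <= ppm_tol]

      balf_db = [760.5851, 496.3398, 146.1176]   # invented entries, not BALF-DB values
      print(match_mz(760.5900, balf_db))         # matches 760.5851 (about 6 ppm away)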

  16. Domain Regeneration for Cross-Database Micro-Expression Recognition

    NASA Astrophysics Data System (ADS)

    Zong, Yuan; Zheng, Wenming; Huang, Xiaohua; Shi, Jingang; Cui, Zhen; Zhao, Guoying

    2018-05-01

    In this paper, we investigate the cross-database micro-expression recognition problem, where the training and testing samples come from two different micro-expression databases. Under this setting, the training and testing samples have different feature distributions, and hence the performance of most existing micro-expression recognition methods may decrease greatly. To solve this problem, we propose a simple yet effective method called the Target Sample Re-Generator (TSRG). Using TSRG, we are able to re-generate the samples from the target micro-expression database such that the re-generated target samples share the same or similar feature distributions with the original source samples. For this reason, we can then use the classifier learned on the labeled source samples to accurately predict the micro-expression categories of the unlabeled target samples. To evaluate the performance of the proposed TSRG method, extensive cross-database micro-expression recognition experiments designed on the SMIC and CASME II databases were conducted. Compared with recent state-of-the-art cross-database emotion recognition methods, the proposed TSRG achieves more promising results.
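
    TSRG itself is not reproduced here, but the underlying idea (re-shaping target features so a source-trained classifier applies) can be illustrated with CORAL-style mean/covariance alignment, a plainly different, simpler technique; a minimal sketch on synthetic features:

      import numpy as np

      def align_to_source(target, source, eps=1e-6):
          """Re-generate target features so their mean and covariance match the
          source (CORAL-style alignment; a stand-in for TSRG, not the method)."""
          def cov_sqrt(X, inverse=False):
              C = np.cov(X, rowvar=False) + eps * np.eye(X.shape[1])
              vals, vecs = np.linalg.eigh(C)
              vals = 1.0 / np.sqrt(vals) if inverse else np.sqrt(vals)
              return vecs @ np.diag(vals) @ vecs.T

          centered = target - target.mean(axis=0)
          whitened = centered @ cov_sqrt(target, inverse=True)  # strip target covariance
          return whitened @ cov_sqrt(source) + source.mean(axis=0)  # impose source stats

      rng = np.random.default_rng(1)
      src = rng.normal(0.0, 1.0, (200, 5))
      tgt = rng.normal(3.0, 2.0, (150, 5))
      tgt_aligned = align_to_source(tgt, src)  # a source-trained classifier now applies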

  17. Use of a German longitudinal prescription database (LRx) in pharmacoepidemiology.

    PubMed

    Richter, Hartmut; Dombrowski, Silvia; Hamer, Hajo; Hadji, Peyman; Kostev, Karel

    2015-01-01

    Large epidemiological databases are often used to examine matters pertaining to drug utilization, health services, and drug safety. The major strength of such databases is that they include large sample sizes, which allow precise estimates to be made. The IMS® LRx database has in recent years been used as a data source for epidemiological research. The aim of this paper is to review a number of recent studies published with the aid of this database and compare these with the results of similar studies using independent data published in the literature. In spite of being somewhat limited to studies for which comparative independent results were available, it was possible to include a wide range of possible uses of the LRx database in a variety of therapeutic fields: prevalence/incidence rate determination (diabetes, epilepsy), persistence analyses (diabetes, osteoporosis), use of comedication (diabetes), drug utilization (G-CSF market) and treatment costs (diabetes, G-CSF market). In general, the results of the LRx studies were found to be clearly in line with previously published reports. In some cases, noticeable discrepancies between the LRx results and the literature data were found (e.g. prevalence in epilepsy, persistence in osteoporosis) and these were discussed and possible reasons presented. Overall, it was concluded that the IMS® LRx database forms a suitable database for pharmacoepidemiological studies.

  18. An Algorithm for Building an Electronic Database.

    PubMed

    Cohen, Wess A; Gayle, Lloyd B; Patel, Nima P

    2016-01-01

    We propose an algorithm for creating a prospectively maintained database, which can then be used to analyze prospective data in a retrospective fashion. Our algorithm provides future researchers a road map for setting up, maintaining, and using an electronic database to improve evidence-based care and future clinical outcomes. The database was created using Microsoft Access and included demographic information, socioeconomic information, and intraoperative and postoperative details via standardized drop-down menus. A printed form from the Microsoft Access template was given to each surgeon to be completed after each case, and a member of the health care team then entered the case information into the database. By utilizing straightforward, HIPAA-compliant data input fields, we made data collection and transcription easy and efficient. Collecting a wide variety of data allowed us the freedom to evolve our clinical interests, while the platform also permitted new categories to be added at will. We have proposed a reproducible method for institutions to create a database that will allow senior and junior surgeons to analyze their outcomes and compare them with others in an effort to improve patient care and outcomes. This is a cost-efficient way to create and maintain a database without additional software.

  19. Compartmental and Data-Based Modeling of Cerebral Hemodynamics: Linear Analysis.

    PubMed

    Henley, B C; Shin, D C; Zhang, R; Marmarelis, V Z

    Compartmental and data-based modeling of cerebral hemodynamics are alternative approaches that utilize distinct model forms and have been employed in the quantitative study of cerebral hemodynamics. This paper examines the relation between a compartmental equivalent-circuit and a data-based input-output model of dynamic cerebral autoregulation (DCA) and CO2-vasomotor reactivity (DVR). The compartmental model is constructed as an equivalent-circuit utilizing putative first principles and previously proposed hypothesis-based models. The linear input-output dynamics of this compartmental model are compared with data-based estimates of the DCA-DVR process. This comparative study indicates that there are some qualitative similarities between the two-input compartmental model and experimental results.

  20. PrimateLit Database

    Science.gov Websites

    Primate Info Net Related Databases: PrimateLit, a bibliographic database for primatology. The database is a collaborative project of the Wisconsin Primate Research Center, supported by the National Center for Research Resources (NCRR), National Institutes of Health. The PrimateLit database is no longer being updated.

  1. The CATDAT damaging earthquakes database

    NASA Astrophysics Data System (ADS)

    Daniell, J. E.; Khazai, B.; Wenzel, F.; Vervaeck, A.

    2011-08-01

    The global CATDAT damaging earthquakes and secondary effects (tsunami, fire, landslides, liquefaction and fault rupture) database was developed to validate, remove discrepancies from, and greatly expand upon existing global databases, and to better understand the trends in vulnerability, exposure, and possible future impacts of historic earthquakes. In the authors' view, the lack of consistency and the errors in other frequently cited earthquake loss databases were major shortcomings that needed to be improved upon. Over 17 000 sources of information have been utilised, primarily in the last few years, to compile data on over 12 200 historical damaging earthquakes, with over 7000 earthquakes since 1900 examined and validated before insertion into the database. Each validated earthquake includes seismological information, building damage, ranges of social losses to account for varying sources (deaths, injuries, homeless, and affected), and economic losses (direct, indirect, aid, and insured). Globally, a slightly increasing trend in economic damage due to earthquakes is not consistent with the greatly increasing exposure. The 1923 Great Kanto earthquake (214 billion USD damage in 2011 HNDECI-adjusted dollars), compared with the 2011 Tohoku (>300 billion USD at the time of writing), 2008 Sichuan and 1995 Kobe earthquakes, illustrates the growing concern for economic loss in urban areas, a trend that should be expected to continue. Many economic and social loss values not reported in existing databases have been collected. Historical GDP (Gross Domestic Product), exchange rate, wage information, population, HDI (Human Development Index), and insurance information have been collected globally to allow comparisons. This catalogue is the largest known cross-checked global historic damaging earthquake database and should have far-reaching consequences for earthquake loss estimation, socio-economic analysis, and the global reinsurance field.

  2. Molecular Identification and Databases in Fusarium

    USDA-ARS?s Scientific Manuscript database

    DNA sequence-based methods for identifying pathogenic and mycotoxigenic Fusarium isolates have become the gold standard worldwide. Moreover, fusarial DNA sequence data are increasing rapidly in several web-accessible databases for comparative purposes. Unfortunately, the use of Basic Alignment Sea...

  3. Database extraction strategies for low-template evidence.

    PubMed

    Bleka, Øyvind; Dørum, Guro; Haned, Hinda; Gill, Peter

    2014-03-01

    Often in forensic cases, the profile of at least one of the contributors to a DNA evidence sample is unknown and a database search is needed to discover possible perpetrators. In this article we consider two types of search strategies to extract suspects from a database using methods based on probability arguments. The performance of the proposed match scores is demonstrated by carrying out a study of each match score relative to the level of allele drop-out in the crime sample, simulating low-template DNA. The efficiency was measured by random man simulation and we compared the performance using the SGM Plus kit and the ESX 17 kit for the Norwegian population, demonstrating that the latter has greatly enhanced power to discover perpetrators of crime in large national DNA databases. The code for the database extraction strategies will be prepared for release in the R-package forensim. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
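
    The "random man" efficiency measure can be illustrated with a toy single-locus simulation: genotypes are drawn from assumed population allele frequencies and compared with a crime-scene profile. The frequencies and profile below are invented, and the real match scores in the paper additionally model allele drop-out.

    ```python
    import random

    # Toy "random man" simulation for a single locus: estimate how often a
    # random genotype, drawn from population allele frequencies, would
    # match the crime-scene profile.
    freqs = {"A": 0.1, "B": 0.3, "C": 0.6}   # invented allele frequencies
    crime_profile = ("A", "B")

    def random_genotype():
        alleles, weights = zip(*freqs.items())
        return tuple(sorted(random.choices(alleles, weights, k=2)))

    trials = 100_000
    matches = sum(random_genotype() == tuple(sorted(crime_profile))
                  for _ in range(trials))
    print(f"random match rate ~ {matches / trials:.4f}")  # expect ~2*0.1*0.3 = 0.06
    ```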

  4. Validating abortion procedure coding in Canadian administrative databases.

    PubMed

    Samiedaluie, Saied; Peterson, Sandra; Brant, Rollin; Kaczorowski, Janusz; Norman, Wendy V

    2016-07-12

    The British Columbia (BC) Ministry of Health collects abortion procedure data in the Medical Services Plan (MSP) physician billings database and in the hospital information Discharge Abstracts Database (DAD). Our study seeks to validate abortion procedure coding in these databases. Two randomized controlled trials enrolled a cohort of 1031 women undergoing abortion. The researcher-collected database includes both enrollment and follow-up chart review data. The study cohort was linked to MSP and DAD data to identify all abortion events captured in the administrative databases. We compared clinical chart data on abortion procedures with health administrative data. We considered a match to occur if an abortion-related code was found in administrative data within 30 days of the date of the same event documented in a clinical chart. Among 1158 abortion events performed during the enrollment and follow-up periods, 99.1 % were found in at least one of the administrative data sources. The sensitivities of the two databases, evaluated against a gold standard, were 97.7 % (95 % confidence interval (CI): 96.6-98.5) for the MSP database and 91.9 % (95 % CI: 90.0-93.4) for the DAD. Abortion events coded in the BC health administrative databases are highly accurate. Single-payer health administrative databases at the provincial level in Canada have the potential to offer valid data reflecting abortion events. ClinicalTrials.gov Identifier NCT01174225, Current Controlled Trials ISRCTN19506752.
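
    The matching rule described above, an abortion-related code within 30 days of the charted event, is straightforward to express in code. The sketch below uses made-up patient data purely to illustrate the window-based linkage and the resulting sensitivity calculation.

    ```python
    from datetime import date

    # Illustrative re-implementation of the matching rule: a chart event
    # "matches" if an abortion-related code appears in administrative data
    # within 30 days of the charted event date. All data below are made up.
    chart_events = [("patient1", date(2011, 3, 2)), ("patient2", date(2011, 5, 20))]
    admin_claims = {"patient1": [date(2011, 3, 10)], "patient2": [date(2011, 8, 1)]}

    def matched(patient, event_date, window_days=30):
        return any(abs((claim - event_date).days) <= window_days
                   for claim in admin_claims.get(patient, []))

    hits = sum(matched(p, d) for p, d in chart_events)
    sensitivity = hits / len(chart_events)     # chart review = gold standard
    print(f"sensitivity = {sensitivity:.1%}")  # 50.0% on this toy data
    ```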

  5. Genome databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  6. DataBase on Demand

    NASA Astrophysics Data System (ADS)

    Gaspar Aparicio, R.; Gomez, D.; Coterillo Coz, I.; Wojcik, D.

    2012-12-01

    At CERN a number of key database applications are running on user-managed MySQL database services. The Database on Demand project was born out of an idea to provide the CERN user community with an environment to develop and run database services outside of the centralised Oracle-based database services. The Database on Demand (DBoD) platform empowers the user to perform certain actions that have traditionally been done by database administrators (DBAs), providing an enterprise platform for database applications. It also allows the CERN user community to run different database engines, e.g., presently the open community version of MySQL and a single-instance Oracle database server. This article describes the technology approach taken to meet this challenge, the service level agreement (SLA) that the project provides, and possible evolution scenarios.

  7. Using a Semi-Realistic Database to Support a Database Course

    ERIC Educational Resources Information Center

    Yue, Kwok-Bun

    2013-01-01

    A common problem for university relational database courses is to construct effective databases for instructions and assignments. Highly simplified "toy" databases are easily available for teaching, learning, and practicing. However, they do not reflect the complexity and practical considerations that students encounter in real-world…

  8. NATIVE HEALTH DATABASES: NATIVE HEALTH RESEARCH DATABASE (NHRD)

    EPA Science Inventory

    The Native Health Databases contain bibliographic information and abstracts of health-related articles, reports, surveys, and other resource documents pertaining to the health and health care of American Indians, Alaska Natives, and Canadian First Nations. The databases provide i...

  9. Object-oriented structures supporting remote sensing databases

    NASA Technical Reports Server (NTRS)

    Wichmann, Keith; Cromp, Robert F.

    1995-01-01

    Object-oriented databases show promise for modeling the complex interrelationships pervasive in scientific domains. To examine the utility of this approach, we have developed an Intelligent Information Fusion System based on this technology, and applied it to the problem of managing an active repository of remotely-sensed satellite scenes. The design and implementation of the system is compared and contrasted with conventional relational database techniques, followed by a presentation of the underlying object-oriented data structures used to enable fast indexing into the data holdings.

  10. Food composition database development for between country comparisons.

    PubMed

    Merchant, Anwar T; Dehghan, Mahshid

    2006-01-19

    Nutritional assessment by diet analysis is a two-step process consisting of evaluation of food consumption and conversion of food into nutrient intake using a food composition database, which lists the mean nutritional values for a given food portion. Most reports in the literature focus on minimizing errors in the estimation of food consumption, but the selection of a specific food composition table used in nutrient estimation is also a source of error. We are conducting a large international prospective study and need to compare diets, assessed by food frequency questionnaires, in a comparable manner across different countries. We have prepared a multi-country food composition database for nutrient estimation in all the countries participating in our study. The nutrient database is primarily based on the USDA food composition database, modified appropriately with reference to local food composition tables and supplemented with recipes of locally eaten mixed dishes. By doing so we have ensured that the units of measurement, the method of selecting foods for testing, and the assays used for nutrient estimation are consistent and as current as possible, while still taking into account some local variations. Using this common metric for nutrient assessment will reduce differential errors in nutrient estimation and improve the validity of between-country comparisons.
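
    The two-step conversion the authors describe, from reported food amounts to nutrient intake via per-portion composition values, reduces to a simple weighted sum. The foods and nutrient values in this sketch are invented for illustration; a real analysis would draw them from the USDA-based multi-country table.

    ```python
    # Toy illustration of the two-step conversion: food intake (grams) is
    # multiplied by per-100 g nutrient values from a composition database.
    composition = {                   # nutrient content per 100 g (invented)
        "rice, cooked":    {"energy_kcal": 130, "protein_g": 2.7},
        "lentils, boiled": {"energy_kcal": 116, "protein_g": 9.0},
    }
    diary = [("rice, cooked", 250), ("lentils, boiled", 150)]  # (food, grams)

    totals = {}
    for food, grams in diary:
        for nutrient, per100 in composition[food].items():
            totals[nutrient] = totals.get(nutrient, 0) + per100 * grams / 100
    print(totals)  # {'energy_kcal': 499.0, 'protein_g': 20.25}
    ```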

  11. Network-based statistical comparison of citation topology of bibliographic databases

    PubMed Central

    Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko

    2014-01-01

    Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, there exist only informal notions of their reliability. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the field bow-tie decomposition (introduced in this work) of the DBLP Computer Science Bibliography differs substantially from that of the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the critical difference diagram. The citation topology of the DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics and scientometrics or as a scientific evaluation guideline for governments and research agencies. PMID:25263231
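
    A minimal sketch of the comparison strategy: extract a citation network per database and compute the same set of graph statistics for each. The toy edge lists and the particular statistics chosen here (density, clustering, reciprocity) are illustrative; the paper uses a much richer statistic set plus significance testing and critical difference diagrams.

    ```python
    import networkx as nx

    # Compare citation networks from several databases on a common set of
    # graph statistics. Replace the toy edge lists with edges extracted
    # from each bibliographic database.
    databases = {
        "db_A": [(1, 2), (2, 3), (3, 1), (4, 2)],
        "db_B": [(1, 2), (1, 3), (1, 4), (5, 1)],
    }
    for name, edges in databases.items():
        G = nx.DiGraph(edges)
        stats = {
            "nodes": G.number_of_nodes(),
            "edges": G.number_of_edges(),
            "density": nx.density(G),
            "clustering": nx.average_clustering(G.to_undirected()),
            "reciprocity": nx.reciprocity(G),
        }
        print(name, stats)
    ```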

  12. The 2015 Nucleic Acids Research Database Issue and molecular biology database collection.

    PubMed

    Galperin, Michael Y; Rigden, Daniel J; Fernández-Suárez, Xosé M

    2015-01-01

    The 2015 Nucleic Acids Research Database Issue contains 172 papers that include descriptions of 56 new molecular biology databases and updates on 115 databases whose descriptions have previously been published in NAR or other journals. Following the classification introduced last year to simplify navigation of the entire issue, these articles are divided into eight subject categories. This year's highlights include RNAcentral, an international community portal to various databases on noncoding RNA; ValidatorDB, a validation database for protein structures and their ligands; SASBDB, a primary repository for small-angle scattering data of various macromolecular complexes; MoonProt, a database of 'moonlighting' proteins; and two new databases of protein-protein and other macromolecular complexes, ComPPI and the Complex Portal. This issue also includes an unusually high number of cancer-related databases and other databases dedicated to the genomic basis of disease and to potential drugs and drug targets. The size of the NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/a/, remained approximately the same, following the addition of 74 new resources and the removal of 77 obsolete web sites. The entire Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/). Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  13. The CSB Incident Screening Database: description, summary statistics and uses.

    PubMed

    Gomez, Manuel R; Casper, Susan; Smith, E Allen

    2008-11-15

    This paper briefly describes the Chemical Incident Screening Database currently used by the CSB to identify and evaluate chemical incidents for possible investigations, and summarizes descriptive statistics from this database that can potentially help to estimate the number, character, and consequences of chemical incidents in the US. The report compares some of the information in the CSB database to roughly similar information available from databases operated by EPA and the Agency for Toxic Substances and Disease Registry (ATSDR), and explores the possible implications of these comparisons with regard to the dimension of the chemical incident problem. Finally, the report explores in a preliminary way whether a system modeled after the existing CSB screening database could be developed to serve as a national surveillance tool for chemical incidents.

  14. Energy Consumption Database

    Science.gov Websites

    The California Energy Commission has created this on-line database for informal reporting of energy consumption across various classifications. The database also provides easy downloading of energy consumption data into Microsoft Excel (XLSX) files.

  15. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool to aid not only in studying the mitochondrion but also in studying the associated diseases.

  16. Identifying work-related motor vehicle crashes in multiple databases.

    PubMed

    Thomas, Andrea M; Thygerson, Steven M; Merrill, Ray M; Cook, Lawrence J

    2012-01-01

    To compare and estimate the magnitude of work-related motor vehicle crashes in Utah using 2 probabilistically linked statewide databases. Data from 2006 and 2007 motor vehicle crash and hospital databases were joined through probabilistic linkage. Summary statistics and capture-recapture were used to describe occupants injured in work-related motor vehicle crashes and to estimate the size of this population. There were 1597 occupants in the motor vehicle crash database and 1673 patients in the hospital database identified as being in a work-related motor vehicle crash. We identified 1443 occupants with at least one record from either the motor vehicle crash or hospital database indicating work-relatedness that linked to any record in the opposing database. We found that 38.7 percent of occupants injured in work-related motor vehicle crashes identified in the motor vehicle crash database did not have a primary payer code of workers' compensation in the hospital database, and 40.0 percent of patients injured in work-related motor vehicle crashes identified in the hospital database did not meet our definition of a work-related motor vehicle crash in the motor vehicle crash database. Depending on how occupants injured in work-related motor vehicle crashes are identified, we estimate the population to be between 1852 and 8492 in Utah for the years 2006 and 2007. Research on single databases may lead to biased interpretations of work-related motor vehicle crashes. Combining 2 population-based databases may still result in an underestimate of the magnitude of work-related motor vehicle crashes. Improved coding of work-related incidents is needed in current databases.
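
    The population estimate can be illustrated with a standard two-source capture-recapture formula. Using the Chapman estimator with the counts reported above reproduces the paper's lower bound of 1852, though whether the authors used exactly this estimator (which assumes independent sources) is an assumption.

    ```python
    # Two-source capture-recapture (Chapman estimator) applied to the
    # counts reported in the abstract: n1 crash-database records, n2
    # hospital records, m records linked in both sources.
    def chapman(n1, n2, m):
        return (n1 + 1) * (n2 + 1) / (m + 1) - 1

    n1, n2, m = 1597, 1673, 1443
    print(round(chapman(n1, n2, m)))  # ~1852, the paper's lower estimate
    ```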

  17. Toxicogenomics analysis of mouse lung responses following exposure to titanium dioxide nanomaterials reveal their disease potential at high doses

    PubMed Central

    Rahman, Luna; Wu, Dongmei; Johnston, Michael; William, Andrew; Halappanavar, Sabina

    2017-01-01

    Titanium dioxide nanoparticles (TiO2NPs) induce lung inflammation in experimental animals. In this study, we conducted a comprehensive toxicogenomic analysis of lung responses in mice exposed to six individual TiO2NPs exhibiting different sizes (8, 20 and 300 nm), crystalline structures (anatase, rutile or anatase/rutile) and surface modifications (hydrophobic or hydrophilic) to investigate whether the mechanisms leading to TiO2NP-induced lung inflammation are property-specific. A detailed histopathological analysis was conducted to investigate the long-term disease implications of acute exposure to TiO2NPs. C57BL/6 mice were exposed to 18, 54, 162 or 486 µg of TiO2NPs/mouse via single intratracheal instillation. Controls were exposed to dispersion medium only. Bronchoalveolar lavage fluid (BALF) and lung tissue were sampled on 1, 28 and 90 days post-exposure. Although all TiO2NPs induced lung inflammation as measured by the neutrophil influx in BALF, rutile-type TiO2NPs induced greater inflammation, with the hydrophilic rutile TiO2NP showing the maximum increase. Accordingly, the rutile TiO2NPs induced a higher number of differentially expressed genes. Histopathological analysis of lung sections on Day 90 post-exposure showed increased collagen staining and fibrosis-like changes following exposure to the rutile TiO2NPs at the highest dose tested. Among the anatase particles, the smallest TiO2NP (8 nm) showed the maximum response; the 300 nm anatase TiO2NP was the least responsive of all. The results suggest that the severity of lung inflammation is property-specific; however, the underlying mechanisms (genes and pathways perturbed) leading to inflammation were the same for all particle types. While the particle size clearly influenced the overall acute lung responses, a combination of small size, crystalline structure and hydrophilic surface contributed to the long-term pathological effects observed at the highest dose (486 µg/mouse). Although the dose at which the

  18. Brassica ASTRA: an integrated database for Brassica genomic research.

    PubMed

    Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

    2005-01-01

    Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.

  19. Exploring Discretization Error in Simulation-Based Aerodynamic Databases

    NASA Technical Reports Server (NTRS)

    Aftosmis, Michael J.; Nemec, Marian

    2010-01-01

    This work examines the level of discretization error in simulation-based aerodynamic databases and introduces strategies for error control. Simulations are performed using a parallel, multi-level Euler solver on embedded-boundary Cartesian meshes. Discretization errors in user-selected outputs are estimated using the method of adjoint-weighted residuals, and adaptive mesh refinement is used to reduce these errors to specified tolerances. Using this framework, we examine the behavior of discretization error throughout a token database of 120 cases computed for a NACA 0012 airfoil. We compare the cost and accuracy of two approaches for aerodynamic database generation. In the first approach, mesh adaptation is used to compute all cases in the database to a prescribed level of accuracy. The second approach conducts all simulations using the same computational mesh without adaptation. We quantitatively assess the error landscape and computational costs in both databases. This investigation highlights sensitivities of the database under a variety of conditions. The presence of transonic shocks or stiffness in the governing equations near the incompressible limit is shown to dramatically increase discretization error, requiring additional mesh resolution to control. Results show that such pathologies lead to error levels that vary by over a factor of 40 when a fixed mesh is used throughout the database. Alternatively, controlling this sensitivity through mesh adaptation leads to mesh sizes that span two orders of magnitude. We propose strategies to minimize simulation cost in sensitive regions and discuss the role of error estimation in database quality.
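
    The adaptive strategy, refining wherever a local error indicator is largest until the total estimated error meets a tolerance, can be illustrated with a generic one-dimensional analogue. The quadrature-based error indicator below is a stand-in; the paper's actual indicator comes from adjoint-weighted residuals on Cartesian meshes.

    ```python
    import math

    # Generic tolerance-driven adaptive refinement: keep splitting the
    # interval with the largest local error indicator until the total
    # estimated error meets the tolerance.
    f = math.sin

    def local_error(a, b):
        # |trapezoid - midpoint| as a crude local error indicator on [a, b]
        trap = (f(a) + f(b)) * (b - a) / 2
        mid = f((a + b) / 2) * (b - a)
        return abs(trap - mid)

    intervals = [(0.0, math.pi)]
    tol = 1e-4
    while sum(local_error(a, b) for a, b in intervals) > tol:
        a, b = max(intervals, key=lambda ab: local_error(*ab))
        intervals.remove((a, b))
        m = (a + b) / 2
        intervals += [(a, m), (m, b)]
    print(f"{len(intervals)} intervals needed to reach tol={tol}")
    ```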

  20. RPG: the Ribosomal Protein Gene database.

    PubMed

    Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

    2004-01-01

    RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes.

  1. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
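
    In the spirit of the unit's search_demo protocol, the sketch below loads similarity-search hits into a relational table and summarizes homologs per query and taxon. The table layout, column names, and rows are illustrative assumptions, not the unit's actual schema.

    ```python
    import sqlite3

    # Load similarity-search hits into SQL, then summarize homologs per
    # query and taxon (hypothetical schema and toy rows).
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE hits (
        query TEXT, subject TEXT, taxon TEXT, evalue REAL)""")
    db.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", [
        ("ecoli_thrA", "P12345", "Homo sapiens",  1e-40),
        ("ecoli_thrA", "Q67890", "S. cerevisiae", 1e-35),
        ("ecoli_lacZ", "A11111", "S. cerevisiae", 1e-10),
    ])
    # How many significant homologs does each E. coli query have per taxon?
    for row in db.execute("""
            SELECT query, taxon, COUNT(*) FROM hits
            WHERE evalue < 1e-5 GROUP BY query, taxon"""):
        print(row)
    ```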

  2. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  3. Teaching Case: Adapting the Access Northwind Database to Support a Database Course

    ERIC Educational Resources Information Center

    Dyer, John N.; Rogers, Camille

    2015-01-01

    A common problem encountered when teaching database courses is that few large illustrative databases exist to support teaching and learning. Most database textbooks have small "toy" databases that are chapter objective specific, and thus do not support application over the complete domain of design, implementation and management concepts…

  4. Potential use of routine databases in health technology assessment.

    PubMed

    Raftery, J; Roderick, P; Stevens, A

    2005-05-01

    event reporting, confidential enquiries, disease-only registers and health surveys. Databases in group I can be used not only to assess effectiveness but also to assess diffusion and equity. Databases in group II can only assess diffusion. Group III has restricted scope for assessing HTs, except for analysis of adverse events. For use in costing, databases need to include unit costs or prices. Some databases included unit cost as well as a specific HT. A list of around 270 databases was identified at the level of UK, England and Wales or England (over 1000 including Scotland, Wales and Northern Ireland). Allocation of these to the above groups identified around 60 databases with some potential for HT assessment, roughly half to group I. Eighteen clinical registers were identified as having the greatest potential although the clinical administrative datasets had potential mainly owing to their inclusion of a wide range of technologies. Only two databases were identified that could directly be used in costing. The review of the potential capture of HTs prioritized by the UK's NHS R&D HTA programme showed that only 10% would be captured in these databases, mainly drugs prescribed in primary care. The review of the use of routine databases in any form of HT assessment indicated that clinical registers were mainly used for national comparative audit. Some databases have only been used in annual reports, usually time trend analysis. A few peer-reviewed papers used a clinical register to assess the effectiveness of a technology. Accessibility is suggested as a barrier to using most databases. Clinical administrative databases (group Ib) have mainly been used to build population needs indices and performance indicators. A review of the validity of used databases showed that although internal consistency checks were common, relatively few had any form of external audit. Some comparative audit databases have data scrutinised by participating units. Issues around coverage and

  5. Applications of GIS and database technologies to manage a Karst Feature Database

    USGS Publications Warehouse

    Gao, Y.; Tipping, R.G.; Alexander, E.C.

    2006-01-01

    This paper describes the management of a Karst Feature Database (KFD) in Minnesota. Two sets of applications in both GIS and Database Management System (DBMS) have been developed for the KFD of Minnesota. These applications were used to manage and to enhance the usability of the KFD. Structured Query Language (SQL) was used to manipulate transactions of the database and to facilitate the functionality of the user interfaces. The Database Administrator (DBA) authorized users with different access permissions to enhance the security of the database. Database consistency and recovery are accomplished by creating data logs and maintaining backups on a regular basis. The working database provides guidelines and management tools for future studies of karst features in Minnesota. The methodology of designing this DBMS is applicable to develop GIS-based databases to analyze and manage geomorphic and hydrologic datasets at both regional and local scales. The short-term goal of this research is to develop a regional KFD for the Upper Mississippi Valley Karst and the long-term goal is to expand this database to manage and study karst features at national and global scales.

  6. Comparability of methods assigning monetary costs to diets: derivation from household till receipts versus cost database estimation using 4-day food diaries.

    PubMed

    Timmins, K A; Morris, M A; Hulme, C; Edwards, K L; Clarke, G P; Cade, J E

    2013-10-01

    Diet cost could influence dietary patterns, with potential health consequences. Assigning a monetary cost to diet is challenging, and there are contrasting methods in the literature. This study compares two methods: a food cost database linked to 4-day diet diaries, and an individual cost calculated from household till receipts. The Diet and Nutrition Tool for Evaluation (DANTE) had supermarket prices (cost per 100 g) added to its food composition table. Agreement between diet costs calculated using DANTE from food diaries and expenditure recorded using food purchase till receipts for 325 individuals was assessed using correlation and Bland-Altman (BA) plots. The mean difference between the methods' estimates was £0.10. The BA plot showed 95% limits of agreement of £2.88 and -£3.08. Excluding the highest 5% of diet cost values from each collection method reduced the mean difference to £0.02, with limits of agreement ranging from £2.31 to -£2.35. Agreement between the methods was stronger for males and for adults. Diet cost estimates using a food price database with 4-day food diaries are comparable to recorded expenditure from household till receipts at the population or group level. At the individual level, however, estimates differed by as much as £3.00 per day. The methods agreed less when estimating diet costs of children, females or those with more expensive diets.
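
    The Bland-Altman agreement statistics quoted above follow a standard recipe: the bias is the mean of the paired differences, and the 95% limits of agreement are the bias plus or minus 1.96 standard deviations. The sketch below applies that recipe to invented daily cost pairs rather than the study's 325 real pairs.

    ```python
    import statistics

    # Bland-Altman limits of agreement from paired daily diet costs (GBP).
    # The cost pairs below are invented for illustration.
    database_cost = [4.10, 5.25, 3.80, 6.00, 4.70]
    receipt_cost  = [4.00, 5.60, 3.50, 6.40, 4.90]

    diffs = [a - b for a, b in zip(database_cost, receipt_cost)]
    bias = statistics.mean(diffs)         # mean difference between methods
    sd = statistics.stdev(diffs)          # SD of the paired differences
    print(f"bias = {bias:+.2f}, "
          f"LoA = [{bias - 1.96*sd:+.2f}, {bias + 1.96*sd:+.2f}]")
    ```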

  7. Open Geoscience Database

    NASA Astrophysics Data System (ADS)

    Bashev, A.

    2012-04-01

    Currently there is an enormous number of geoscience databases. Unfortunately, the only users of the majority of these databases are their creators. There are several reasons for that: incompatibility, specificity of tasks and objects, and so on. However, the main obstacles to wide usage of geoscience databases are their complexity for developers and complication for users. The complexity of the architecture leads to high costs that block public access. The complication prevents users from understanding when and how to use the database. Only databases associated with GoogleMaps lack these drawbacks, but they can hardly be called "geoscience". Nevertheless, an open and simple geoscience database is necessary at least for educational purposes (see our abstract for ESSI20/EOS12). We developed a database and a web interface to work with it, and it is now accessible at maps.sch192.ru. In this database a result is a value of a parameter (no matter which) at a station with a certain position, associated with metadata: the date when the result was obtained; the type of station (lake, soil, etc.); the contributor that sent the result. Each contributor has their own profile, which allows users to estimate the reliability of the data. The results can be represented on a GoogleMaps space image as a point at a certain position, coloured according to the value of the parameter. There are default colour scales, and each registered user can create their own scale. The results can also be extracted as a *.csv file. For both types of representation one can select the data by date, object type, parameter type, area and contributor. The data are uploaded in *.csv format: Name of the station; Latitude (dd.dddddd); Longitude (ddd.dddddd); Station type; Parameter type; Parameter value; Date (yyyy-mm-dd). The contributor is recognised at login. This is the minimal set of features required to connect a value of a parameter with a position and see the results. All the complicated data
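
    The upload format quoted above maps naturally onto a small parser. The sketch below assumes semicolon-delimited rows matching the stated field list; the delimiter and the sample stations are assumptions made for illustration.

    ```python
    import csv
    from io import StringIO

    # Parser for the stated upload format: Name; Latitude; Longitude;
    # Station type; Parameter type; Parameter value; Date (yyyy-mm-dd).
    sample = StringIO(
        "Station-1;55.751244;37.618423;lake;pH;7.4;2011-06-01\n"
        "Station-2;59.939095;30.315868;soil;pH;5.9;2011-06-02\n"
    )
    fields = ["name", "lat", "lon", "station_type",
              "param_type", "param_value", "date"]
    for row in csv.reader(sample, delimiter=";"):
        record = dict(zip(fields, row))
        record["lat"], record["lon"] = float(record["lat"]), float(record["lon"])
        record["param_value"] = float(record["param_value"])
        print(record)
    ```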

  8. The MAR databases: development and implementation of databases specific for marine metagenomics

    PubMed Central

    Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen

    2018-01-01

    We introduce the marine databases MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represents a marine prokaryote reference genome database, MarDB includes all incompletely sequenced marine prokaryotic genomes regardless of their level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets visitors browse, filter and search the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. PMID:29106641

  9. Orthographic and Phonological Neighborhood Databases across Multiple Languages.

    PubMed

    Marian, Viorica

    2017-01-01

    The increased globalization of science and technology and the growing number of bilinguals and multilinguals in the world have made research with multiple languages a mainstay for scholars who study human function, and especially those who focus on language, cognition, and the brain. Such research can benefit from large-scale databases and online resources that describe and measure lexical, phonological, orthographic, and semantic information. The present paper discusses currently available resources and underscores the need for tools that enable measurements both within and across multiple languages. A general review of language databases is followed by a targeted introduction to databases of orthographic and phonological neighborhoods. A specific focus on CLEARPOND illustrates how databases can be used to assess and compare neighborhood information across languages, to develop research materials, and to provide insight into broad questions about language. As an example of how large-scale databases can answer questions about language, a closer look at neighborhood effects on lexical access reveals that not only orthographic but also phonological neighborhoods can influence visual lexical access, both within and across languages. We conclude that capitalizing upon large-scale linguistic databases can advance, refine, and accelerate scientific discoveries about the human linguistic capacity.
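
    Orthographic neighborhood size, one of the measures such databases report, is commonly defined as Coltheart's N: the number of words formed by substituting exactly one letter. The sketch below computes it over a toy lexicon; CLEARPOND itself also covers addition/deletion neighbors and phonological neighborhoods across languages.

    ```python
    # Coltheart's N: count words of equal length differing from the target
    # by exactly one letter substitution. The lexicon here is a toy sample.
    lexicon = {"cat", "bat", "hat", "cut", "cot", "car", "cart", "dog"}

    def neighbors(word, lexicon):
        return {w for w in lexicon
                if len(w) == len(word) and w != word
                and sum(a != b for a, b in zip(w, word)) == 1}

    print(neighbors("cat", lexicon))       # {'bat', 'hat', 'cut', 'cot', 'car'}
    print(len(neighbors("cat", lexicon)))  # N = 5
    ```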

  10. Evaluation of low doses BPA-induced perturbation of glycemia by toxicogenomics points to a primary role of pancreatic islets and to the mechanism of toxicity.

    PubMed

    Carchia, E; Porreca, I; Almeida, P J; D'Angelo, F; Cuomo, D; Ceccarelli, M; De Felice, M; Mallardo, M; Ambrosino, C

    2015-10-29

    Epidemiologic and experimental studies have associated changes of blood glucose homeostasis with Bisphenol A (BPA) exposure. We took a toxicogenomic approach to investigate the mechanisms of low-dose (1 × 10⁻⁹ M) BPA toxicity in ex vivo cultures of primary murine pancreatic islets and hepatocytes. Twenty-nine inhibited genes were identified in islets and none in exposed hepatocytes. Although their expression was only slightly altered, their impaired cellular levels, as a whole, resulted in specific phenotypic changes. Damage to mitochondrial function and metabolism, as predicted by bioinformatics analyses, was observed: BPA exposure led to a time-dependent decrease in mitochondrial membrane potential, to an increase in cellular ROS levels and, finally, to an induction of apoptosis, attributable to the increased Bax/Bcl-2 ratio owing to activation of the NF-κB pathway. Our data suggest a multifactorial mechanism for BPA toxicity in pancreatic islets, with emphasis on mitochondrial dysfunction and NF-κB activation. Finally, we assessed in vitro the viability of BPA-treated islets under stressing conditions, such as exposure to high glucose, evidencing a reduced ability of the exposed islets to respond to further damage. The result was confirmed in vivo by evaluating the reduction of glycemia in hyperglycemic mice transplanted with control and BPA-treated pancreatic islets. The reported findings identify the pancreatic islet as the main target of BPA toxicity in impairing glycemia. They suggest that BPA exposure can weaken the response of pancreatic islets to damage. The latter observation could represent a broader concept whose consideration should lead to the development of experimental plans that better reproduce multiple exposure conditions.

  11. Penile prosthesis implantation compares favorably in malpractice outcomes to other common urological procedures: findings from a malpractice insurance database.

    PubMed

    Chason, Juddson; Sausville, Justin; Kramer, Andrew C

    2009-08-01

    Some urologists choose not to offer penile prostheses because of concern over malpractice liability. The aim of this study was to assess whether urologists performing penile prosthesis surgery are placed at greater malpractice risk. The main outcome measures were the percentage of malpractice suits from prosthesis surgery and other urological procedures that result in payment, the average resulting payout from these cases, and the category of legal issue that ultimately resulted in payout. A database from the Physician Insurers Association of America, an association of malpractice insurance companies covering physicians in North America, was analyzed to quantitatively compare penile implant surgery with other urological procedures in medicolegal terms. Compared with other common urological procedures, penile implant surgery is comparable, and at the lower end of the spectrum, in terms of both the percentage of malpractice suits that result in payment and the amount ultimately paid in indemnity in those cases. Additionally, issues of informed consent play the largest role in indemnities for all urological procedures, whereas surgical technique is the most important issue for prosthesis surgery. Urologists who are adequately trained in prosthetic surgery should not avoid penile implant procedures for fear of malpractice suits. A focus on communication and informed consent can greatly reduce malpractice risk for urological procedures.

  12. A novel approach for incremental uncertainty rule generation from databases with missing values handling: application to dynamic medical databases.

    PubMed

    Konias, Sokratis; Chouvarda, Ioanna; Vlahavas, Ioannis; Maglaveras, Nicos

    2005-09-01

    Current approaches for mining association rules usually assume that the mining is performed on a static database, where the problem of missing attribute values does not practically exist. However, these assumptions do not hold in some medical databases, such as in a home care system. In this paper, a novel uncertainty rule algorithm is illustrated, namely URG-2 (Uncertainty Rule Generator), which addresses the problem of mining dynamic databases containing missing values. This algorithm requires only one pass over the initial dataset in order to generate the item sets, while new metrics corresponding to the notions of Support and Confidence are used. URG-2 was evaluated over two medical databases, randomly introducing multiple missing values for each record attribute (rate: 5-20% in 5% increments) in the initial dataset. Compared with the classical approach (ignoring records with missing values), the proposed algorithm was more robust in mining rules from datasets containing missing values. In all cases, the difference in preserving the initial rules ranged between 30% and 60% in favour of URG-2. Moreover, due to its incremental nature, URG-2 saved over 90% of the time required for thorough re-mining. Thus, the proposed algorithm can offer a preferable solution for mining in dynamic relational databases.
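
    One simple way to make Support and Confidence tolerant of missing values is to discount records whose relevant attributes are unknown rather than counting them as mismatches. The sketch below illustrates that general idea with toy records; URG-2's exact metrics are defined in the paper and may differ.

    ```python
    # Support/confidence that discount records with missing values (None)
    # instead of treating them as mismatches. Records are invented.
    records = [
        {"fever": 1, "cough": 1}, {"fever": 1, "cough": None},
        {"fever": 0, "cough": 1}, {"fever": 1, "cough": 1},
    ]

    def support(itemset):
        # Only records where every attribute in the itemset is known count
        known = [r for r in records if all(r[a] is not None for a in itemset)]
        hits = [r for r in known if all(r[a] == 1 for a in itemset)]
        return len(hits) / len(known) if known else 0.0

    sup_fc = support(["fever", "cough"])
    sup_f = support(["fever"])
    confidence = sup_fc / sup_f if sup_f else 0.0
    print(f"support(fever&cough)={sup_fc:.2f}, conf(fever->cough)={confidence:.2f}")
    ```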

  13. Impact of database quality in knowledge-based treatment planning for prostate cancer.

    PubMed

    Wall, Phillip D H; Carver, Robert L; Fontenot, Jonas D

    2018-03-13

    This article investigates dose-volume prediction improvements in a common knowledge-based planning (KBP) method using a Pareto plan database compared with using a conventional, clinical plan database. Two plan databases were created using retrospective, anonymized data of 124 volumetric modulated arc therapy (VMAT) prostate cancer patients. The clinical plan database (CPD) contained planning data from each patient's clinically treated VMAT plan, which were manually optimized by various planners. The multicriteria optimization database (MCOD) contained Pareto-optimal plan data from VMAT plans created using a standardized multicriteria optimization protocol. Overlap volume histograms, incorporating fractional organ at risk volumes only within the treatment fields, were computed for each patient and used to match new patient anatomy to similar database patients. For each database patient, CPD and MCOD KBP predictions were generated for D10, D30, D50, D65, and D80 of the bladder and rectum in a leave-one-out manner. Prediction achievability was evaluated through a replanning study on a subset of 31 randomly selected database patients using the best KBP predictions, regardless of plan database origin, as planning goals. MCOD predictions were significantly lower than CPD predictions for all 5 bladder dose-volumes and rectum D50 (P = .004) and D65 (P < .001), whereas CPD predictions for rectum D10 (P = .005) and D30 (P < .001) were significantly less than MCOD predictions. KBP predictions were statistically achievable in the replans for all predicted dose-volumes, excluding D10 of bladder (P = .03) and rectum (P = .04). Compared with clinical plans, replans showed significant average reductions in Dmean for bladder (7.8 Gy; P < .001) and rectum (9.4 Gy; P < .001), while maintaining statistically similar planning target volume, femoral head, and penile bulb dose. KBP dose-volume predictions derived from Pareto plans were more optimal overall than those

  14. Image database for digital hand atlas

    NASA Astrophysics Data System (ADS)

    Cao, Fei; Huang, H. K.; Pietka, Ewa; Gilsanz, Vicente; Dey, Partha S.; Gertych, Arkadiusz; Pospiech-Kurkowska, Sywia

    2003-05-01

    Bone age assessment is a procedure frequently performed in pediatric patients to evaluate growth disorders. A commonly used method is atlas matching by visual comparison of a hand radiograph with the small reference set of the old Greulich-Pyle atlas. We have developed a new digital hand atlas with a large set of clinically normal hand images from diverse ethnic groups. In this paper, we present our system design and implementation of the digital atlas database to support computer-aided atlas matching for bone age assessment. The system consists of a hand atlas image database, a computer-aided diagnostic (CAD) software module for image processing and atlas matching, and a Web user interface. Users can use a Web browser to push DICOM images, directly or indirectly from PACS, to the CAD server for bone age assessment. Quantitative features of the examined image, which reflect the skeletal maturity, are then extracted and compared with patterns from the atlas image database to assess the bone age. The digital atlas method, built on a large image database and current Internet technology, provides an alternative to supplement or replace the traditional method for a quantitative, accurate and cost-effective assessment of bone age.

  15. The MAR databases: development and implementation of databases specific for marine metagenomics.

    PubMed

    Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen; Willassen, Nils P

    2018-01-04

    We introduce the marine databases MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represents a marine prokaryote reference genome database, MarDB includes all incompletely sequenced marine prokaryotic genomes regardless of their level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets visitors browse, filter and search the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Databases for LDEF results

    NASA Technical Reports Server (NTRS)

    Bohnhoff-Hlavacek, Gail

    1992-01-01

    One of the objectives of the team supporting the LDEF Systems and Materials Special Investigative Groups is to develop databases of experimental findings. These databases identify the hardware flown, summarize results and conclusions, and provide a system for acknowledging investigators, tracing sources of data, and recording future design suggestions. To date, databases covering the optical experiments and thermal control materials (chromic acid anodized aluminum, silverized Teflon blankets, and paints) have been developed at Boeing. We used the Filemaker Pro software, the database manager for the Macintosh computer produced by the Claris Corporation. It is a flat, text-retrievable database that provides access to the data via an intuitive user interface, without tedious programming. Though this software is available only for the Macintosh computer at this time, copies of the databases can be saved to a format that is readable on a personal computer as well. Further, the data can be exported to more powerful relational databases. This paper describes the capabilities and use of the LDEF databases and how to get copies of the databases for your own research.

  17. Low Cost Comprehensive Microcomputer-Based Medical History Database Acquisition

    PubMed Central

    Buchan, Robert R. C.

    1980-01-01

    A carefully detailed, comprehensive medical history database is the fundamental essence of patient-physician interaction. Computer-generated medical history acquisition has repeatedly been shown to be highly acceptable to both patient and physician while consistently providing a superior product. Cost justification of machine-derived problem and history databases, however, has in the past been marginal at best. Routine use of the technology has therefore been limited to large clinics, university hospitals and federal installations where feasible volume applications are supported by endowment, research funds or taxes. This paper summarizes the use of a unique low-cost device which marries advanced microprocessor technology with random access, variable-frame film projection techniques to acquire a detailed comprehensive medical history database. Preliminary data are presented which compare patient, physician, and machine generated histories for content, discovery, compliance and acceptability. Results compare favorably with the findings of similar studies by a variety of authors.

  18. Nonparametric Bayesian Modeling for Automated Database Schema Matching

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ferragut, Erik M; Laska, Jason A

    2015-01-01

    The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
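
    The core comparison, the probability that a single model generated both fields, can be illustrated with a simple Bayes-factor computation using a symmetric Dirichlet-multinomial marginal likelihood over categorical columns. This is a deliberately simplified stand-in for the paper's nonparametric models.

    ```python
    from math import lgamma
    from collections import Counter

    # Score how likely it is that one categorical model generated both
    # columns, via a symmetric Dirichlet-multinomial marginal likelihood.
    def log_marginal(counts, alpha=1.0, k=None):
        k = k or len(counts)          # total number of possible categories
        n = sum(counts.values())
        out = lgamma(k * alpha) - lgamma(k * alpha + n)
        return out + sum(lgamma(alpha + c) - lgamma(alpha)
                         for c in counts.values())

    def match_score(col_a, col_b):
        k = len(set(col_a) | set(col_b))
        merged = log_marginal(Counter(col_a + col_b), k=k)
        separate = (log_marginal(Counter(col_a), k=k)
                    + log_marginal(Counter(col_b), k=k))
        return merged - separate      # > 0 favours "same field"

    print(match_score(["M", "F", "F", "M"], ["F", "M", "M", "F"]))      # high
    print(match_score(["M", "F", "F", "M"], ["NY", "CA", "CA", "TX"]))  # low
    ```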

  19. Databases for Microbiologists

    DOE PAGES

    Zhulin, Igor B.

    2015-05-26

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  20. Databases for Microbiologists

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhulin, Igor B.

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  1. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  2. Thematic video indexing to support video database retrieval and query processing

    NASA Astrophysics Data System (ADS)

    Khoja, Shakeel A.; Hall, Wendy

    1999-08-01

    This paper presents a novel video database system which caters for complex and long videos, such as documentaries, educational videos, etc. Compared to databases of relatively structured videos, such as CNN news or commercial advertisements, this database system has the capacity to work with long and unstructured videos.

  3. A statistical walk through the IAU MDC database

    NASA Astrophysics Data System (ADS)

    Andreić, Željko; Šegon, Damir; Vida, Denis

    2014-02-01

    The IAU MDC database is an important tool for the study of meteor showers. Throughout its history, the amount of data in the database for particular showers, and also the extent of those data, has varied significantly. A systematic check of the current database (as of 1 June 2014) was therefore performed, and the results are reported and discussed in this paper. The most obvious finding is that the database contains showers for which only basic radiant data are available, showers for which a full set of radiant and orbital data is provided, and showers whose data fall anywhere in between. As much current work on meteor showers involves D-criteria for orbital similarity, this automatically excludes showers without orbital data from such work. A test run comparing showers by their radiant data alone was performed and was found to be inadequate for testing shower similarities. A few inconsistencies and typographic errors were found and are briefly described here.

  4. RPG: the Ribosomal Protein Gene database

    PubMed Central

    Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

    2004-01-01

    RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes. PMID:14681386

  5. Online Databases in Physics.

    ERIC Educational Resources Information Center

    Sievert, MaryEllen C.; Verbeck, Alison F.

    1984-01-01

    This overview of 47 online sources for physics information available in the United States--including sub-field databases, transdisciplinary databases, and multidisciplinary databases-- notes content, print source, language, time coverage, and databank. Two discipline-specific databases (SPIN and PHYSICS BRIEFS) are also discussed. (EJS)

  6. Identifying Psoriasis and Psoriatic Arthritis Patients in Retrospective Databases When Diagnosis Codes Are Not Available: A Validation Study Comparing Medication/Prescriber Visit-Based Algorithms with Diagnosis Codes.

    PubMed

    Dobson-Belaire, Wendy; Goodfield, Jason; Borrelli, Richard; Liu, Fei Fei; Khan, Zeba M

    2018-01-01

    Using diagnosis code-based algorithms is the primary method of identifying patient cohorts for retrospective studies; nevertheless, many databases lack reliable diagnosis code information. Our objective was to develop precise algorithms based on medication claims/prescriber visits (MCs/PVs) to identify psoriasis (PsO) patients and psoriatic patients with arthritic conditions (PsO-AC), a proxy for psoriatic arthritis, in Canadian databases lacking diagnosis codes. Algorithms were developed using medications with narrow indication profiles in combination with prescriber specialty to define PsO and PsO-AC. For a 3-year study period from July 1, 2009, the algorithms were validated using the PharMetrics Plus database, which contains both adjudicated medication claims and diagnosis codes. Positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity of the developed algorithms were assessed using diagnosis codes as the reference standard. The chosen algorithms were then applied to Canadian drug databases to profile the algorithm-identified PsO and PsO-AC cohorts. In the selected database, 183,328 patients were identified for validation. The highest PPVs for PsO (85%) and PsO-AC (65%) occurred when a predictive algorithm of two or more MCs/PVs was compared with the reference standard of one or more diagnosis codes. NPV and specificity were high (99%-100%), whereas sensitivity was low (≤30%). Reducing the number of MCs/PVs or increasing the number of diagnosis claims decreased the algorithms' PPVs. We have developed an MC/PV-based algorithm to identify PsO patients with a high degree of accuracy, but accuracy for PsO-AC requires further investigation. Such methods allow researchers to conduct retrospective studies in databases in which diagnosis codes are absent. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
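
    For reference, the four validation measures reported above follow directly from a 2x2 table against the diagnosis-code reference standard; a minimal sketch with made-up counts chosen to mirror the reported pattern (high PPV and specificity, low sensitivity):

    ```python
    def validation_metrics(tp, fp, fn, tn):
        """PPV, NPV, sensitivity and specificity from a 2x2 validation table,
        counting patients by algorithm result versus the reference standard."""
        return {
            "ppv": tp / (tp + fp),          # algorithm-positives who are true cases
            "npv": tn / (tn + fn),          # algorithm-negatives who are non-cases
            "sensitivity": tp / (tp + fn),  # true cases the algorithm catches
            "specificity": tn / (tn + fp),  # non-cases the algorithm clears
        }

    # Illustrative counts only (they sum to the study's 183,328 patients).
    print(validation_metrics(tp=850, fp=150, fn=2000, tn=180328))
    ```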

  7. Development of a personalized training system using the Lung Image Database Consortium and Image Database resource Initiative Database.

    PubMed

    Lin, Hongli; Wang, Weisheng; Luo, Jiawei; Yang, Xuedong

    2014-12-01

    The aim of this study was to develop a personalized training system using the Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) database, because collecting, annotating, and marking a large number of appropriate computed tomography (CT) scans, and providing the capability to dynamically select suitable training cases based on the performance levels of trainees and the characteristics of cases, are critical for developing an efficient training system. A novel approach is proposed for building a personalized radiology training system for the interpretation of lung nodules in CT scans using the LIDC/IDRI database. It provides a Content-Boosted Collaborative Filtering (CBCF) algorithm for predicting the difficulty level of each case for each trainee, so that suitable cases can be selected to meet individual needs, and a diagnostic simulation tool that enables trainees to analyze and diagnose lung nodules with the help of an image processing tool and a nodule retrieval tool. Preliminary evaluation of the system shows that a personalized training system for the interpretation of lung nodules is needed and useful for enhancing the professional skills of trainees. Developing personalized training systems using the LIDC/IDRI database is a feasible solution to the challenges of constructing specific training programs in terms of cost and training efficiency. Copyright © 2014 AUR. Published by Elsevier Inc. All rights reserved.

  8. Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.

    PubMed

    Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing

    2018-04-06

    Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.
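
    The merging step behind such an MT-style strategy is easy to sketch: concatenate taxonomy-guided reference proteins with metagenome-assembly proteins, dropping exact duplicate sequences. File names and the deduplication rule below are assumptions for illustration, not the authors' code.

    ```python
    def read_fasta(path):
        """Yield (header, sequence) pairs from a FASTA file."""
        header, chunks = None, []
        with open(path) as fh:
            for line in fh:
                line = line.rstrip()
                if line.startswith(">"):
                    if header is not None:
                        yield header, "".join(chunks)
                    header, chunks = line[1:], []
                else:
                    chunks.append(line)
        if header is not None:
            yield header, "".join(chunks)

    def merge_databases(reference_fasta, metagenome_fasta, out_path):
        """Write a merged search database, keeping the first copy of any
        sequence that appears in both sources."""
        seen = set()
        with open(out_path, "w") as out:
            for path in (reference_fasta, metagenome_fasta):
                for header, seq in read_fasta(path):
                    if seq not in seen:
                        seen.add(seq)
                        out.write(f">{header}\n{seq}\n")

    merge_databases("taxonomy_guided.faa", "metagenome_assembly.faa", "mt_merged.faa")
    ```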

  9. Mycobacteriophage genome database.

    PubMed

    Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja

    2011-01-01

    Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases pooled together to empower mycobacteriophage researchers. The MGDB (Version No. 1.0) comprises 6086 genes from 64 mycobacteriophages, classified into 72 families based on the ACLAME database. Manual curation was aided by information available from public databases, which was further enriched by analysis. Its web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent to mycobacteriophage protein classification in a rational way. The other objective is to browse the existing and new genomes and describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.

  10. Maize databases

    USDA-ARS's Scientific Manuscript database

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  11. Specialist Bibliographic Databases

    PubMed Central

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarity with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is essential for researchers, authors, editors, and publishers. Database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find the source selection criteria particularly useful when applying for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  12. Specialist Bibliographic Databases.

    PubMed

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D

    2016-05-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarity with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is essential for researchers, authors, editors, and publishers. Database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find the source selection criteria particularly useful when applying for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls.

  13. Comparing the Hematopoetic Syndrome Time Course in the NHP Animal Model to Radiation Accident Cases From the Database Search.

    PubMed

    Graessle, Dieter H; Dörr, Harald; Bennett, Alexander; Shapiro, Alla; Farese, Ann M; MacVittie, Thomas J; Meineke, Viktor

    2015-11-01

    Since controlled clinical studies on drug administration for the acute radiation syndrome are lacking, clinical data of human radiation accident victims as well as experimental animal models are the main sources of information. This leads to the question of how to compare and link clinical observations collected after human radiation accidents with experimental observations in non-human primate (NHP) models. Using the example of granulocyte counts in the peripheral blood following radiation exposure, approaches for adaptation between NHP and patient databases on data comparison and transformation are introduced. As a substitute for studying the effects of administration of granulocyte-colony stimulating factor (G-CSF) in human clinical trials, the method of mathematical modeling is suggested using the example of G-CSF administration to NHP after total body irradiation.

  14. Creating Your Own Database.

    ERIC Educational Resources Information Center

    Blair, John C., Jr.

    1982-01-01

    Outlines the important factors to be considered in selecting a database management system for use with a microcomputer and presents a series of guidelines for developing a database. General procedures, report generation, data manipulation, information storage, word processing, data entry, database indexes, and relational databases are among the…

  15. Selecting Data-Base Management Software for Microcomputers in Libraries and Information Units.

    ERIC Educational Resources Information Center

    Pieska, K. A. O.

    1986-01-01

    Presents a model for the evaluation of database management systems software from the viewpoint of librarians and information specialists. The properties of data management systems, database management systems, and text retrieval systems are outlined and compared. (10 references) (CLB)

  16. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500,000 plant EST-derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.

  17. Sputnik: a database platform for comparative plant genomics

    PubMed Central

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F.X.

    2003-01-01

    Two million plant ESTs, from 20 different plant species and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500,000 plant EST-derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  18. NASA STI Database, Aerospace Database and ARIN coverage of 'space law'

    NASA Technical Reports Server (NTRS)

    Buchan, Ronald L.

    1992-01-01

    The space-law coverage provided by the NASA STI Database, the Aerospace Database, and ARIN is briefly described. Particular attention is given to the space law content of the two databases and of ARIN, the NASA Thesaurus space law terminology, space law publication forms, and the availability of the space law literature.

  19. A review of accessibility of administrative healthcare databases in the Asia-Pacific region

    PubMed Central

    Milea, Dominique; Azmi, Soraya; Reginald, Praveen; Verpillat, Patrice; Francois, Clement

    2015-01-01

    Objective: We describe and compare the availability and accessibility of administrative healthcare databases (AHDB) in several Asia-Pacific countries: Australia, Japan, South Korea, Taiwan, Singapore, China, Thailand, and Malaysia. Methods: The study included hospital records, reimbursement databases, prescription databases, and data linkages. Databases were first identified through PubMed, Google Scholar, and the ISPOR database register. Database custodians were contacted. Six criteria were used to assess the databases and provided the basis for a tool to categorise databases into seven levels ranging from least accessible (Level 1) to most accessible (Level 7). We also categorised overall data accessibility for each country as high, medium, or low based on accessibility of databases as well as the number of academic articles published using the databases. Results: Fifty-four administrative databases were identified. Only a limited number of databases allowed access to raw data and were at Level 7 [Medical Data Vision EBM Provider, Japan Medical Data Centre (JMDC) Claims database and Nihon-Chouzai Pharmacy Claims database in Japan, and Medicare, Pharmaceutical Benefits Scheme (PBS), Centre for Health Record Linkage (CHeReL), HealthLinQ, Victorian Data Linkages (VDL), SA-NT DataLink in Australia]. At Levels 3–6 were several databases from Japan [Hamamatsu Medical University Database, Medi-Trend, Nihon University School of Medicine Clinical Data Warehouse (NUSM)], Australia [Western Australia Data Linkage (WADL)], Taiwan [National Health Insurance Research Database (NHIRD)], South Korea [Health Insurance Review and Assessment Service (HIRA)], and Malaysia [United Nations University (UNU)-Casemix]. Countries were categorised as having a high level of data accessibility (Australia, Taiwan, and Japan), medium level of accessibility (South Korea), or a low level of accessibility (Thailand, China, Malaysia, and Singapore). In some countries, data may be available but

  20. A review of accessibility of administrative healthcare databases in the Asia-Pacific region.

    PubMed

    Milea, Dominique; Azmi, Soraya; Reginald, Praveen; Verpillat, Patrice; Francois, Clement

    2015-01-01

    We describe and compare the availability and accessibility of administrative healthcare databases (AHDB) in several Asia-Pacific countries: Australia, Japan, South Korea, Taiwan, Singapore, China, Thailand, and Malaysia. The study included hospital records, reimbursement databases, prescription databases, and data linkages. Databases were first identified through PubMed, Google Scholar, and the ISPOR database register. Database custodians were contacted. Six criteria were used to assess the databases and provided the basis for a tool to categorise databases into seven levels ranging from least accessible (Level 1) to most accessible (Level 7). We also categorised overall data accessibility for each country as high, medium, or low based on accessibility of databases as well as the number of academic articles published using the databases. Fifty-four administrative databases were identified. Only a limited number of databases allowed access to raw data and were at Level 7 [Medical Data Vision EBM Provider, Japan Medical Data Centre (JMDC) Claims database and Nihon-Chouzai Pharmacy Claims database in Japan, and Medicare, Pharmaceutical Benefits Scheme (PBS), Centre for Health Record Linkage (CHeReL), HealthLinQ, Victorian Data Linkages (VDL), SA-NT DataLink in Australia]. At Levels 3-6 were several databases from Japan [Hamamatsu Medical University Database, Medi-Trend, Nihon University School of Medicine Clinical Data Warehouse (NUSM)], Australia [Western Australia Data Linkage (WADL)], Taiwan [National Health Insurance Research Database (NHIRD)], South Korea [Health Insurance Review and Assessment Service (HIRA)], and Malaysia [United Nations University (UNU)-Casemix]. Countries were categorised as having a high level of data accessibility (Australia, Taiwan, and Japan), medium level of accessibility (South Korea), or a low level of accessibility (Thailand, China, Malaysia, and Singapore). In some countries, data may be available but accessibility was restricted

  1. Assigning statistical significance to proteotypic peptides via database searches

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

    2011-01-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to the reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem, since all annotated modifications, even those occurring within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489
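
    RAId's actual significance machinery is more sophisticated than this, but the bookkeeping that makes a small proteotypic library attractive can be stated in one line: an E-value is a per-peptide p-value scaled by the number of candidates scored. The numbers below are illustrative only.

    ```python
    def e_value(p_value, n_candidates):
        """Expected number of candidate peptides scoring this well by chance."""
        return p_value * n_candidates

    # The same spectral match is far more significant against a proteotypic
    # library than against the full tryptic peptide space.
    print(e_value(1e-6, 20_000))      # proteotypic library
    print(e_value(1e-6, 2_000_000))   # whole-proteome tryptic digest
    ```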

  2. Databases: Beyond the Basics.

    ERIC Educational Resources Information Center

    Whittaker, Robert

    This presented paper offers an elementary description of database characteristics and then provides a survey of databases that may be useful to the teacher and researcher in Slavic and East European languages and literatures. The survey focuses on commercial databases that are available, usable, and needed. Individual databases discussed include:…

  3. Identifying relevant data for a biological database: handcrafted rules versus machine learning.

    PubMed

    Sehgal, Aditya Kumar; Das, Sanmay; Noto, Keith; Saier, Milton H; Elkan, Charles

    2011-01-01

    With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.
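
    A minimal sketch of such a learned document filter, using scikit-learn; the two training abstracts and their labels are toy stand-ins for a real curated corpus such as the TCDB training data.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy corpus: 1 = relevant to a transport-protein database, 0 = not.
    docs = [
        "characterization of an ABC transporter permease in Escherichia coli",
        "randomized trial of statin therapy after myocardial infarction",
    ]
    labels = [1, 0]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(docs, labels)

    # Rank incoming MEDLINE abstracts by predicted relevance for curator review.
    new_docs = ["a novel sodium/proton antiporter family in archaea"]
    print(model.predict_proba(new_docs)[:, 1])
    ```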

  4. Database Dictionary for Ethiopian National Ground-Water Database (ENGDA) Data Fields

    USGS Publications Warehouse

    Kuniansky, Eve L.; Litke, David W.; Tucci, Patrick

    2007-01-01

    Introduction This document describes the data fields that are used for both field forms and the Ethiopian National Ground-water Database (ENGDA) tables associated with information stored about production wells, springs, test holes, test wells, and water level or water-quality observation wells. Several different words are used in this database dictionary and in the ENGDA database to describe a narrow shaft constructed in the ground. The most general term is borehole, which is applicable to any type of hole. A well is a borehole specifically constructed to extract water from the ground; however, for this data dictionary and for the ENGDA database, the words well and borehole are used interchangeably. A production well is defined as any well used for water supply and includes hand-dug wells, small-diameter bored wells equipped with hand pumps, or large-diameter bored wells equipped with large-capacity motorized pumps. Test holes are borings made to collect information about the subsurface with continuous core or non-continuous core and/or where geophysical logs are collected. Test holes are not converted into wells. A test well is a well constructed for hydraulic testing of an aquifer in order to plan a larger ground-water production system. A water-level or water-quality observation well is a well that is used to collect information about an aquifer and not used for water supply. A spring is any naturally flowing, local, ground-water discharge site. The database dictionary is designed to help define all fields on both field data collection forms (provided in attachment 2 of this report) and for the ENGDA software screen entry forms (described in Litke, 2007). The data entered into each screen entry field are stored in relational database tables within the computer database. The organization of the database dictionary is designed based on field data collection and the field forms, because this is what the majority of people will use. After each field, however, the

  5. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances-a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the

  6. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics

    PubMed Central

    Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.

    2016-01-01

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the

  7. Reflective Database Access Control

    ERIC Educational Resources Information Center

    Olson, Lars E.

    2009-01-01

    "Reflective Database Access Control" (RDBAC) is a model in which a database privilege is expressed as a database query itself, rather than as a static privilege contained in an access control list. RDBAC aids the management of database access controls by improving the expressiveness of policies. However, such policies introduce new interactions…

  8. NUCFRG2: An evaluation of the semiempirical nuclear fragmentation database

    NASA Technical Reports Server (NTRS)

    Wilson, J. W.; Tripathi, R. K.; Cucinotta, F. A.; Shinn, J. L.; Badavi, F. F.; Chun, S. Y.; Norbury, J. W.; Zeitlin, C. J.; Heilbronn, L.; Miller, J.

    1995-01-01

    A semiempirical abrasion-ablation model has been successful in generating a large nuclear database for the study of high charge and energy (HZE) ion beams, radiation physics, and galactic cosmic ray shielding. The cross sections that are generated are compared with measured HZE fragmentation data from various experimental groups. A research program for improvement of the database generator is also discussed.

  9. O-GLYCBASE Version 3.0: a revised database of O-glycosylated proteins.

    PubMed Central

    Hansen, J E; Lund, O; Nilsson, J; Rapacki, K; Brunak, S

    1998-01-01

    O-GLYCBASE is a revised database of information on glycoproteins and their O-linked glycosylation sites. Entries are compiled and revised from the literature and from the sequence databases. Entries include information about species, sequence, glycosylation sites and glycan type, and are fully cross-referenced. Compared to version 2.0 the number of entries has increased by 20%. Sequence logos displaying the acceptor specificity patterns for the GalNAc, mannose and GlcNAc transferases are shown. The O-GLYCBASE database is available through the WWW at http://www.cbs.dtu.dk/databases/OGLYCBASE/ PMID:9399880

  10. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data

    PubMed Central

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org. PMID:17932055

  11. HIM-herbal ingredients in-vivo metabolism database.

    PubMed

    Kang, Hong; Tang, Kailin; Liu, Qi; Sun, Yi; Huang, Qi; Zhu, Ruixin; Gao, Jun; Zhang, Duanfeng; Huang, Chenggang; Cao, Zhiwei

    2013-05-31

    Herbal medicine has long been viewed as a valuable asset for potential new drug discovery, and herbal ingredients' metabolites, especially the in vivo metabolites, have often been found to show better pharmacological, pharmacokinetic and even safety profiles compared to their parent compounds. However, this herbal metabolite information is still scattered and waiting to be collected. The HIM database manually collects the most comprehensive available in vivo metabolism information for herbal active ingredients, as well as their corresponding bioactivity, organ and/or tissue distribution, toxicity, ADME and clinical research profiles. Currently HIM contains 361 ingredients and 1104 corresponding in vivo metabolites from 673 reputable herbs. Tools for structural similarity, substructure search and Lipinski's Rule of Five are also provided. Various links are made to PubChem, PubMed, TCM-ID (Traditional Chinese Medicine Information Database) and HIT (Herbal Ingredients' Targets database). A curated database, HIM, has been set up for the in vivo metabolite information of the active ingredients of Chinese herbs, together with their corresponding bioactivity, toxicity and ADME profiles. HIM is freely accessible to academic researchers at http://www.bioinformatics.org.cn/.
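
    Of the tools listed, the Rule-of-Five filter is simple enough to sketch exactly; by convention a compound passes if it violates at most one of the four thresholds.

    ```python
    def lipinski_ok(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
        """Lipinski's Rule of Five: a quick oral drug-likeness screen."""
        violations = sum([
            mol_weight > 500,   # molecular weight above 500 Da
            logp > 5,           # octanol-water partition coefficient above 5
            h_donors > 5,       # more than 5 hydrogen-bond donors
            h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
        ])
        return violations <= max_violations

    print(lipinski_ok(mol_weight=374.4, logp=2.9, h_donors=2, h_acceptors=5))
    ```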

  12. Lynx web services for annotations and systems analysis of multi-gene disorders.

    PubMed

    Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

    2014-07-01

    Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Trials and tribulations: how we established a major incident database.

    PubMed

    Hardy, S E J; Fattah, S

    2017-01-25

    We describe the process of setting up a database of major incident reports and its potential future application. A template for reporting on major incidents was developed using a consensus-based process involving a team of experts in the field. A website was set up as a platform from which to launch the template and as a database of submitted reports. This paper describes the processes involved in setting up a major incident reporting database. It describes how specific difficulties have been overcome and anticipates challenges for the future. We have successfully set up a major incident database, the main purpose of which is to have a repository of standardised major incident reports that can be analysed and compared in order to learn from them.

  14. Clinical decision support tools: performance of personal digital assistant versus online drug information databases.

    PubMed

    Clauson, Kevin A; Polen, Hyla H; Marsh, Wallace A

    2007-12-01

    To evaluate personal digital assistant (PDA) drug information databases used to support clinical decision-making, and to compare the performance of PDA databases with their online versions. Prospective evaluation with descriptive analysis. Five drug information databases available for PDAs and online were evaluated according to their scope (inclusion of correct answers), completeness (on a 3-point scale), and ease of use; 158 question-answer pairs across 15 weighted categories of drug information essential to health care professionals were used to evaluate these databases. An overall composite score integrating these three measures was then calculated. Scores for the PDA databases and for each PDA-online pair were compared. Among the PDA databases, composite rankings, from highest to lowest, were as follows: Lexi-Drugs, Clinical Pharmacology OnHand, Epocrates Rx Pro, mobileMicromedex (now called Thomson Clinical Xpert), and Epocrates Rx free version. When we compared database pairs, online databases that had greater scope than their PDA counterparts were Clinical Pharmacology (137 vs 100 answers, p<0.001), Micromedex (132 vs 96 answers, p<0.001), Lexi-Comp Online (131 vs 119 answers, p<0.001), and Epocrates Online Premium (103 vs 98 answers, p=0.001). Only Micromedex online was more complete than its PDA version (p=0.008). Regarding ease of use, the Lexi-Drugs PDA database was superior to Lexi-Comp Online (p<0.001); however, Epocrates Online Premium, Epocrates Online Free, and Micromedex online were easier to use than their PDA counterparts (p<0.001). In terms of composite scores, only the online versions of Clinical Pharmacology and Micromedex demonstrated superiority over their PDA versions (p<0.01). Online and PDA drug information databases assist practitioners in improving their clinical decision-making. Lexi-Drugs performed significantly better than all of the other PDA databases evaluated. No PDA database demonstrated superiority to its online counterpart.
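
    The study's actual weighting over its 15 categories is not given in this abstract; purely to illustrate how scope, completeness, and ease of use can be folded into one composite, here is a sketch with made-up weights.

    ```python
    # Hypothetical weights over the three measures (they must sum to 1).
    WEIGHTS = {"scope": 0.5, "completeness": 0.3, "ease_of_use": 0.2}

    def composite_score(scores):
        """Weighted composite of the three measures, each scaled to 0-1."""
        return sum(WEIGHTS[m] * scores[m] for m in WEIGHTS)

    # e.g. a database answering 119 of 158 questions, with good completeness
    # and ease-of-use ratings:
    print(composite_score({"scope": 119 / 158, "completeness": 0.85, "ease_of_use": 0.9}))
    ```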

  15. Image Databases.

    ERIC Educational Resources Information Center

    Pettersson, Rune

    Different kinds of pictorial databases are described with respect to aims, user groups, search possibilities, storage, and distribution. Some specific examples are given for databases used for the following purposes: (1) labor markets for artists; (2) document management; (3) telling a story; (4) preservation (archives and museums); (5) research;…

  16. SNaX: A Database of Supernova X-Ray Light Curves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ross, Mathias; Dwarkadas, Vikram V., E-mail: Mathias_Ross@msn.com, E-mail: vikram@oddjob.uchicago.edu

    We present the Supernova X-ray Database (SNaX), a compilation of the X-ray data from young supernovae (SNe). The database includes the X-ray fluxes and luminosities of young SNe, from days to years after outburst. The original goal and intent of this study was to present a database of Type IIn SNe (SNe IIn), which we have accomplished. Our ongoing goal is to expand the database to include all SNe for which published data are available. The database interface allows one to search for SNe using various criteria, plot all or selected data points, and download both the data and the plot. The plotting facility allows for significant customization. There is also a facility for the user to submit data that can be directly incorporated into the database. We include an option to fit the decay of any given SN light curve with a power-law. The database includes a conversion of most data points to a common 0.3–8 keV band so that SN light curves may be directly compared with each other. A mailing list has been set up to disseminate information about the database. We outline the structure and function of the database, describe its various features, and outline the plans for future expansion.
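
    The light-curve fitting option reduces to a least-squares power law, which is linear in log-log space; a minimal sketch with made-up fluxes (not SNaX data):

    ```python
    import numpy as np

    # Hypothetical epochs (days since outburst) and 0.3-8 keV luminosities (erg/s).
    t = np.array([30.0, 100.0, 300.0, 1000.0])
    L = np.array([4.0e39, 1.5e39, 6.0e38, 2.0e38])

    # L = A * t**alpha, so an ordinary least-squares line through
    # (log t, log L) gives the decay index alpha directly.
    alpha, logA = np.polyfit(np.log10(t), np.log10(L), 1)
    print(f"decay index alpha = {alpha:.2f}")
    ```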

  17. Algorithms for database-dependent search of MS/MS data.

    PubMed

    Matthiesen, Rune

    2013-01-01

    The frequently used bottom-up strategy for identification of proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra that are normally matched automatically against a protein sequence database. Search engines that take MS/MS spectra and a protein sequence database as input are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of them have excellent user documentation. The aim here is therefore to outline the algorithmic strategies behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have gone into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be cluttered by suboptimal implementations, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow an even comparison. In other words, an algorithmic idea can still be worth considering even if its software implementation has been demonstrated to be suboptimal. The aim of this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference is much less developed for most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program, SIR, for protein inference that can import a Mascot search result.
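
    As a concrete illustration of those four stages, here is a deliberately tiny, self-contained sketch: toy masses, toy proteins, and a crude b-ion counting score. No real engine works this simply, and none of the numbers below come from the chapter.

    ```python
    # Monoisotopic residue masses (Da) for the few residues used here.
    MASS = {"A": 71.03711, "G": 57.02146, "K": 128.09496, "L": 113.08406,
            "R": 156.10111, "S": 87.03203}
    WATER, PROTON = 18.010565, 1.007276

    def digest(protein):
        """Search strategy: tryptic in-silico digestion (cleave after K/R)."""
        peptide, out = "", []
        for aa in protein:
            peptide += aa
            if aa in "KR":
                out.append(peptide)
                peptide = ""
        return out + ([peptide] if peptide else [])

    def peptide_mass(seq):
        return sum(MASS[a] for a in seq) + WATER

    def score(spectrum, seq, tol=0.02):
        """Peptide scoring: count singly charged b-ions found in the spectrum."""
        b, hits = PROTON, 0
        for aa in seq[:-1]:
            b += MASS[aa]
            hits += any(abs(mz - b) <= tol for mz in spectrum)
        return hits

    proteins = {"P1": "GASKLLRGAK", "P2": "LLGSARKKSA"}
    spectrum, precursor = [58.03, 129.07, 216.10], peptide_mass("GASK")

    psms = []
    for prot, seq in proteins.items():
        for pep in digest(seq):
            if abs(peptide_mass(pep) - precursor) <= 0.02:  # precursor filter
                psms.append((score(spectrum, pep), pep, prot))

    # Protein scoring + inference: rank proteins by their best peptide evidence.
    print(sorted(psms, reverse=True))
    ```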

  18. Inferring drug-disease associations based on known protein complexes.

    PubMed

    Yu, Liang; Huang, Jianbin; Ma, Zhixin; Zhang, Jing; Zou, Yapeng; Gao, Lin

    2015-01-01

    Inferring drug-disease associations is critical for unveiling disease mechanisms, as well as for discovering novel functions of available drugs, or drug repositioning. Previous work is primarily based on drug-gene-disease relationships, which discard much important information, since genes execute their functions by interacting with other genes. To overcome this issue, we propose a novel methodology that discovers drug-disease associations based on protein complexes. First, an integrated heterogeneous network consisting of drugs, protein complexes, and diseases is constructed, where we assign weights to the drug-disease associations using probabilities. Then, from the tripartite network, we obtain the indirect weighted relationships between drugs and diseases. The larger the weight, the higher the reliability of the association. We apply our method to mental disorders and hypertension, and validate the results using the Comparative Toxicogenomics Database. Our ranked results can be directly reinforced by existing biomedical literature, suggesting that our proposed method achieves higher specificity and sensitivity. The proposed method offers new insight into drug-disease discovery. Our method is publicly available at http://1.complexdrug.sinaapp.com/Drug_Complex_Disease/Data_Download.html.
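
    The core computation is a composition of two weighted bipartite layers; a minimal sketch with toy probabilities (not the paper's data):

    ```python
    import numpy as np

    # Rows are drugs, columns are protein complexes (left matrix); rows are
    # complexes, columns are diseases (right matrix). Entries are the
    # assigned association probabilities.
    drug_complex = np.array([[0.8, 0.0],
                             [0.3, 0.6]])
    complex_disease = np.array([[0.9, 0.1],
                                [0.2, 0.7]])

    # Composing the layers of the tripartite network yields indirect
    # drug-disease weights; larger weights suggest more reliable associations.
    drug_disease = drug_complex @ complex_disease
    print(drug_disease)
    ```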

  19. The New Zealand Tsunami Database: historical and modern records

    NASA Astrophysics Data System (ADS)

    Barberopoulou, A.; Downes, G. L.; Cochran, U. A.; Clark, K.; Scheele, F.

    2016-12-01

    A database of historical (pre-instrumental) and modern (instrumentally recorded) tsunamis that have impacted or been observed in New Zealand has been compiled and published online. New Zealand's tectonic setting, astride an obliquely convergent tectonic boundary on the Pacific Rim, means that it is vulnerable to local, regional and circum-Pacific tsunamis. Despite New Zealand's comparatively short written historical record of c. 200 years, there is a wealth of information about the impact of past tsunamis. The New Zealand Tsunami Database currently has 800+ entries that describe >50 high-validity tsunamis. Sources of historical information include witness reports recorded in diaries, notes, newspapers, books, and photographs. Information on recent events comes from tide gauges and other instrumental recordings such as DART® buoys, and media of greater variety, for example, video and online surveys. The New Zealand Tsunami Database is an ongoing project, with information added as further historical records come to light. Modern tsunamis are also added to the database once the relevant data for an event have been collated and edited. This paper briefly overviews the procedures and tools used in the recording and analysis of New Zealand's historical tsunamis, with emphasis on database content.

  20. Scheduled Civil Aircraft Emission Inventories for 1999: Database Development and Analysis

    NASA Technical Reports Server (NTRS)

    Sutkus, Donald J., Jr.; Baughcum, Steven L.; DuBois, Douglas P.

    2001-01-01

    This report describes the development of a three-dimensional database of aircraft fuel burn and emissions (NO(x), CO, and hydrocarbons) for the scheduled commercial aircraft fleet for each month of 1999. Global totals of emissions and fuel burn for 1999 are compared to global totals from 1992 and 2015 databases. 1999 fuel burn, departure and distance totals for selected airlines are compared to data reported on DOT Form 41 to evaluate the accuracy of the calculations. DOT Form T-100 data were used to determine typical payloads for freighter aircraft and this information was used to model freighter aircraft more accurately by using more realistic payloads. Differences in the calculation methodology used to create the 1999 fuel burn and emissions database from the methodology used in previous work are described and evaluated.

  1. Chemical Kinetics Database

    National Institute of Standards and Technology Data Gateway

    SRD 17 NIST Chemical Kinetics Database (Web, free access)   The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.

  2. Critical assessment of human metabolic pathway databases: a stepping stone for future integration

    PubMed Central

    2011-01-01

    Background: Multiple pathway databases are available that describe the human metabolic network and have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts. Results: We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially at the reaction level, where the databases agree on only 3% of the 6968 reactions they contain in total. Even for the well-established tricarboxylic acid cycle, the databases agree on only 5 of the 30 reactions in total. We identified the main causes for the lack of overlap. Importantly, the databases are partly complementary. Other explanations include the number of steps a conversion is described in and the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous names for metabolites also affect the comparison. Conclusions: Our results show that each of the five networks compared provides us with a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, next to a need for standardizing the metabolite names and identifiers, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism. Our comparison provides a stepping stone
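
    The headline overlap numbers come from simple set operations on reaction identifiers; a sketch with toy IDs (three databases instead of five):

    ```python
    # Toy reaction-ID sets standing in for the databases compared.
    dbs = {
        "db_A": {"R00200", "R00300", "R00341", "R01070"},
        "db_B": {"R00200", "R00341", "R00500"},
        "db_C": {"R00341", "R00300", "R01070"},
    }

    union = set().union(*dbs.values())
    core = set.intersection(*dbs.values())
    print(f"{len(core)}/{len(union)} reactions shared by all databases")
    ```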

  3. Is there a trade-off between fertility and longevity? A comparative study of women from three large historical databases accounting for mortality selection.

    PubMed

    Gagnon, Alain; Smith, Ken R; Tremblay, Marc; Vézina, Hélène; Paré, Paul-Philippe; Desjardins, Bertrand

    2009-01-01

    Frontier populations provide exceptional opportunities to test the hypothesis of a trade-off between fertility and longevity. In such populations, mechanisms favoring reproduction usually find fertile ground, and if these mechanisms reduce longevity, demographers should observe higher postreproductive mortality among highly fertile women. We test this hypothesis using complete female reproductive histories from three large demographic databases: the Registre de la population du Québec ancien (Université de Montréal), which covers the first centuries of settlement in Quebec; the BALSAC database (Université du Québec à Chicoutimi), including comprehensive records for the Saguenay-Lac-St-Jean (SLSJ) region of Quebec in the nineteenth and twentieth centuries; and the Utah Population Database (University of Utah), including all individuals who experienced a vital event on the Mormon Trail and their descendants. Together, the three samples allow for comparisons over time and space, and represent one of the largest sets of natural fertility cohorts used to simultaneously assess reproduction and longevity. Using survival analyses, we found a negative influence of parity and a positive influence of age at last child on postreproductive survival in the three populations, as well as a significant interaction between these two variables. The effect sizes of all these parameters were remarkably similar in the three samples. However, we found little evidence that early fertility affects postreproductive survival. The use of Heckman's procedure to assess the impact of mortality selection during reproductive ages did not appreciably alter these results. We conclude our empirical investigation by discussing the advantages of comparative approaches. 2009 Wiley-Liss, Inc.

  4. SwePep, a database designed for endogenous peptides and mass spectrometry.

    PubMed

    Fälth, Maria; Sköld, Karl; Norrman, Mathias; Svensson, Marcus; Fenyö, David; Andren, Per E

    2006-06-01

    A new database, SwePep, specifically designed for endogenous peptides, has been constructed to significantly speed up the identification process for complex tissue samples utilizing mass spectrometry. In the identification process the experimental peptide masses are compared with the peptide masses stored in the database, both with and without possible post-translational modifications. This intermediate identification step is fast and singles out peptides that are potential endogenous peptides, which can later be confirmed with tandem mass spectrometry data. Successful applications of this methodology are presented. The SwePep database is a relational database developed using MySQL and Java. The database contains 4180 annotated endogenous peptides from different tissues originating from 394 different species, as well as 50 novel peptides from brain tissue identified in our laboratory. Information about the peptides, including mass, isoelectric point, sequence, and precursor protein, is also stored in the database. This new approach holds great potential for removing the bottleneck that occurs during the identification process in the field of peptidomics. The SwePep database is available to the public.
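
    A minimal sketch of that intermediate matching step; the PTM mass offsets are standard monoisotopic values, while the peptide entries and tolerance are placeholders rather than SwePep content.

    ```python
    PTM_OFFSETS = {"none": 0.0, "phospho": 79.96633, "amidation": -0.98402}

    def match_mass(observed, db_masses, tol_ppm=10.0):
        """Return database peptides whose stored mass, with or without a
        single PTM, matches the observed mass within a ppm tolerance."""
        hits = []
        for peptide, mass in db_masses.items():
            for ptm, delta in PTM_OFFSETS.items():
                if abs(observed - (mass + delta)) / observed * 1e6 <= tol_ppm:
                    hits.append((peptide, ptm))
        return hits

    db_masses = {"toy_peptide_a": 1103.55, "toy_peptide_b": 901.47}  # hypothetical
    print(match_mass(observed=981.44, db_masses=db_masses))
    ```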

  5. Multivariate normative comparisons using an aggregated database

    PubMed Central

    Murre, Jaap M. J.; Huizenga, Hilde M.

    2017-01-01

    In multivariate normative comparisons, a patient’s profile of test scores is compared to those in a normative sample. Recently, it has been shown that these multivariate normative comparisons enhance the sensitivity of neuropsychological assessment. However, multivariate normative comparisons require multivariate normative data, which are often unavailable. In this paper, we show how a multivariate normative database can be constructed by combining healthy control group data from published neuropsychological studies. We show that three issues should be addressed to construct a multivariate normative database. First, the database may have a multilevel structure, with participants nested within studies. Second, not all tests are administered in every study, so many data may be missing. Third, a patient should be compared to controls of similar age, gender and educational background rather than to the entire normative sample. To address these issues, we propose a multilevel approach for multivariate normative comparisons that accounts for missing data and includes covariates for age, gender and educational background. Simulations show that this approach controls the number of false positives and has high sensitivity to detect genuine deviations from the norm. An empirical example is provided. Implications for other domains than neuropsychology are also discussed. To facilitate broader adoption of these methods, we provide code implementing the entire analysis in the open source software package R. PMID:28267796
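
    The paper's multilevel, missing-data approach is richer than this, but the basic multivariate comparison it builds on can be sketched with a Mahalanobis distance converted to a p-value via Hotelling's T-squared; the control scores below are simulated.

    ```python
    import numpy as np
    from scipy import stats

    def normative_p(patient, controls):
        """Mahalanobis distance of one patient's test profile from a control
        sample, converted to a p-value (single new observation vs. sample)."""
        X = np.asarray(controls, dtype=float)
        n, k = X.shape
        diff = np.asarray(patient, dtype=float) - X.mean(axis=0)
        d2 = diff @ np.linalg.inv(np.cov(X, rowvar=False)) @ diff
        t2 = d2 * n / (n + 1)
        f = t2 * (n - k) / (k * (n - 1))
        return stats.f.sf(f, k, n - k)

    rng = np.random.default_rng(0)
    controls = rng.normal(size=(80, 4))  # 80 controls, 4 test scores
    print(normative_p([-2.0, -1.5, -1.8, -2.2], controls))
    ```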

  6. Database Search Strategies & Tips. Reprints from the Best of "ONLINE" [and]"DATABASE."

    ERIC Educational Resources Information Center

    Online, Inc., Weston, CT.

    Reprints of 17 articles presenting strategies and tips for searching databases online appear in this collection, which is one in a series of volumes of reprints from "ONLINE" and "DATABASE" magazines. Edited for information professionals who use electronically distributed databases, these articles address such topics as: (1)…

  7. PylotDB - A Database Management, Graphing, and Analysis Tool Written in Python

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnette, Daniel W.

    2012-01-04

    PylotDB, written completely in Python, provides a user interface (UI) with which to interact with, analyze, graph data from, and manage open source databases such as MySQL. The UI spares the user from needing in-depth knowledge of the database application programming interface (API). PylotDB allows the user to generate various kinds of plots from user-selected data; generate statistical information on text as well as numerical fields; back up and restore databases; compare database tables across different databases as well as across different servers; extract information from any field to create new fields; generate, edit, and delete databases, tables, and fields; generate CSV data from a table or read CSV data into one; and perform similar operations. Since much of the database information is brought under the control of the Python computer language, PylotDB is not intended for huge databases, for which MySQL and Oracle, for example, are better suited. PylotDB is better suited to the smaller databases typically needed in a small research group. PylotDB can also be used as a learning tool for database applications in general.
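
    The query-then-plot workflow such a tool wraps in a UI can be sketched as follows; SQLite stands in for MySQL here purely to keep the example self-contained, and the table is invented:

    ```python
    import sqlite3
    import matplotlib.pyplot as plt

    # Build a small in-memory table standing in for a research group's database.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE runs (trial INTEGER, runtime REAL)")
    con.executemany("INSERT INTO runs VALUES (?, ?)",
                    [(1, 12.3), (2, 11.9), (3, 13.1), (4, 12.0)])

    # Query user-selected data and plot it.
    rows = con.execute("SELECT trial, runtime FROM runs ORDER BY trial").fetchall()
    trials, runtimes = zip(*rows)

    plt.plot(trials, runtimes, marker="o")
    plt.xlabel("trial"); plt.ylabel("runtime (s)")
    plt.title("user-selected data plotted from a database query")
    plt.show()
    ```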

  8. Integrating Borrowed Records into a Database: Impact on Thesaurus Development and Retrieval.

    ERIC Educational Resources Information Center

    Kirtland, Monika; And Others

    1980-01-01

    Discusses three approaches to thesaurus and indexing/retrieval language maintenance for combined databases: reindexing, merging, and initial standardization. Two thesauri for a combined database are evaluated in terms of their compatibility, and indexing practices are compared. Tables and figures help illustrate aspects of the comparison. (SW)

  9. International Shock-Wave Database: Current Status

    NASA Astrophysics Data System (ADS)

    Levashov, Pavel

    2013-06-01

    speed in the Hugoniot state, and time-dependent free-surface or window-interface velocity profiles. Users are able to search the information in the database and obtain the experimental points in tabular or plain text formats directly via the Internet using common browsers. It is also possible to plot the experimental points for comparison with different approximations and results of equation-of-state calculations. The user can present the results of calculations in text or graphical forms and compare them with any experimental data available in the database. A short history of the shock-wave database will be presented and current possibilities of ISWdb will be demonstrated. Web-site of the project: http://iswdb.info. This work is supported by SNL contracts # 1143875, 1196352.

  10. The salinity tolerant poplar database (STPD): a comprehensive database for studying tree salt-tolerant adaption and poplar genomics.

    PubMed

    Ma, Yazhen; Xu, Ting; Wan, Dongshi; Ma, Tao; Shi, Sheng; Liu, Jianquan; Hu, Quanjun

    2015-03-17

    Soil salinity is a significant factor that impairs plant growth and agricultural productivity, and numerous efforts are underway to enhance the salt tolerance of economically important plants. Populus species are widely cultivated for diverse uses. In particular, they grow in habitats ranging from salty soils to mesophytic environments, and are therefore used as a model genus for elucidating the physiological and molecular mechanisms of stress tolerance in woody plants. The Salinity Tolerant Poplar Database (STPD) is an integrative database for salt-tolerant poplar genome biology. Currently the STPD contains the Populus euphratica genome and its related genetic resources. P. euphratica, which prefers salty habitats, has become a valuable genetic resource for the exploitation of tolerance characteristics in trees. This database contains curated data including genomic sequence, gene and gene functional information, non-coding RNA sequences, transposable elements, simple sequence repeat and single nucleotide polymorphism information for P. euphratica, gene expression data between P. euphratica and Populus tomentosa, and whole-genome alignments between Populus trichocarpa, P. euphratica and Salix suchowensis. The STPD provides useful searching and data mining tools, including the GBrowse genome browser, BLAST servers and a genome alignment viewer, which can be used to browse genome regions, identify similar sequences and visualize genome alignments. Datasets within the STPD can also be downloaded to perform local searches. A new Salinity Tolerant Poplar Database has been developed to assist studies of salt tolerance in trees and poplar genomics. The database will be continuously updated to incorporate new genome-wide data of related poplar species. It will serve as an infrastructure for research on the molecular functions of genes, comparative genomics, and evolution in closely related species, as well as promote advances in molecular breeding within Populus.

  11. Visibility of medical informatics regarding bibliometric indices and databases

    PubMed Central

    2011-01-01

    Background The quantitative study of the publication output (bibliometrics) deeply influences how scientific work is perceived (bibliometric visibility). Recently, new bibliometric indices and databases have been established, which may change the visibility of disciplines, institutions and individuals. This study examines the effects of the new indices on the visibility of Medical Informatics. Methods By objective criteria, three sets of journals are chosen, two representing Medical Informatics and a third addressing Internal Medicine as a benchmark. The availability of index data (index coverage) and the aggregate scores of these corpora are compared for journal-related (Journal impact factor, Eigenfactor metrics, SCImago journal rank) and author-related indices (Hirsch-index, Egghe's G-index). Correlation analysis compares the dependence of author-related indices. Results The bibliometric visibility depended on the research focus and the citation database: Scopus covers more journals relevant for Medical Informatics than ISI/Thomson Reuters. Journals focused on Medical Informatics' methodology were negatively affected by the Eigenfactor metrics, while the visibility profited from an interdisciplinary research focus. The correlation between Hirsch-indices computed on citation databases and the Internet was strong. Conclusions The visibility of smaller technology-oriented disciplines like Medical Informatics is changed by the new bibliometric indices and databases, possibly leading to suitably changed publication strategies. Freely accessible author-related indices enable an easy and adequate individual assessment. PMID:21496230
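
    Both author-level indices mentioned above are simple functions of a citation-count vector; a sketch of each, with invented citation counts:

    ```python
    def h_index(citations):
        """Largest h such that h papers have at least h citations each."""
        cites = sorted(citations, reverse=True)
        return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

    def g_index(citations):
        """Largest g such that the top g papers total at least g^2 citations."""
        cites = sorted(citations, reverse=True)
        total, g = 0, 0
        for rank, c in enumerate(cites, start=1):
            total += c
            if total >= rank * rank:
                g = rank
        return g

    papers = [42, 18, 12, 9, 7, 6, 3, 1, 0]   # citation counts per paper
    print(h_index(papers), g_index(papers))    # prints: 6 9
    ```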

  12. Database for LDV Signal Processor Performance Analysis

    NASA Technical Reports Server (NTRS)

    Baker, Glenn D.; Murphy, R. Jay; Meyers, James F.

    1989-01-01

    A comparative and quantitative analysis of various laser velocimeter signal processors is difficult because standards for characterizing signal bursts have not been established. This leaves the researcher to select a signal processor based only on manufacturers' claims without the benefit of direct comparison. The present paper proposes the use of a database of digitized signal bursts obtained from a laser velocimeter under various configurations as a method for directly comparing signal processors.

  13. Naval Ship Database: Database Design, Implementation, and Schema

    DTIC Science & Technology

    2013-09-01

    incoming data. The solution allows database users to store and analyze data collected by navy ships in the Royal Canadian Navy (RCN). The data...understanding RCN jargon and common practices on a typical RCN vessel. This experience led to the development of several error detection methods to...data to be stored in the database. Mr. Massel has also collected data pertaining to day-to-day activities on RCN vessels that has been imported into

  14. Hawaii bibliographic database

    USGS Publications Warehouse

    Wright, T.L.; Takahashi, T.J.

    1998-01-01

    The Hawaii bibliographic database has been created to contain all of the literature, from 1779 to the present, pertinent to the volcanological history of the Hawaiian-Emperor volcanic chain. References are entered in a PC- and Macintosh-compatible EndNote Plus bibliographic database with keywords and abstracts or (if no abstract) with annotations as to content. Keywords emphasize location, discipline, process, identification of new chemical data or age determinations, and type of publication. The database is updated approximately three times a year and is available for download from an FTP site. The bibliography contained 8460 references at the time this paper was submitted for publication. Use of the database greatly enhances the power and completeness of library searches for anyone interested in Hawaiian volcanism.

  15. A generic method for improving the spatial interoperability of medical and ecological databases.

    PubMed

    Ghenassia, A; Beuscart, J B; Ficheur, G; Occelli, F; Babykina, E; Chazard, E; Genin, M

    2017-10-03

    The availability of big data in healthcare and the intensive development of data reuse and georeferencing have opened up perspectives for health spatial analysis. However, fine-scale spatial studies of ecological and medical databases are limited by the change-of-support problem and thus a lack of spatial unit interoperability. The use of spatial disaggregation methods to solve this problem introduces errors into the spatial estimations. Here, we present a generic, two-step method for merging medical and ecological databases that avoids the use of spatial disaggregation methods, while maximizing the spatial resolution. Firstly, a mapping table is created after one or more transition matrices have been defined. The latter link the spatial units of the original databases to the spatial units of the final database. Secondly, the mapping table is validated by (1) comparing the covariates contained in the two original databases, and (2) checking the spatial validity with a spatial continuity criterion and a spatial resolution index. We used our novel method to merge a medical database (the French national diagnosis-related group database, containing 5644 spatial units) with an ecological database (produced by the French National Institute of Statistics and Economic Studies, and containing 36,594 spatial units). The mapping table yielded 5632 final spatial units. The mapping table's validity was evaluated by comparing the number of births in the medical and ecological databases in each final spatial unit. The median [interquartile range] relative difference was 2.3% [0; 5.7]. The spatial continuity criterion was low (2.4%), and the spatial resolution index was greater than for most French administrative areas. Our innovative approach improves interoperability between medical and ecological databases and facilitates fine-scale spatial analyses. We have shown that disaggregation models and large aggregation techniques are not necessarily the best ways to
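
    A toy sketch of the first step, with hypothetical unit codes: transition matrices linking each database's spatial units to the final units are combined into a mapping table, with no disaggregation involved:

    ```python
    medical_to_final = {       # medical-database unit -> final spatial unit
        "M001": "F01", "M002": "F01", "M003": "F02",
    }
    ecological_to_final = {    # ecological-database unit -> final spatial unit
        "E-a": "F01", "E-b": "F02", "E-c": "F02",
    }

    # Mapping table: final unit -> the original units it aggregates on each side.
    mapping_table = {}
    for src, final in medical_to_final.items():
        mapping_table.setdefault(final, {"medical": [], "ecological": []})["medical"].append(src)
    for src, final in ecological_to_final.items():
        mapping_table.setdefault(final, {"medical": [], "ecological": []})["ecological"].append(src)

    print(mapping_table)
    # {'F01': {'medical': ['M001', 'M002'], 'ecological': ['E-a']},
    #  'F02': {'medical': ['M003'], 'ecological': ['E-b', 'E-c']}}
    ```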

  16. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    PubMed

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them is the management of the massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. Finding an alternative to the frequently considered relational database model has become a compelling task. Other data models may be more effective when dealing with very large amounts of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.
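
    A sketch of how genomic records might be written to and read from Cassandra with the DataStax Python driver; the keyspace, table and columns are hypothetical, and a local cluster is assumed to be running:

    ```python
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS genomics
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.set_keyspace("genomics")
    session.execute("""
        CREATE TABLE IF NOT EXISTS reads (
            sample_id text, read_id text, sequence text, quality text,
            PRIMARY KEY (sample_id, read_id))
    """)

    # Prepared statements amortize parsing cost across the write-heavy workload.
    insert = session.prepare(
        "INSERT INTO reads (sample_id, read_id, sequence, quality) VALUES (?, ?, ?, ?)")
    session.execute(insert, ("S1", "r0001", "ACGTACGT", "IIIIHHHH"))

    for row in session.execute("SELECT read_id, sequence FROM reads WHERE sample_id = 'S1'"):
        print(row.read_id, row.sequence)
    ```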

  17. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    PubMed Central

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them is the management of the massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. Finding an alternative to the frequently considered relational database model has become a compelling task. Other data models may be more effective when dealing with very large amounts of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  18. Clinical decision support tools: personal digital assistant versus online dietary supplement databases.

    PubMed

    Clauson, Kevin A; Polen, Hyla H; Peak, Amy S; Marsh, Wallace A; DiScala, Sandra L

    2008-11-01

    Clinical decision support tools (CDSTs) on personal digital assistants (PDAs) and online databases assist healthcare practitioners who make decisions about dietary supplements. To assess and compare the content of PDA dietary supplement databases and their online counterparts used as CDSTs. A total of 102 question-and-answer pairs were developed within 10 weighted categories of the most clinically relevant aspects of dietary supplement therapy. PDA versions of AltMedDex, Lexi-Natural, Natural Medicines Comprehensive Database, and Natural Standard and their online counterparts were assessed by scope (percent of correct answers present), completeness (3-point scale), ease of use, and a composite score integrating all 3 criteria. Descriptive and inferential statistics, including a chi-square test, Scheffé's multiple comparison test, McNemar's test, and the Wilcoxon signed rank test, were used to analyze data. The scope scores for PDA databases were: Natural Medicines Comprehensive Database 84.3%, Natural Standard 58.8%, Lexi-Natural 50.0%, and AltMedDex 36.3%, with Natural Medicines Comprehensive Database statistically superior (p < 0.01). Completeness scores were: Natural Medicines Comprehensive Database 78.4%, Natural Standard 51.0%, Lexi-Natural 43.5%, and AltMedDex 29.7%. Lexi-Natural was superior in ease of use (p < 0.01). Composite scores for PDA databases were: Natural Medicines Comprehensive Database 79.3, Natural Standard 53.0, Lexi-Natural 48.0, and AltMedDex 32.5, with Natural Medicines Comprehensive Database superior (p < 0.01). There was no difference between the scope for PDA and online database pairs with Lexi-Natural (50.0% and 53.9%, respectively) or Natural Medicines Comprehensive Database (84.3% and 84.3%, respectively) (p > 0.05), whereas differences existed for AltMedDex (36.3% vs 74.5%, respectively) and Natural Standard (58.8% vs 80.4%, respectively) (p < 0.01). For composite scores, AltMedDex and Natural Standard online were better than

  19. GlycomeDB – integration of open-access carbohydrate structure databases

    PubMed Central

    Ranzinger, René; Herget, Stephan; Wetter, Thomas; von der Lieth, Claus-Wilhelm

    2008-01-01

    Background Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since then several initiatives have developed independent databases with partially overlapping foci. For each database, different encoding schemes for residues and sequence topology were designed. Therefore, it is virtually impossible to obtain an overview of all deposited structures or to compare the contents of the various databases. Results We have implemented procedures which download the structures contained in the seven major databases, e.g. GLYCOSCIENCES.de, the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Bacterial Carbohydrate Structure Database (BCSDB). We have created a new database called GlycomeDB, containing all structures, their taxonomic annotations and references (IDs) for the original databases. More than 100000 datasets were imported, resulting in more than 33000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators. Conclusion GlycomeDB is a new, publicly available database for carbohydrate sequences with a unified, all-encompassing structure encoding format and NCBI taxonomic referencing. The database is updated weekly and can be downloaded free of charge. The Java application GlycoUpdateDB is also available for establishing and updating a local installation of GlycomeDB. With the advent of GlycomeDB, the distributed islands of knowledge in glycomics are now bridged to form a single resource. PMID:18803830

  20. Mock jurors' use of error rates in DNA database trawls.

    PubMed

    Scurich, Nicholas; John, Richard S

    2013-12-01

    Forensic science is not infallible, as data collected by the Innocence Project have revealed. The rate at which errors occur in forensic DNA testing-the so-called "gold standard" of forensic science-is not currently known. This article presents a Bayesian analysis to demonstrate the profound impact that error rates have on the probative value of a DNA match. Empirical evidence on whether jurors are sensitive to this effect is equivocal: Studies have typically found they are not, while a recent, methodologically rigorous study found that they can be. This article presents the results of an experiment that examined this issue within the context of a database trawl case in which one DNA profile was tested against a multitude of profiles. The description of the database was manipulated (i.e., "medical" or "offender" database, or not specified) as was the rate of error (i.e., one-in-10 or one-in-1,000). Jury-eligible participants were nearly twice as likely to convict in the offender database condition compared to the condition not specified. The error rates did not affect verdicts. Both factors, however, affected the perception of the defendant's guilt, in the expected direction, although the size of the effect was meager compared to Bayesian prescriptions. The results suggest that the disclosure of an offender database to jurors might constitute prejudicial evidence, and calls for proficiency testing in forensic science as well as training of jurors are echoed. (c) 2013 APA, all rights reserved
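
    The Bayesian point about error rates can be illustrated with a small calculation: once the false-positive probability is nonzero, the likelihood ratio of a reported match is capped near the reciprocal of the error rate, no matter how small the random-match probability. A sketch with illustrative numbers:

    ```python
    def likelihood_ratio(rmp, false_positive_rate):
        """P(reported match | source) / P(reported match | not source),
        assuming a true source is always reported as a match."""
        return 1.0 / (rmp + false_positive_rate - rmp * false_positive_rate)

    for fpr in (0.1, 0.001, 0.0):      # one-in-10, one-in-1,000, error-free
        print(fpr, likelihood_ratio(1e-9, fpr))
    # 0.1   -> ~10          (errors dominate the evidence)
    # 0.001 -> ~1000
    # 0.0   -> 1e9          (random-match probability alone)
    ```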

  1. Blending Education and Polymer Science: Semiautomated Creation of a Thermodynamic Property Database

    ERIC Educational Resources Information Center

    Tchoua, Roselyne B.; Qin, Jian; Audus, Debra J.; Chard, Kyle; Foster, Ian T.; de Pablo, Juan

    2016-01-01

    Structured databases of chemical and physical properties play a central role in the everyday research activities of scientists and engineers. In materials science, researchers and engineers turn to these databases to quickly query, compare, and aggregate various properties, thereby allowing for the development or application of new materials. The…

  2. [Health status of populations living in French overseas territories in 2012, compared with metropolitan France: An analysis of the national health insurance database].

    PubMed

    Filipovic-Pierucci, A; Rigault, A; Fagot-Campagna, A; Tuppin, P

    2016-06-01

    This study uses healthcare consumption to compare the health status of beneficiaries of the French national health insurance general scheme between individuals living in French overseas territories (FOT) and those living in metropolitan France. Data were extracted from the French national health insurance database (Sniiram) for 2012; using algorithms, 56 disease groups and 27 hospital activity groups were identified. Standardized morbidity ratios adjusted for age and sex (SMRs) were used to compare the FOT to mainland France. Compared with mainland France, people living in the four FOT had high SMRs for diabetes care (Guadeloupe 1.9; Martinique 1.7; Guyane 1.9; La Réunion 2.3), dialysis (2.7; 2.4; 3.8; 4.4), stroke (1.2; 1.1; 2.0; 1.5), and hospitalization for infectious diseases (1.9; 2.5; 2.4; 1.4) and obstetrics (1.4; 1.2; 1.9; 1.2). Care for inflammatory bowel disease or cancer was less frequent, except for prostate cancer in Martinique and Guadeloupe (2.3). People living in Martinique, Guadeloupe and La Réunion more frequently received care for psychotic disorders (2.0; 1.7; 1.2), dementia (1.1; 1.3; 11), epileptic seizures (1.4; 1.4; 16) and hospitalizations for burns (2.6; 1.7; 2.9). In La Réunion, people more frequently received care for coronary syndrome (1.3), cardiac heart failure (1.6), chronic respiratory diseases except cystic fibrosis (1.5), drug addiction (1.4), and hospitalizations for cardiovascular catheterization (1.4) and toxicology, poisoning and alcohol (1.7). Other differences were observed by gender: HIV infection, peripheral arterial disease and some chronic inflammatory diseases (lupus) were more frequent in women living in Martinique or Guadeloupe than in women from mainland France, as were psychotic disorders in men. Men from La Réunion more frequently had liver and pancreatic diseases and hospitalizations for toxicology, poisoning and alcohol than men from mainland France. This study highlights the utility of administrative databases to compare and follow population health status.
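
    For readers unfamiliar with the metric, an SMR is the observed case count divided by the count expected if the reference (mainland) age- and sex-specific rates applied to the territory's population; a sketch with invented numbers:

    ```python
    # Hypothetical mainland rates (cases per person) and territory population,
    # by (age band, sex) stratum.
    mainland_rates = {("45-64", "M"): 0.050, ("45-64", "F"): 0.040,
                      ("65+",   "M"): 0.120, ("65+",   "F"): 0.100}
    territory_pop  = {("45-64", "M"): 20000, ("45-64", "F"): 21000,
                      ("65+",   "M"):  8000, ("65+",   "F"):  9000}
    observed_cases = 5100

    expected = sum(mainland_rates[s] * territory_pop[s] for s in territory_pop)
    smr = observed_cases / expected
    print(f"expected = {expected:.0f}, SMR = {smr:.2f}")   # SMR > 1: excess morbidity
    ```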

  3. Comparison of locus-specific databases for BRCA1 and BRCA2 variants reveals disparity in variant classification within and among databases.

    PubMed

    Vail, Paris J; Morris, Brian; van Kan, Aric; Burdett, Brianna C; Moyes, Kelsey; Theisen, Aaron; Kerr, Iain D; Wenstrup, Richard J; Eggington, Julie M

    2015-10-01

    Genetic variants of uncertain clinical significance (VUSs) are a common outcome of clinical genetic testing. Locus-specific variant databases (LSDBs) have been established for numerous disease-associated genes as research tools that facilitate the interpretation of genetic sequence variants via aggregated data. If LSDBs are to be used for clinical practice, consistent and transparent criteria regarding the deposition and interpretation of variants are vital, as variant classifications are often used to make important and irreversible clinical decisions. In this study, we performed a retrospective analysis of 2017 consecutive BRCA1 and BRCA2 genetic variants identified from 24,650 consecutive patient samples referred to our laboratory to establish an unbiased dataset representative of the types of variants seen in the US patient population, submitted by clinicians and researchers for BRCA1 and BRCA2 testing. We compared the clinical classifications of these variants among five publicly accessible BRCA1 and BRCA2 variant databases: BIC, ClinVar, HGMD (paid version), LOVD, and the UMD databases. Our results show substantial disparity of variant classifications among publicly accessible databases. Furthermore, it appears that discrepant classifications are not the result of a single outlier but of widespread disagreement among databases. This study also shows that databases sometimes favor a clinical classification when current best practice guidelines (ACMG/AMP/CAP) would suggest an uncertain classification. Although LSDBs have been well established for research applications, our results suggest several challenges preclude their wider use in clinical practice.
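
    The cross-database comparison can be sketched as a small table of harmonized classifications, flagging variants on which any two databases disagree; all rows below are hypothetical, and real analyses must first reconcile each database's classification vocabulary:

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "variant":  ["BRCA1 var_1", "BRCA2 var_2", "BRCA1 var_3"],
        "ClinVar":  ["pathogenic", "benign",    "uncertain"],
        "LOVD":     ["pathogenic", "uncertain", "uncertain"],
        "UMD":      ["pathogenic", "benign",    "pathogenic"],
    }).set_index("variant")

    # A variant is discordant when the databases report more than one class.
    df["discordant"] = df.nunique(axis=1) > 1
    print(df)
    ```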

  4. Kentucky geotechnical database.

    DOT National Transportation Integrated Search

    2005-03-01

    Development of a comprehensive dynamic, geotechnical database is described. Computer software selected to program the client/server application in windows environment, components and structure of the geotechnical database, and primary factors cons...

  5. The BioGRID interaction database: 2017 update

    PubMed Central

    Chatr-aryamontri, Andrew; Oughtred, Rose; Boucher, Lorrie; Rust, Jennifer; Chang, Christie; Kolas, Nadine K.; O'Donnell, Lara; Oster, Sara; Theesfeld, Chandra; Sellam, Adnane; Stark, Chris; Breitkreutz, Bobby-Joe; Dolinski, Kara; Tyers, Mike

    2017-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical–protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as for bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases. PMID:27980099

  6. FDA toxicity databases and real-time data entry.

    PubMed

    Arvidson, Kirk B

    2008-11-15

    Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.

  7. FDA toxicity databases and real-time data entry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arvidson, Kirk B.

    Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.

  8. NREL: U.S. Life Cycle Inventory Database - About the LCI Database Project

    Science.gov Websites

    About the LCI Database Project. The U.S. Life Cycle Inventory (LCI) Database is a publicly available database supporting consistent and transparent LCI data collection and analysis methods for life cycle studies, along with efforts to develop and maintain the database. The 2009 U.S. Life Cycle Inventory (LCI) Data Stakeholder meeting was an

  9. The EpiSLI Database: A Publicly Available Database on Speech and Language

    ERIC Educational Resources Information Center

    Tomblin, J. Bruce

    2010-01-01

    Purpose: This article describes a database that was created in the process of conducting a large-scale epidemiologic study of specific language impairment (SLI). As such, this database will be referred to as the EpiSLI database. Children with SLI have unexpected and unexplained difficulties learning and using spoken language. Although there is no…

  10. Sodium content of foods contributing to sodium intake: A comparison between selected foods from the CDC Packaged Food Database and the USDA National Nutrient Database for Standard Reference

    USDA-ARS?s Scientific Manuscript database

    The sodium concentration (mg/100g) for 23 of 125 Sentinel Foods were identified in the 2009 CDC Packaged Food Database (PFD) and compared with data in the USDA’s 2013 Standard Reference 26 (SR 26) database. Sentinel Foods are foods and beverages identified by USDA to be monitored as primary indicat...

  11. [A Terahertz Spectral Database Based on Browser/Server Technique].

    PubMed

    Zhang, Zhuo-yong; Song, Yue

    2015-09-01

    With the solution of key scientific and technical problems and the development of instrumentation, the application of terahertz technology in various fields has received more and more attention. Owing to its unique characteristic advantages, terahertz technology shows a broad future in fast, non-damaging detection, as well as many other fields. Terahertz technology combined with other complementary methods can be used to cope with many difficult practical problems which could not be solved before. Further development of practical terahertz detection methods depends critically on a good and reliable terahertz spectral database. We recently developed a browser/server (B/S)-based terahertz spectral database. We designed the main structure and main functions to fulfill practical requirements. The terahertz spectral database now includes more than 240 items, with spectral information collected from three sources: (1) collection and citation from other terahertz spectral databases abroad; (2) published literature; and (3) spectral data measured in our laboratory. The present paper introduces the basic structure and fundamental functions of the terahertz spectral database developed in our laboratory. One of the key functions of this THz database is the calculation of optical parameters. Some optical parameters, including the absorption coefficient, refractive index, etc., can be calculated from the input THz time-domain spectra. The other main functions and searching methods of the browser/server-based terahertz spectral database are also discussed. The database search system provides users with convenient functions including user registration, inquiry, display of spectral figures and molecular structures, spectral matching, etc. The THz database system provides an on-line searching function for registered users. Registered users can compare the input THz spectrum with the spectra of the database, according to
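
    The optical-parameter calculation mentioned above is commonly done with the standard thick-sample THz time-domain spectroscopy approximation; a sketch with placeholder data, where sign conventions depend on the FFT definition used:

    ```python
    import numpy as np

    c = 2.998e8   # speed of light (m/s)
    d = 1.0e-3    # sample thickness (m), assumed known

    def optical_parameters(freq_hz, E_sample, E_reference):
        """Refractive index n and absorption coefficient alpha (1/m) from the
        complex FFT spectra of sample and reference pulses (skip the DC bin)."""
        T = E_sample / E_reference                    # complex transmission
        omega = 2.0 * np.pi * freq_hz
        phase = np.unwrap(np.angle(T))
        n = 1.0 + c * np.abs(phase) / (omega * d)     # phase delay -> index
        # Fresnel-corrected amplitude gives the absorption coefficient.
        alpha = -(2.0 / d) * np.log(np.abs(T) * (n + 1.0) ** 2 / (4.0 * n))
        return n, alpha
    ```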

  12. Drinking Water Database

    NASA Technical Reports Server (NTRS)

    Murray, ShaTerea R.

    2004-01-01

    This summer I had the opportunity to work in the Environmental Management Office (EMO) under the Chemical Sampling and Analysis Team, or CS&AT. This team's mission is to support Glenn Research Center (GRC) and EMO by providing chemical sampling and analysis services and expert consulting. Services include sampling and chemical analysis of water, soil, fuels, oils, paint, insulation materials, etc. One of this team's major projects is the Drinking Water Project. This project is carried out on Glenn's water coolers and ten percent of its sinks every two years. For the past two summers an intern had been putting together a database for this team to record the tests they had performed. She had successfully created a database but hadn't worked out all the quirks. So this summer William Wilder (an intern from Cleveland State University) and I worked together to perfect her database. We began by finding out exactly what every member of the team thought about the database and what, if anything, they would change. After collecting this data we both took some courses in Microsoft Access in order to fix the problems. Next we looked at exactly how the database worked from the outside in. Then we began trying to change the database, but we quickly found out that this would be virtually impossible.

  13. The need for a juvenile fire setting database.

    PubMed

    Klein, Julianne J; Mondozzi, Mary A; Andrews, David A

    2008-01-01

    A juvenile fire setter can be classified as any youth setting a fire, regardless of the reason. Many communities have programs to deal with this problem, most based on models developed by the United States Fire Administration. We reviewed our program's data to compare it with that published nationally. Currently there is no nationwide database for comparing fire setter data. A single-institution, retrospective chart review of all fire setters between January 1, 2003 and December 31, 2005 was completed. There were 133 participants, ages 3 to 17. Information obtained included age, location, ignition source, court order and recidivism. Analysis of our data set found the peak ages for fire involvement to be 12 and 14 (26%). Location, ignition source, and court-ordered participants were divided into two age groups: 3 to 10 (N = 58) and 11 to 17 (N = 75). Bedrooms ranked first for the younger population and schools for the older. Fifty-four percent of the 133 participants used lighters rather than matches. Twelve percent of the 3- to 10-year-olds were court mandated, compared with 52% of the 11- to 17-year-olds. Recidivism rates were 4 to 10%, with a 33 to 38% survey return rate. Currently there is no state or nationwide, time-honored database of facts from which conclusions can be drawn. Starting small with a statewide database could provide a stimulus for a national database. This could also enhance the information provided by the United States Fire Administration's National Fire Data Center, beginning with one juvenile fire setter program and State Fire Marshal's office at a time.

  14. Toward An Unstructured Mesh Database

    NASA Astrophysics Data System (ADS)

    Rezaei Mahdiraji, Alireza; Baumann, Peter

    2014-05-01

    -incidence relationships. We instrument the ImG model with sets of optional and application-specific constraints which can be used to check the validity of meshes for specific classes of objects such as manifolds, pseudo-manifolds, and simplicial manifolds. We conducted experiments to measure the performance of the graph database solution in processing mesh queries and compared it with the GrAL mesh library and the PostgreSQL database on synthetic and real mesh datasets. The experiments show that each system performs well on specific types of mesh queries; e.g., graph databases perform well on global path-intensive queries. In the future, we will investigate database operations for the ImG model and design a mesh query language.

  15. Aviation Safety Issues Database

    NASA Technical Reports Server (NTRS)

    Morello, Samuel A.; Ricks, Wendell R.

    2009-01-01

    The aviation safety issues database was instrumental in the refinement and substantiation of the National Aviation Safety Strategic Plan (NASSP). The issues database is a comprehensive set of issues from an extremely broad base of aviation functions, personnel, and vehicle categories, both nationally and internationally. Several aviation safety stakeholders, such as the Commercial Aviation Safety Team (CAST), have already used the database. This broader interest was the genesis of making the database publicly accessible and writing this report.

  16. BDVC (Bimodal Database of Violent Content): A database of violent audio and video

    NASA Astrophysics Data System (ADS)

    Rivera Martínez, Jose Luis; Mijes Cruz, Mario Humberto; Rodríguez Vázqu, Manuel Antonio; Rodríguez Espejo, Luis; Montoya Obeso, Abraham; García Vázquez, Mireya Saraí; Ramírez Acosta, Alejandro Álvaro

    2017-09-01

    Nowadays there is a trend towards the use of unimodal databases for multimedia content description, organization and retrieval applications of a single type of content like text, voice and images; bimodal databases, by contrast, allow two different types of content, like audio-video or image-text, to be associated semantically. The generation of a bimodal audio-video database implies the creation of a connection between the multimedia content through the semantic relation that associates the actions of both types of information. This paper describes in detail the characteristics and methodology used for the creation of the bimodal database of violent content; the semantic relationship is established by the proposed concepts that describe the audiovisual information. The use of bimodal databases in applications related to audiovisual content processing allows an increase in semantic performance if and only if these applications process both types of content. This bimodal database contains 580 annotated audiovisual segments, with a duration of 28 minutes, divided into 41 classes. Bimodal databases are a tool in the generation of applications for the semantic web.

  17. SSER: Species specific essential reactions database.

    PubMed

    Labena, Abraham A; Ye, Yuan-Nong; Dong, Chuan; Zhang, Fa-Z; Guo, Feng-Biao

    2017-04-19

    Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. In particular, if a single reaction is catalyzed by multiple enzymes, inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding genes. Existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions, but none of them focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms, identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. Quantitative data on the number of essential reactions, the number of essential reactions associated with their respective enzyme-encoding genes, and shared essential reactions across organisms are the main contents of the database. SSER would be a prime source of essential reaction data and related gene and metabolite information, and it can significantly facilitate metabolic network model reconstruction and analysis as well as drug-target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser.
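
    An essentiality screen of this kind can be approximated with COBRApy: knock out each reaction in a genome-scale model and call it essential when the optimal growth rate collapses. A sketch, where the model file name is a placeholder and the 5% growth threshold is an arbitrary choice:

    ```python
    from cobra.io import read_sbml_model
    from cobra.flux_analysis import single_reaction_deletion

    model = read_sbml_model("e_coli_core.xml")        # placeholder model file
    wild_type = model.optimize().objective_value      # FBA growth, no knockouts

    # Re-run FBA with each reaction knocked out in turn.
    deletions = single_reaction_deletion(model)
    essential = deletions[deletions["growth"] < 0.05 * wild_type]
    print(f"{len(essential)} of {len(model.reactions)} reactions appear essential")
    ```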

  18. OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes.

    PubMed

    Chetal, Kashish; Janga, Sarath Chandra

    2015-01-01

    Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons: codirectionally organized genes in prokaryotic genomes that share a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic, with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. The user interface is simple and easy to use for visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with the publicly available transcriptomic data of an organism. Conclusion. OperomeDB should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.
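
    A toy heuristic for the kind of condition-specific operon calling such predictions rest on: merge adjacent same-strand genes whenever RNA-seq coverage across the intergenic gap stays above a threshold. All inputs are invented, and real pipelines are considerably more sophisticated:

    ```python
    def call_operons(genes, coverage, min_gap_cov=10):
        """genes: list of (name, start, end, strand) sorted by start;
        coverage: per-base read depth; returns lists of co-transcribed genes."""
        operons, current = [], [genes[0]]
        for prev, nxt in zip(genes, genes[1:]):
            gap = coverage[prev[2]:nxt[1]]            # depth between the genes
            same_strand = prev[3] == nxt[3]
            if same_strand and gap and min(gap) >= min_gap_cov:
                current.append(nxt)                   # transcribed together
            else:
                operons.append(current)
                current = [nxt]
        operons.append(current)
        return [[g[0] for g in op] for op in operons]

    cov = [50] * 3000
    genes = [("geneA", 100, 900, "+"), ("geneB", 950, 1800, "+"),
             ("geneC", 2000, 2900, "-")]
    print(call_operons(genes, cov))   # [['geneA', 'geneB'], ['geneC']]
    ```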

  19. Phenobarbital and propiconazole toxicogenomic profiles in mice show major similarities consistent with the key role that constitutive androstane receptor (CAR) activation plays in their mode of action

    PubMed Central

    Currie, Richard A.; Peffer, Richard C.; Goetz, Amber K.; Omiecinski, Curtis J.; Goodman, Jay I.

    2014-01-01

    Toxicogenomics (TGx) is employed frequently to investigate underlying molecular mechanisms of the compound of interest and, thus, has become an aid to mode of action determination. However, the results and interpretation of a TGx dataset are influenced by the experimental design and methods of analysis employed. This article describes an evaluation and reanalysis, by two independent laboratories, of previously published TGx mouse liver microarray data for a triazole fungicide, propiconazole (PPZ), and the anticonvulsant drug phenobarbital (PB). Propiconazole produced an increased incidence of liver tumors in male CD-1 mice only at a dose that exceeded the maximum tolerated dose (2500 ppm). Firstly, we illustrate how experimental design differences between two in vivo studies with PPZ and PB may impact the comparisons of TGx results. Secondly, we demonstrate that different researchers using different pathway analysis tools can come to different conclusions on specific mechanistic pathways, even when using the same datasets. Finally, despite these differences, the results across the three analyses show a striking degree of similarity for PPZ- and PB-treated livers when the expression data are viewed in terms of the major signaling pathways and cell processes affected. Additional studies described here show that the postulated key event of hepatocellular proliferation was observed in CD-1 mice for both PPZ and PB, and that PPZ is also a potent activator of the mouse CAR nuclear receptor. Thus, with regard to the events which are hallmarks of CAR-induced effects that are key events in the mode of action (MOA) of mouse liver carcinogenesis with PB, PPZ-induced tumors can be viewed as being promoted by a similar PB-like CAR-dependent MOA. PMID:24675475

  20. The Danish Testicular Cancer database.

    PubMed

    Daugaard, Gedske; Kier, Maria Gry Gundgaard; Bandak, Mikkel; Mortensen, Mette Saksø; Larsson, Heidi; Søgaard, Mette; Toft, Birgitte Groenkaer; Engvad, Birte; Agerbæk, Mads; Holm, Niels Vilstrup; Lauritsen, Jakob

    2016-01-01

    The nationwide Danish Testicular Cancer database consists of a retrospective research database (DaTeCa database) and a prospective clinical database (Danish Multidisciplinary Cancer Group [DMCG] DaTeCa database). The aim is to improve the quality of care for patients with testicular cancer (TC) in Denmark, for instance by identifying risk factors for relapse and treatment-related toxicity, and by focusing on late effects. All Danish male patients with a histologically verified germ cell cancer diagnosis in the Danish Pathology Registry are included in the DaTeCa databases. Data collection has been performed from 1984 to 2007 and from 2013 onward, respectively. The retrospective DaTeCa database contains detailed information, with more than 300 variables related to histology, stage, treatment, relapses, pathology, tumor markers, kidney function, lung function, etc. A questionnaire on late effects has been administered, including questions regarding social relationships, life situation, general health status, family background, diseases, symptoms, use of medication, marital status, psychosocial issues, fertility, and sexuality. TC survivors alive in October 2014 were invited to fill in this questionnaire, which includes 160 validated questions. Collection of questionnaires is still ongoing. A biobank including blood/sputum samples for future genetic analyses has been established; samples related to both the DaTeCa and DMCG DaTeCa databases are included. The prospective DMCG DaTeCa database includes variables regarding histology, stage, prognostic group, and treatment. The DMCG DaTeCa database has existed since 2013 and is a young clinical database. It is necessary to extend the data collection in the prospective database in order to answer quality-related questions. Data from the retrospective database will be added to the prospective data. This will result in a large and very comprehensive database for future studies on TC patients.

  1. Cell death proteomics database: consolidating proteomics data on cell death.

    PubMed

    Arntzen, Magnus Ø; Bull, Vibeke H; Thiede, Bernd

    2013-05-03

    Programmed cell death is a ubiquitous process of utmost importance for the development and maintenance of multicellular organisms. More than 10 different forms of programmed cell death have been discovered. Several proteomics analyses have been performed to gain insight into the proteins involved in the different forms of programmed cell death. To consolidate these studies, we have developed the cell death proteomics (CDP) database, which comprises data from apoptosis, autophagy, cytotoxic granule-mediated cell death, excitotoxicity, mitotic catastrophe, paraptosis, pyroptosis, and Wallerian degeneration. The CDP database is available as a web-based database to compare protein identifications and quantitative information across different experimental setups. The proteomics data of 73 publications were integrated and unified with protein annotations from UniProt-KB and gene ontology (GO). Currently, more than 6,500 records of more than 3,700 proteins are included in the CDP. Comparing apoptosis and autophagy using overrepresentation analysis of GO terms, the majority of enriched processes were found in both, but some clear differences were also perceived. Furthermore, the analysis revealed differences and similarities between the autophagosomal and overall autophagy proteomes. The CDP database represents a useful tool to consolidate data from proteome analyses of programmed cell death and is available at http://celldeathproteomics.uio.no.
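
    The overrepresentation analysis mentioned above reduces, per GO term, to a hypergeometric test: is the term annotated to more of the study proteins than chance would predict? A sketch with made-up counts:

    ```python
    from scipy.stats import hypergeom

    N = 3700   # proteins in the database (population)
    K = 150    # population proteins annotated with the GO term of interest
    n = 400    # proteins in the study subset (e.g., autophagy)
    k = 35     # study proteins annotated with the term

    # P(X >= k) when drawing n proteins without replacement from the population.
    p = hypergeom.sf(k - 1, N, K, n)
    print(f"enrichment p-value = {p:.3g}")
    ```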

  2. Drinking Water Treatability Database (Database)

    EPA Science Inventory

    The drinking Water Treatability Database (TDB) will provide data taken from the literature on the control of contaminants in drinking water, and will be housed on an interactive, publicly-available USEPA web site. It can be used for identifying effective treatment processes, rec...

  3. Public variant databases: liability?

    PubMed

    Thorogood, Adrian; Cook-Deegan, Robert; Knoppers, Bartha Maria

    2017-07-01

    Public variant databases support the curation, clinical interpretation, and sharing of genomic data, thus reducing harmful errors or delays in diagnosis. As variant databases are increasingly relied on in the clinical context, there is concern that negligent variant interpretation will harm patients and attract liability. This article explores the evolving legal duties of laboratories, public variant databases, and physicians in clinical genomics and recommends a governance framework for databases to promote responsible data sharing.Genet Med advance online publication 15 December 2016.

  4. Production and distribution of scientific and technical databases - Comparison among Japan, US and Europe

    NASA Astrophysics Data System (ADS)

    Onodera, Natsuo; Mizukami, Masayuki

    This paper estimates several quantitative indices of the production and distribution of scientific and technical databases based on various recent publications and attempts to compare these indices internationally. Raw data used for the estimation are drawn mainly from the Database Directory (published by MITI) for database production and from some domestic and foreign study reports for database revenues. The ratios of the indices among Japan, the US and Europe for database usage are similar to those for general scientific and technical activities such as population and R&D expenditures. But Japanese contributions to the production, revenue and cross-country distribution of databases are still lower than those of the US and European countries. An international comparison of relative database activities between the public and private sectors is also discussed.

  5. THE ECOTOX DATABASE

    EPA Science Inventory

    The database provides chemical-specific toxicity information for aquatic life, terrestrial plants, and terrestrial wildlife. ECOTOX is a comprehensive ecotoxicology database and is therefore essential for providing and supporting high quality models needed to estimate population...

  6. Integrative neuroscience: the role of a standardized database.

    PubMed

    Gordon, E; Cooper, N; Rennie, C; Hermens, D; Williams, L M

    2005-04-01

    Most brain-related databases bring together specialized information, with a growing number that include neuroimaging measures. This article outlines the potential uses of, and insights from, the first entirely standardized and centralized database, which integrates information from neuroimaging measures (EEG, event-related potentials (ERP), structural/functional MRI), arousal (skin conductance responses (SCRs), heart rate, respiration), neuropsychological and personality tests, genomics and demographics: the Brain Resource International Database. It comprises data from over 2000 "normative" subjects and a growing number of patients with neurological and psychiatric illnesses, acquired from over 50 laboratories (in the U.S.A., United Kingdom, Holland, South Africa, Israel and Australia), all with identical equipment and experimental procedures. Three primary goals of this database are to quantify individual differences in normative brain function, to compare an individual's performance to their database peers, and to provide a robust normative framework for clinical assessment and treatment prediction. We present three example demonstrations in relation to these goals. First, we show how consistent age differences may be quantified when large subject numbers are available, using EEG and ERP data from nearly 2000 stringently screened normative subjects. Second, the use of a normalization technique provides a means to compare clinical subjects (50 ADHD subjects in this study) to the normative database with the effects of age and gender taken into account. Third, we show how a profile of EEG/ERP and autonomic measures potentially provides a means to predict treatment response in ADHD subjects. The example data consist of EEG under eyes-open and eyes-closed conditions and ERP data for auditory oddball, working memory and Go-NoGo paradigms. Autonomic measures of skin conductance (tonic skin conductance level, SCL, and phasic skin conductance responses, SCRs) were acquired simultaneously.
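
    Comparing an individual to database peers with age and gender taken into account can be sketched as z-scoring against matched normative subgroups; all values below are invented, and the actual database uses far richer normalization:

    ```python
    normative = {  # (age band, sex) -> (mean, sd) for one hypothetical ERP measure
        ("18-29", "F"): (12.0, 3.0),
        ("18-29", "M"): (11.5, 3.2),
        ("30-49", "F"): (10.8, 2.9),
    }

    def z_score(subject_value, age_band, sex):
        """Standardized deviation from the matched normative subgroup."""
        mean, sd = normative[(age_band, sex)]
        return (subject_value - mean) / sd

    print(round(z_score(5.6, "18-29", "M"), 2))   # -1.84: well below matched peers
    ```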

  7. Public variant databases: liability?

    PubMed Central

    Thorogood, Adrian; Cook-Deegan, Robert; Knoppers, Bartha Maria

    2017-01-01

    Public variant databases support the curation, clinical interpretation, and sharing of genomic data, thus reducing harmful errors or delays in diagnosis. As variant databases are increasingly relied on in the clinical context, there is concern that negligent variant interpretation will harm patients and attract liability. This article explores the evolving legal duties of laboratories, public variant databases, and physicians in clinical genomics and recommends a governance framework for databases to promote responsible data sharing. Genet Med advance online publication 15 December 2016 PMID:27977006

  8. Database for propagation models

    NASA Astrophysics Data System (ADS)

    Kantak, Anil V.

    1991-07-01

    A propagation researcher or a systems engineer who intends to use the results of a propagation experiment is generally faced with various database tasks, such as the selection of the computer software and hardware and the writing of the programs to pass the data through the models of interest. This task is repeated every time a new experiment is conducted or the same experiment is carried out at a different location, generating different data. Thus the users of these data have to spend a considerable portion of their time learning how to implement the computer hardware and software towards the desired end. This situation could be eased considerably if an easily accessible propagation database were created containing all the accepted (standardized) propagation phenomena models approved by the propagation research community. The handling of data would also become easier for the user. Such a database can only stimulate the growth of propagation research if it is available to all researchers, so that the results of an experiment conducted by one researcher can be examined independently by another, without different hardware and software being used. The database may be made flexible so that researchers need not be confined only to its contents. Another way in which the database may help researchers is that they will not have to document the software and hardware tools used in their research, since the propagation research community will already know the database. The following sections show a possible database construction, as well as properties of the database for propagation research.

  9. Performance of Stratified and Subgrouped Disproportionality Analyses in Spontaneous Databases.

    PubMed

    Seabroke, Suzie; Candore, Gianmario; Juhlin, Kristina; Quarcoo, Naashika; Wisniewski, Antoni; Arani, Ramin; Painter, Jeffery; Tregunno, Philip; Norén, G Niklas; Slattery, Jim

    2016-04-01

    Disproportionality analyses are used in many organisations to identify adverse drug reactions (ADRs) from spontaneous report data. Reporting patterns vary over time, with patient demographics, and between different geographical regions, and therefore subgroup analyses or adjustment by stratification may be beneficial. The objective of this study was to evaluate the performance of subgroup and stratified disproportionality analyses for a number of key covariates within spontaneous report databases of differing sizes and characteristics. Using a reference set of established ADRs, signal detection performance (sensitivity and precision) was compared for stratified, subgroup and crude (unadjusted) analyses within five spontaneous report databases (two company, one national and two international databases). Analyses were repeated for a range of covariates: age, sex, country/region of origin, calendar time period, event seriousness, vaccine/non-vaccine, reporter qualification and report source. Subgroup analyses consistently performed better than stratified analyses in all databases. Subgroup analyses also showed benefits in both sensitivity and precision over crude analyses for the larger international databases, whilst for the smaller databases a gain in precision tended to result in some loss of sensitivity. Additionally, stratified analyses did not increase sensitivity or precision beyond what could be attributed to analytical artefacts. The most promising subgroup covariates were age and region/country of origin, although this varied between databases. Subgroup analyses perform better than stratified analyses and should be considered over the latter in routine first-pass signal detection. Subgroup analyses are also clearly beneficial over crude analyses for larger databases, but further validation is required for smaller databases.
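
    For orientation, disproportionality in a spontaneous-report database is commonly quantified with statistics such as the proportional reporting ratio (PRR) computed from a 2x2 drug-by-event contingency table; a subgroup analysis simply computes the statistic within each stratum rather than on the pooled table. A minimal sketch (the PRR formula is standard; the counts are invented):

      # Minimal sketch of a subgroup disproportionality analysis using the
      # proportional reporting ratio, PRR = [a/(a+b)] / [c/(c+d)], where
      # a,b,c,d are the cells of the drug x event contingency table.
      def prr(a, b, c, d):
          return (a / (a + b)) / (c / (c + d))

      # Invented counts per age subgroup: (drug & event, drug & other
      # events, other drugs & event, other drugs & other events).
      subgroups = {
          "age<65": (12, 988, 150, 49850),
          "age>=65": (30, 470, 90, 19410),
      }
      for name, (a, b, c, d) in subgroups.items():
          print(name, round(prr(a, b, c, d), 2))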

  10. Choosing a Database for Social Work: A Comparison of Social Work Abstracts and Social Service Abstracts

    ERIC Educational Resources Information Center

    Flatley, Robert K.; Lilla, Rick; Widner, Jack

    2007-01-01

    This study compared Social Work Abstracts and Social Services Abstracts databases in terms of indexing, journal coverage, and searches. The authors interviewed editors, analyzed journal coverage, and compared searches. It was determined that the databases complement one another more than compete. The authors conclude with some considerations.

  11. Kingfisher: a system for remote sensing image database management

    NASA Astrophysics Data System (ADS)

    Bruzzo, Michele; Giordano, Ferdinando; Dellepiane, Silvana G.

    2003-04-01

    At present, retrieval methods in remote sensing image databases are mainly based on spatio-temporal information. The increasing number of images to be collected by the ground stations of earth observing systems emphasizes the need for database management with intelligent data retrieval capabilities. The purpose of the proposed method is to realize a new content-based retrieval system for remote sensing image databases with an innovative search tool based on image similarity. This methodology is quite innovative for this application: many systems exist for photographic images, for example QBIC and IKONA, but they are not able to extract and describe remote sensing image content properly. The target database is set by an archive of images originated from an X-SAR sensor (spaceborne mission, 1994). The best content descriptors, mainly texture parameters, guarantee high retrieval performance and can be extracted without loss independently of image resolution. The latter property allows the DBMS (Database Management System) to process a small amount of information, as in the case of quick-look images, improving time performance and memory access without reducing retrieval accuracy. The matching technique has been designed to enable image management (database population and retrieval) independently of image dimensions (width and height). Local and global content descriptors are compared with the query image during the retrieval phase, and the results seem very encouraging.
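
    As an illustration of texture-based content description, the sketch below computes grey-level co-occurrence matrix (GLCM) statistics with scikit-image and ranks database images by Euclidean distance to a query descriptor. It mirrors the general idea only; Kingfisher's actual descriptors and matching technique are not specified in this summary.

      # Sketch: texture descriptors (GLCM statistics) for content-based
      # retrieval. Requires numpy and scikit-image >= 0.19.
      import numpy as np
      from skimage.feature import graycomatrix, graycoprops

      def texture_descriptor(image_u8):
          """Contrast/homogeneity/energy/correlation from a GLCM."""
          glcm = graycomatrix(image_u8, distances=[1], angles=[0],
                              levels=256, symmetric=True, normed=True)
          props = ("contrast", "homogeneity", "energy", "correlation")
          return np.array([graycoprops(glcm, p)[0, 0] for p in props])

      def rank_by_similarity(query, database):
          """Return database indices ordered by descriptor distance."""
          q = texture_descriptor(query)
          dists = [np.linalg.norm(q - texture_descriptor(img))
                   for img in database]
          return np.argsort(dists)

      rng = np.random.default_rng(1)
      imgs = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(5)]
      print(rank_by_similarity(imgs[0], imgs))  # index 0 should rank first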

  12. An Evaluation of Database Solutions to Spatial Object Association

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumar, V S; Kurc, T; Saltz, J

    2008-06-24

    Object association is a common problem encountered in many applications. Spatial object association, also referred to as crossmatch of spatial datasets, is the problem of identifying and comparing objects in two datasets based on their positions in a common spatial coordinate system--one of the datasets may correspond to a catalog of objects observed over time in a multi-dimensional domain; the other dataset may consist of objects observed in a snapshot of the domain at a time point. The use of database management systems to solve the object association problem provides portability across different platforms and also greater flexibility. Increasing dataset sizes in today's applications, however, have made object association a data/compute-intensive problem that requires targeted optimizations for efficient execution. In this work, we investigate how database-based crossmatch algorithms can be deployed on different database system architectures and evaluate the deployments to understand the impact of architectural choices on crossmatch performance and associated trade-offs. We investigate the execution of two crossmatch algorithms on (1) a parallel database system with active disk style processing capabilities, (2) a high-throughput network database (MySQL Cluster), and (3) shared-nothing databases with replication. We have conducted our study in the context of a large-scale astronomy application with real use-case scenarios.

  13. FishTraits Database

    USGS Publications Warehouse

    Angermeier, Paul L.; Frimpong, Emmanuel A.

    2009-01-01

    The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.

  14. CB Database: A change blindness database for objects in natural indoor scenes.

    PubMed

    Sareen, Preeti; Ehinger, Krista A; Wolfe, Jeremy M

    2016-12-01

    Change blindness has been a topic of interest in cognitive sciences for decades. Change detection experiments are frequently used for studying various research topics such as attention and perception. However, creating change detection stimuli is tedious, and there is no open repository of such stimuli using natural scenes. We introduce the Change Blindness (CB) Database with object changes in 130 colored images of natural indoor scenes. The size and eccentricity of each change are provided, as well as reaction time data from a baseline experiment. In addition, we have two specialized satellite databases that are subsets of the 130 images. In one set, changes are seen in rooms or in mirrors in those rooms (Mirror Change Database). In the other, changes occur in a room or out a window (Window Change Database). Both sets have controlled background, change size, and eccentricity. The CB Database is intended to provide researchers with a stimulus set of natural scenes with defined stimulus parameters that can be used for a wide range of experiments. The CB Database can be found at http://search.bwh.harvard.edu/new/CBDatabase.html.

  15. Difficulties in diagnosing Marfan syndrome using current FBN1 databases.

    PubMed

    Groth, Kristian A; Gaustadnes, Mette; Thorsen, Kasper; Østergaard, John R; Jensen, Uffe Birk; Gravholt, Claus H; Andersen, Niels H

    2016-01-01

    The diagnostic criteria of Marfan syndrome (MFS) highlight the importance of a FBN1 mutation test in diagnosing MFS. As genetic sequencing becomes better, cheaper, and more accessible, the number of genetic tests will increase, yielding numerous genetic variants that need to be evaluated for disease-causing effects based on database information. The aim of this study was to evaluate genetic variants in four databases and review the relevant literature. We assessed background data on 23 common variants registered in ESP6500 and classified as causing MFS in the Human Gene Mutation Database (HGMD). We evaluated data in four variant databases (HGMD, UMD-FBN1, ClinVar, and UniProt) according to the diagnostic criteria for MFS and compared the results with the classification of each variant in the four databases. None of the 23 variants was clearly associated with MFS, even though all classifications in the databases stated otherwise. A genetic diagnosis of MFS cannot reliably be based on current variant databases because they contain incorrectly interpreted conclusions on variants. Variants must be evaluated by time-consuming review of the background material in the databases and by combining these data with expert knowledge on MFS. This is a major problem because we expect even more genetic test results in the near future as a result of the reduced cost and process time for next-generation sequencing. Genet Med 18(1), 98-102.

  16. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection.

    PubMed

    Fernández-Suárez, Xosé M; Rigden, Daniel J; Galperin, Michael Y

    2014-01-01

    The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of the transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI's MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other one describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).

  17. Virus Database and Online Inquiry System Based on Natural Vectors.

    PubMed

    Dong, Rui; Zheng, Hui; Tian, Kun; Yau, Shek-Chung; Mao, Weiguang; Yu, Wenping; Yin, Changchuan; Yu, Chenglong; He, Rong Lucy; Yang, Jie; Yau, Stephen St

    2017-01-01

    We construct a virus database called VirusDB (http://yaulab.math.tsinghua.edu.cn/VirusDB/) and an online inquiry system to serve people who are interested in viral classification and prediction. The database stores all viral genomes, their corresponding natural vectors, and the classification information of the single/multiple-segmented viral reference sequences downloaded from the National Center for Biotechnology Information. The online inquiry system serves the purpose of computing natural vectors and their distances based on submitted genomes, providing an online interface for accessing and using the database for viral classification and prediction, and back-end processes for automatic and manual updating of database content to synchronize with GenBank. Analysis of genome data submitted in FASTA format is carried out, and the prediction results, with the five closest neighbors and their classifications, are returned by email. Considering the one-to-one correspondence between sequence and natural vector, its time efficiency, and its high accuracy, the natural vector method is a significant advance compared with alignment methods, which makes VirusDB a useful database for further research.
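
    For readers unfamiliar with natural vectors: one common formulation represents a DNA sequence by, for each nucleotide, its count, the mean of its positions, and a normalized second central moment, yielding a 12-dimensional vector that can be compared with ordinary Euclidean distance. A sketch under that assumption (VirusDB's exact definition may differ in detail):

      # Sketch of a 12-dimensional natural vector for a DNA sequence:
      # for each nucleotide k, count n_k, mean position mu_k, and the
      # normalized second central moment
      # D2_k = sum((p - mu_k)**2) / (n_k * n).
      def natural_vector(seq):
          seq = seq.upper()
          n = len(seq)
          vec = []
          for k in "ACGT":
              positions = [i + 1 for i, ch in enumerate(seq) if ch == k]
              nk = len(positions)
              if nk == 0:
                  vec += [0.0, 0.0, 0.0]
                  continue
              mu = sum(positions) / nk
              d2 = sum((p - mu) ** 2 for p in positions) / (nk * n)
              vec += [float(nk), mu, d2]
          return vec

      print(natural_vector("ATGCGTACGT"))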

  18. Structural Ceramics Database

    National Institute of Standards and Technology Data Gateway

    SRD 30 NIST Structural Ceramics Database (Web, free access)   The NIST Structural Ceramics Database (WebSCD) provides evaluated materials property data for a wide range of advanced ceramics known variously as structural ceramics, engineering ceramics, and fine ceramics.

  19. Mission and Assets Database

    NASA Technical Reports Server (NTRS)

    Baldwin, John; Zendejas, Silvino; Gutheinz, Sandy; Borden, Chester; Wang, Yeou-Fang

    2009-01-01

    Mission and Assets Database (MADB) Version 1.0 is an SQL database system with a Web user interface to centralize information. The database stores flight project support resource requirements, view periods, antenna information, schedule, and forecast results for use in mid-range and long-term planning of Deep Space Network (DSN) assets.

  20. A comprehensive and scalable database search system for metaproteomics.

    PubMed

    Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

    2016-08-16

    Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomic search engine, and the proteomic data files for

  1. The 2018 Nucleic Acids Research database issue and the online molecular biology database collection.

    PubMed

    Rigden, Daniel J; Fernández, Xosé M

    2018-01-04

    The 2018 Nucleic Acids Research Database Issue contains 181 papers spanning molecular biology. Among them, 82 are new and 84 are updates describing resources that appeared in the Issue previously. The remaining 15 cover databases most recently published elsewhere. Databases in the area of nucleic acids include 3DIV for visualisation of data on genome 3D structure and RNArchitecture, a hierarchical classification of RNA families. Protein databases include the established SMART, ELM and MEROPS while GPCRdb and the newcomer STCRDab cover families of biomedical interest. In the area of metabolism, HMDB and Reactome both report new features while PULDB appears in NAR for the first time. This issue also contains reports on genomics resources including Ensembl, the UCSC Genome Browser and ENCODE. Update papers from the IUPHAR/BPS Guide to Pharmacology and DrugBank are highlights of the drug and drug target section while a number of proteomics databases including proteomicsDB are also covered. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 138 entries, adding 88 new resources and eliminating 47 discontinued URLs, bringing the current total to 1737 databases. It is available at http://www.oxfordjournals.org/nar/database/c/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Evaluation of NoSQL databases for DIRAC monitoring and beyond

    NASA Astrophysics Data System (ADS)

    Mathe, Z.; Casajus Ramo, A.; Stagni, F.; Tomassetti, L.

    2015-12-01

    Nowadays, many database systems are available, but they may not be optimized for storing time series data. Monitoring DIRAC jobs would be better done using a database optimized for storing time series data. So far this was done using a MySQL database, which is not well suited for such an application; therefore alternatives have been investigated. Choosing an appropriate database for storing huge amounts of time series data is not trivial, as one must take into account different aspects such as manageability, scalability and extensibility. We compared the performance of the Elasticsearch, OpenTSDB (based on HBase) and InfluxDB NoSQL databases, using the same set of machines and the same data. We also evaluated the effort required for maintaining them. Using the LHCb Workload Management System (WMS), based on DIRAC, as a use case, we set up a new monitoring system in parallel with the current MySQL system and stored the same data in the databases under test. We evaluated the Grafana (for OpenTSDB) and Kibana (for Elasticsearch) metrics and graph editors for creating dashboards, in order to have a clear picture of the usability of each candidate. In this paper we present the results of this study and the performance of the selected technology. We also give an outlook of other potential applications of NoSQL databases within the DIRAC project.
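
    To make the use case concrete, the sketch below indexes job-status records into Elasticsearch and runs a time-range aggregation of the kind a monitoring dashboard issues. It is a generic illustration with an assumed index name and fields, not the DIRAC schema; the client calls follow the elasticsearch-py 8.x style.

      # Sketch: storing and querying time series monitoring data in
      # Elasticsearch (elasticsearch-py 8.x style calls). The index name
      # and fields are illustrative, not DIRAC's actual schema.
      from datetime import datetime, timezone
      from elasticsearch import Elasticsearch

      es = Elasticsearch("http://localhost:9200")

      # One job-status record per document.
      es.index(index="job-monitoring", document={
          "timestamp": datetime.now(timezone.utc).isoformat(),
          "site": "LCG.CERN.ch",
          "status": "Running",
          "jobs": 42,
      })
      es.indices.refresh(index="job-monitoring")

      # Sum job counts per status over the last hour.
      resp = es.search(index="job-monitoring", size=0,
                       query={"range": {"timestamp": {"gte": "now-1h"}}},
                       aggs={"per_status": {
                           "terms": {"field": "status.keyword"},
                           "aggs": {"total": {"sum": {"field": "jobs"}}}}})
      for bucket in resp["aggregations"]["per_status"]["buckets"]:
          print(bucket["key"], bucket["total"]["value"])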

  3. Surgical research using national databases

    PubMed Central

    Leland, Hyuma; Heckmann, Nathanael

    2016-01-01

    Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research. PMID:27867945

  4. Surgical research using national databases.

    PubMed

    Alluri, Ram K; Leland, Hyuma; Heckmann, Nathanael

    2016-10-01

    Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research.

  5. Historical seismometry database project: A comprehensive relational database for historical seismic records

    NASA Astrophysics Data System (ADS)

    Bono, Andrea

    2007-01-01

    The recovery and preservation of the patrimony of instrumental recordings of historical earthquakes is without doubt a subject of great interest. This interest, besides being purely historical, must necessarily also be scientific. In fact, the availability of a great amount of parametric information on the seismic activity in a given area is a doubtless help to the seismological researcher's activities. In this article the new database project of the Sismos group of the National Institute of Geophysics and Volcanology of Rome is presented. The structure of the new scheme summarizes the experience matured over five years of activity. We consider it useful for those who are approaching "recovery and reprocessing" computer-based facilities. In past years several attempts on Italian seismicity have followed one another, but they have almost never been real databases. Some of them had positive success because they were well considered and organized. Others were limited to supplying lists of events with their respective hypocentral parameters. What makes this project more interesting compared to the previous work is the completeness and the generality of the managed information. For example, it will be possible to view the hypocentral information regarding a given historical earthquake; it will be possible to retrieve the seismograms in raster, digital or digitized format, the information on arrival times of the phases at the various stations, the instrumental parameters, and so on. The modern relational logic on which the archive is based allows all these operations to be carried out with little effort. The database described below will completely replace Sismos' current data bank. Some of the organizational principles of this work are similar to those that inspire the databases for real-time monitoring of seismicity in use in the principal offices of international research. A modern planning logic in a distinctly historical

  6. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    PubMed

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed growth in sequencing yield, the number of samples sequenced, and, as a result, the growth of publicly maintained sequence databases. This ever-increasing amount of data has placed high requirements on protein similarity search algorithms, with two opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, aligning a query to the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics before the exact local alignment to reduce the number of candidate sequences in the database. However, there is still a need for aligning a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase, in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and, at the time of writing, was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW, using multiple queries on the Swiss-Prot and UniRef90 databases.
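
    For reference, the Smith-Waterman recurrence at the heart of such tools fills a dynamic-programming matrix in which each cell holds the best local alignment score ending there, never below zero. A plain, unoptimized sketch with simple match/mismatch/gap scores (the scoring scheme is illustrative):

      # Plain Smith-Waterman local alignment score (quadratic time), the
      # exact step that SW#db accelerates on GPU/CPU.
      def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
          rows, cols = len(a) + 1, len(b) + 1
          H = [[0] * cols for _ in range(rows)]
          best = 0
          for i in range(1, rows):
              for j in range(1, cols):
                  diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                  H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                  best = max(best, H[i][j])
          return best

      print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))  # best local score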

  7. Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database.

    PubMed

    Chen-Ying Hung; Wei-Chen Chen; Po-Tsun Lai; Ching-Heng Lin; Chi-Chun Lee

    2017-07-01

    Electronic medical claims (EMCs) can be used to accurately predict the occurrence of a variety of diseases, which can contribute to precise medical interventions. While there is a growing interest in the application of machine learning (ML) techniques to address clinical problems, the use of deep learning in healthcare has only recently gained attention. Deep learning, such as the deep neural network (DNN), has achieved impressive results in the areas of speech recognition, computer vision, and natural language processing in recent years. However, deep learning is often difficult to comprehend due to the complexities of its framework. Furthermore, this method has not yet been demonstrated to achieve better performance than other conventional ML algorithms in disease prediction tasks using EMCs. In this study, we utilize a large population-based EMC database of around 800,000 patients to compare DNN with three other ML approaches for predicting 5-year stroke occurrence. The results show that DNN and the gradient boosting decision tree (GBDT) achieve similarly high prediction accuracies that are better than those of the logistic regression (LR) and support vector machine (SVM) approaches. Meanwhile, DNN achieves optimal results using less patient data than the GBDT method.
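
    A toy version of such a comparison, using scikit-learn models on synthetic data in place of the study's actual models and claims database (the DNN is omitted here to keep the sketch dependency-light), would look like the following.

      # Sketch: comparing classifiers for a rare binary outcome (e.g.
      # 5-year stroke occurrence) by AUC. Synthetic data, not EMC data.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      X, y = make_classification(n_samples=5000, n_features=30,
                                 weights=[0.95], random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      models = {
          "LR": LogisticRegression(max_iter=1000),
          "GBDT": GradientBoostingClassifier(random_state=0),
      }
      for name, model in models.items():
          model.fit(X_tr, y_tr)
          auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
          print(f"{name}: AUC = {auc:.3f}")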

  8. CSE database: extended annotations and new recommendations for ECG software testing.

    PubMed

    Smíšek, Radovan; Maršánová, Lucie; Němcová, Andrea; Vítek, Martin; Kozumplík, Jiří; Nováková, Marie

    2017-08-01

    Nowadays, cardiovascular diseases represent the most common cause of death in western countries. Among various examination techniques, electrocardiography (ECG) is still a highly valuable tool used for the diagnosis of many cardiovascular disorders. In order to diagnose a person based on ECG, cardiologists can use automatic diagnostic algorithms. Research in this area is still necessary. In order to compare various algorithms correctly, it is necessary to test them on standard annotated databases, such as the Common Standards for Quantitative Electrocardiography (CSE) database. According to Scopus, the CSE database is the second most cited standard database. There were two main objectives in this work. First, new diagnoses were added to the CSE database, which extended its original annotations. Second, new recommendations for diagnostic software quality estimation were established. The ECG recordings were diagnosed by five new cardiologists independently, and in total, 59 different diagnoses were found. Such a large number of diagnoses is unique, even in terms of standard databases. Based on the cardiologists' diagnoses, a four-round consensus (4R consensus) was established. Such a 4R consensus means a correct final diagnosis, which should ideally be the output of any tested classification software. The accuracy of the cardiologists' diagnoses compared with the 4R consensus was the basis for the establishment of accuracy recommendations. The accuracy was determined in terms of sensitivity = 79.20-86.81%, positive predictive value = 79.10-87.11%, and Jaccard coefficient = 72.21-81.14%. Within these ranges, the accuracy of the software is comparable with the accuracy of cardiologists. Such quantification of correct-classification accuracy is unique. Diagnostic software developers can objectively evaluate the success of their algorithms and promote further development. The annotations and recommendations proposed in this work will allow
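
    The reported agreement measures can be computed per recording from the set of diagnoses assigned by a cardiologist (or by software) and the 4R consensus set; a minimal sketch with invented diagnosis labels:

      # Sketch: sensitivity, positive predictive value and Jaccard
      # coefficient between a proposed diagnosis set and the consensus.
      def agreement(proposed, consensus):
          tp = len(proposed & consensus)
          sensitivity = tp / len(consensus)
          ppv = tp / len(proposed)
          jaccard = tp / len(proposed | consensus)
          return sensitivity, ppv, jaccard

      proposed = {"MI anterior", "LBBB"}          # invented labels
      consensus = {"MI anterior", "LBBB", "AFIB"}
      print(agreement(proposed, consensus))       # ~ (0.667, 1.0, 0.667)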

  9. [The database server for the medical bibliography database at Charles University].

    PubMed

    Vejvalka, J; Rojíková, V; Ulrych, O; Vorísek, M

    1998-01-01

    In the medical community, bibliographic databases are widely accepted as a most important source of information for both theoretical and clinical disciplines. To improve access to medical bibliographic databases at Charles University, a database server (ERL by SilverPlatter) was set up at the 2nd Faculty of Medicine in Prague. The server, accessible via the Internet 24 hours a day, 7 days a week, now hosts 14 years of MEDLINE and 10 years of EMBASE Paediatrics. Two different strategies are available for connecting to the server: a specialized client program that communicates over the Internet (suitable for professional searching) and web-based access that requires no specialized software (except a WWW browser) on the client side. The server is now offered to the academic community to host further databases, possibly subscribed to by consortia whose individual members would not subscribe to them by themselves.

  10. Modeling and Databases for Teaching Petrology

    NASA Astrophysics Data System (ADS)

    Asher, P.; Dutrow, B.

    2003-12-01

    With the widespread availability of high-speed computers with massive storage and the ready ability to transport large amounts of data, computational and petrologic modeling and the use of databases provide new tools with which to teach petrology. Modeling can be used to gain insights into a system, predict system behavior, describe a system's processes, compare with a natural system, or simply to be illustrative. These aspects result from data-driven or empirical, analytical, or numerical models, or the concurrent examination of multiple lines of evidence. At the same time, the use of models can enhance core foundations of the geosciences by improving critical thinking skills and by reinforcing prior knowledge. However, the use of modeling to teach petrology is dictated by the level of expectation we have for students and their facility with modeling approaches. For example, do we expect students to push buttons and navigate a program, understand the conceptual model, and/or evaluate the results of a model? Whatever the desired level of sophistication, specific elements of design should be incorporated into a modeling exercise for effective teaching. These include, but are not limited to: use of the scientific method, use of prior knowledge, a clear statement of purpose and goals, attainable goals, a connection to the natural/actual system, a demonstration that complex heterogeneous natural systems are amenable to analysis by these techniques and, ideally, connections to other disciplines and the larger earth system. Databases offer another avenue with which to explore petrology. Large datasets are available that allow integration of multiple lines of evidence to attack a petrologic problem or understand a petrologic process. These are collected into a database that offers a tool for exploring, organizing and analyzing the data. For example, datasets may be geochemical, mineralogic, experimental and/or visual in nature, covering global, regional to local scales

  11. The Cardiac Safety Research Consortium ECG database.

    PubMed

    Kligfield, Paul; Green, Cynthia L

    2012-01-01

    The Cardiac Safety Research Consortium (CSRC) ECG database was initiated to foster research using anonymized, XML-formatted, digitized ECGs with corresponding descriptive variables from placebo- and positive-control arms of thorough QT studies submitted to the US Food and Drug Administration (FDA) by pharmaceutical sponsors. The database can be expanded to other data that are submitted directly to CSRC from other sources, and currently includes digitized ECGs from patients with genotyped varieties of congenital long-QT syndrome; this congenital long-QT database is also linked to ambulatory electrocardiograms stored in the Telemetric and Holter ECG Warehouse (THEW). Thorough QT data sets are available from CSRC for unblinded development of algorithms for analysis of repolarization and for blinded comparative testing of algorithms developed for the identification of moxifloxacin, as used as a positive control in thorough QT studies. Policies and procedures for access to these data sets are available from CSRC, which has developed tools for statistical analysis of blinded new algorithm performance. A recently approved CSRC project will create a data set for blinded analysis of automated ECG interval measurements, whose initial focus will include comparison of four of the major manufacturers of automated electrocardiographs in the United States. CSRC welcomes applications for use of the ECG database in clinical investigation. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. A Toolkit for Active Object-Oriented Databases with Application to Interoperability

    NASA Technical Reports Server (NTRS)

    King, Roger

    1996-01-01

    In our original proposal we stated that our research would 'develop a novel technology that provides a foundation for collaborative information processing.' The essential ingredient of this technology is the notion of 'deltas,' which are first-class values representing collections of proposed updates to a database. The Heraclitus framework provides a variety of algebraic operators for building up, combining, inspecting, and comparing deltas. Deltas can be directly applied to the database to yield a new state, or used 'hypothetically' in queries against the state that would arise if the delta were applied. The central point here is that the step of elevating deltas to 'first-class' citizens in database programming languages will yield tremendous leverage on the problem of supporting updates in collaborative information processing. In short, our original intention was to develop the theoretical and practical foundation for a technology based on deltas in an object-oriented database context, develop a toolkit for active object-oriented databases, and apply this toward collaborative information processing.
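
    The flavour of first-class deltas can be conveyed with a small sketch: a delta is a value that records proposed updates, can be combined with another delta, applied to a state, or queried against hypothetically without committing. This is an illustration of the concept only, not the Heraclitus framework's actual operators or API.

      # Sketch: deltas as first-class values over a key-value "database".
      # Illustrative only; not the Heraclitus operators or syntax.
      class Delta:
          def __init__(self, updates=None):
              self.updates = dict(updates or {})   # key -> proposed value

          def merge(self, other):
              """Combine two deltas; the right-hand side wins conflicts."""
              return Delta({**self.updates, **other.updates})

          def apply(self, state):
              """Yield the new state; the original is left untouched."""
              return {**state, **self.updates}

      def hypothetical_query(state, delta, key):
          """Query the state that *would* arise if delta were applied."""
          return delta.apply(state).get(key)

      db = {"balance": 100}
      d1, d2 = Delta({"balance": 120}), Delta({"owner": "alice"})
      print(hypothetical_query(db, d1.merge(d2), "balance"))  # 120
      print(db)  # unchanged: {'balance': 100}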

  14. Heterogeneous database integration in biomedicine.

    PubMed

    Sujansky, W

    2001-08-01

    The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.

  15. The database of chromosome imbalance regions and genes resided in lung cancer from Asian and Caucasian identified by array-comparative genomic hybridization

    PubMed Central

    2012-01-01

    Background Cancer-related genes show racial differences. Therefore, identification and characterization of DNA copy number alteration regions in different racial groups helps to dissect the mechanism of tumorigenesis. Methods Array-comparative genomic hybridization (array-CGH) was analyzed for DNA copy number profile in 40 Asian and 20 Caucasian lung cancer patients. Three methods, including MetaCore analysis for disease and pathway correlations, concordance analysis between the array-CGH database and the expression array database, and literature search for copy number variation genes, were performed to select novel lung cancer candidate genes. Four candidate oncogenes were validated for DNA copy number and mRNA and protein expression by quantitative polymerase chain reaction (qPCR), chromogenic in situ hybridization (CISH), reverse transcriptase-qPCR (RT-qPCR), and immunohistochemistry (IHC) in more patients. Results We identified 20 chromosomal imbalance regions harboring 459 genes for Caucasian and 17 regions containing 476 genes for Asian lung cancer patients. Seven common chromosomal imbalance regions harboring 117 genes, including gain on 3p13-14, 6p22.1, 9q21.13, 13q14.1, and 17p13.3, and loss on 3p22.2-22.3 and 13q13.3, were found in both Asian and Caucasian patients. Gene validation for four genes, including ARHGAP19 (10q24.1) functioning in Rho activity control, FRAT2 (10q24.1) involved in Wnt signaling, PAFAH1B1 (17p13.3) functioning in motility control, and ZNF322A (6p22.1) involved in MAPK signaling, was performed using qPCR and RT-qPCR. Mean gene dosage and mRNA expression levels of the four candidate genes in tumor tissues were significantly higher than in the corresponding normal tissues (P<0.001~P=0.06). In addition, CISH analysis of patients indicated that copy number amplification indeed occurred for the ARHGAP19 and ZNF322A genes in lung cancer patients. IHC analysis of paraffin blocks from Asian and Caucasian patients demonstrated that the frequency of PAFAH1B1

  16. Development of an electronic database for Acute Pain Service outcomes

    PubMed Central

    Love, Brandy L; Jensen, Louise A; Schopflocher, Donald; Tsui, Ban CH

    2012-01-01

    BACKGROUND: Quality assurance is increasingly important in the current health care climate. An electronic database can be used for tracking patient information and as a research tool to provide quality assurance for patient care. OBJECTIVE: An electronic database was developed for the Acute Pain Service, University of Alberta Hospital (Edmonton, Alberta) to record patient characteristics, identify at-risk populations, compare treatment efficacies and guide practice decisions. METHOD: Steps in the database development involved identifying the goals for use, the relevant variables to include, and a plan for data collection, entry and analysis. Protocols were also created for data cleaning and quality control. The database was evaluated with a pilot test using existing data to assess data collection burden, accuracy and functionality of the database. RESULTS: A literature review resulted in an evidence-based list of demographic, clinical and pain management outcome variables to include. Time to assess patients and collect the data was 20 min to 30 min per patient. Limitations were primarily software related; initial data collection completeness was only 65%, and accuracy of data entry was 96%. CONCLUSIONS: The electronic database was found to be relevant and functional for the identified goals of data storage and research. PMID:22518364

  17. Constructing a Graph Database for Semantic Literature-Based Discovery.

    PubMed

    Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C

    2015-01-01

    Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network: a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which has some problems when dealing with network data. We transformed and uploaded SemMedDB into the Neo4j graph database and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language than with standard SQL.
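
    As a concrete illustration, an open-discovery LBD query over such a graph looks for two-step paths X -> Y -> Z where no direct X -> Z relation is yet known. The sketch below runs a Cypher query of that shape through the neo4j Python driver; the node labels, relationship types and credentials are assumptions for illustration, not SemMedDB's actual schema.

      # Sketch: a two-step literature-based discovery query in Cypher via
      # the neo4j driver. Labels, relationship types and credentials are
      # illustrative.
      from neo4j import GraphDatabase

      query = """
      MATCH (x:Concept {name: $start})-[:TREATS|AFFECTS]->(y:Concept)
            -[:CAUSES|ASSOCIATED_WITH]->(z:Concept)
      WHERE NOT (x)-[]->(z)          // keep only *potential* discoveries
      RETURN z.name AS candidate, count(y) AS linking_concepts
      ORDER BY linking_concepts DESC LIMIT 10
      """

      driver = GraphDatabase.driver("bolt://localhost:7687",
                                    auth=("neo4j", "password"))
      with driver.session() as session:
          for record in session.run(query, start="fish oils"):
              print(record["candidate"], record["linking_concepts"])
      driver.close()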

  18. A World Wide Web (WWW) server database engine for an organelle database, MitoDat.

    PubMed

    Lemkin, P F; Chipperfield, M; Merril, C; Zullo, S

    1996-03-01

    We describe a simple database search engine, "dbEngine", which may be used to quickly create a searchable database on a World Wide Web (WWW) server. Data may be prepared from spreadsheet programs (such as Excel) or from tables exported from relational database systems. This Common Gateway Interface (CGI-BIN) program is used with a WWW server such as those available commercially, or from the National Center for Supercomputing Applications (NCSA) or CERN. Its capabilities include: (i) searching records by combinations of terms connected with ANDs or ORs; (ii) returning search results as hypertext links to other WWW database servers; (iii) mapping lists of literature reference identifiers to the full references; (iv) creating bidirectional hypertext links between pictures and the database. DbEngine has been used to support the MitoDat database (Mendelian and non-Mendelian inheritance associated with the mitochondrion) on the WWW.
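
    The query capability described in (i) amounts to boolean term matching over flat records; a minimal sketch of that behaviour over rows exported from a spreadsheet (the field names are invented examples):

      # Sketch: AND/OR term search over tab-delimited records, in the
      # spirit of dbEngine's capability (i).
      import csv, io

      data = io.StringIO(
          "gene\tdescription\n"
          "MT-CO1\tcytochrome c oxidase subunit I\n"
          "MT-ND1\tNADH dehydrogenase subunit 1\n"
      )
      records = list(csv.DictReader(data, delimiter="\t"))

      def search(records, terms, mode="AND"):
          op = all if mode == "AND" else any
          def hit(rec):
              text = " ".join(rec.values()).lower()
              return op(t.lower() in text for t in terms)
          return [r for r in records if hit(r)]

      print(search(records, ["subunit", "oxidase"], mode="AND"))  # MT-CO1
      print(search(records, ["oxidase", "NADH"], mode="OR"))      # both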

  19. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies

    PubMed Central

    Cserhati, Matyas F.; Pandey, Sanjit; Beaudoin, James J.; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S.

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33 017 407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. Database URL: http://nntc-dcc.unmc.edu PMID:26228431

  20. Morchella MLST database

    USDA-ARS?s Scientific Manuscript database

    Welcome to the Morchella MLST database. This dedicated database was set up at the CBS-KNAW Biodiversity Center by Vincent Robert in February 2012, using BioloMICS software (Robert et al., 2011), to facilitate DNA sequence-based identifications of Morchella species via the Internet. The current datab...

  1. Dictionary as Database.

    ERIC Educational Resources Information Center

    Painter, Derrick

    1996-01-01

    Discussion of dictionaries as databases focuses on the digitizing of The Oxford English dictionary (OED) and the use of Standard Generalized Mark-Up Language (SGML). Topics include the creation of a consortium to digitize the OED, document structure, relational databases, text forms, sequence, and discourse. (LRW)

  2. Cost considerations in database selection - A comparison of DIALOG and ESA/IRS

    NASA Technical Reports Server (NTRS)

    Jack, R. F.

    1984-01-01

    It is pointed out that there are many factors which affect the decision-making process in determining which databases should be selected for conducting the online search on a given topic. In many cases, however, the major consideration will be related to cost. The present investigation is concerned with a comparison of the costs involved in making use of DIALOG and the European Space Agency's Information Retrieval Service (ESA/IRS). The two services are very comparable in many respects. Attention is given to pricing structure, telecommunications, the number of databases, prints, time requirements, a table listing online costs for DIALOG and ESA/IRS, and differences in mounting databases. It is found that ESA/IRS is competitively priced when compared to DIALOG, and, despite occasionally higher telecommunications costs, may be even more economical to use in some cases.

  3. Freshwater Biological Traits Database (Final Report)

    EPA Science Inventory

    EPA announced the release of the final report, Freshwater Biological Traits Database. This report discusses the development of a database of freshwater biological traits. The database combines several existing traits databases into an online format. The database is also...

  4. Examining database persistence of ISO/EN 13606 standardized electronic health record extracts: relational vs. NoSQL approaches.

    PubMed

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Lozano-Rubí, Raimundo; Serrano-Balazote, Pablo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2017-08-18

    The objective of this research is to compare relational and non-relational (NoSQL) database system approaches for storing, recovering, querying and persisting standardized medical information in the form of ISO/EN 13606 normalized Electronic Health Record XML extracts, both in isolation and concurrently. NoSQL database systems have recently attracted much attention, but few studies in the literature address their direct comparison with relational databases when applied to build the persistence layer of a standardized medical information system. One relational and two NoSQL databases (one document-based and one native XML database) of three different sizes were created in order to evaluate and compare the response times (algorithmic complexity) of six queries of growing complexity, which were performed on them. Similar appropriate results available in the literature were also considered. Relational and non-relational NoSQL database systems both show almost linear algorithmic complexity in query execution, but with very different linear slopes, the former being much steeper than the two latter. Document-based NoSQL databases perform better in concurrency than in isolation, and also better than relational databases in concurrency. Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when database size is extremely high (secondary use, research applications). Document-based NoSQL databases perform in general better than native XML NoSQL databases. Visualization and editing of EHR extracts are also document-based tasks better suited to NoSQL database systems. However, the appropriate database solution depends greatly on each particular situation and specific problem.
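
    The contrast being measured can be illustrated by running the "same" query against the two persistence styles: in SQL the extract is decomposed into rows, while in a document store the extract is kept whole and filtered directly. The schema and field names below are invented for illustration, not the ISO/EN 13606 structures used in the study.

      # Sketch: one query, two persistence styles. Relational (sqlite3)
      # vs a document-style filter over whole EHR extracts.
      import sqlite3

      # Relational: extract decomposed into rows.
      con = sqlite3.connect(":memory:")
      con.execute("CREATE TABLE obs (patient_id TEXT, code TEXT, value REAL)")
      con.executemany("INSERT INTO obs VALUES (?, ?, ?)",
                      [("p1", "glucose", 5.4), ("p2", "glucose", 9.1)])
      rows = con.execute(
          "SELECT patient_id FROM obs WHERE code = ? AND value > ?",
          ("glucose", 7.0)).fetchall()
      print(rows)  # [('p2',)]

      # Document-style: whole extracts stored as nested structures.
      extracts = [
          {"patient_id": "p1",
           "observations": [{"code": "glucose", "value": 5.4}]},
          {"patient_id": "p2",
           "observations": [{"code": "glucose", "value": 9.1}]},
      ]
      hits = [e["patient_id"] for e in extracts
              if any(o["code"] == "glucose" and o["value"] > 7.0
                     for o in e["observations"])]
      print(hits)  # ['p2']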

  5. [Comparison between administrative and clinical databases in the evaluation of cardiac surgery performance].

    PubMed

    Rosato, Stefano; D'Errigo, Paola; Badoni, Gabriella; Fusco, Danilo; Perucci, Carlo A; Seccareccia, Fulvia

    2008-08-01

    The availability of two contemporary sources of information about coronary artery bypass graft (CABG) interventions allowed us 1) to verify the feasibility of performing outcome evaluation studies using administrative data sources, and 2) to compare hospital performance obtained using the CABG Project clinical database with hospital performance derived from current administrative data. Interventions recorded in the CABG Project were linked to the hospital discharge record (HDR) administrative database. Only the linked records were considered for subsequent analyses (46% of the total CABG Project). A new selected population, "clinical card-HDR", was then defined. Two independent risk-adjustment models were applied, each of them using information derived from one of the two different sources. Then, HDR information was supplemented with some patient preoperative conditions from the CABG clinical database. The two models were compared in terms of their adaptability to the data. Hospital performances identified by the two models as significantly different from the mean were compared. In only 4 of the 13 hospitals considered for analysis did the results obtained using the HDR model not completely overlap with those obtained by the CABG model. When comparing statistical parameters of the HDR model and the HDR model plus patient preoperative conditions, the latter showed better adaptability to the data. In this "clinical card-HDR" population, hospital performance assessment obtained using information from the clinical database is similar to that derived from current administrative data. However, when risk-adjustment models built on administrative databases are supplemented with a few clinical variables, their statistical parameters improve and hospital performance assessment becomes more accurate.

  6. Intelligent communication assistant for databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jakobson, G.; Shaked, V.; Rowley, S.

    1983-01-01

    An intelligent communication assistant for databases, called FRED (Front End for Databases), is explored. FRED is designed to facilitate access to database systems by users of varying levels of experience. It is a second-generation natural language front-end for databases, intended to solve two critical interface problems between end-users and databases: connectivity and communication. The authors report their experiences in developing software for natural language query processing, dialog control, and knowledge representation, as well as the direction of future work. 10 references.

  7. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improving our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information has been lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from the CISBP-RNA, SpliceAid-F and RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from protein-RNA complexes present in the Protein Data Bank through computational analyses. ATtRACT also provides efficient algorithms to search for a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.
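
    Scanning a sequence for a consensus motif, as such tools do, reduces to matching a degenerate pattern at every offset. A small sketch that expands IUPAC ambiguity codes into a regular expression (the motif shown is an arbitrary example, not an ATtRACT entry):

      # Sketch: scan an RNA sequence for a degenerate consensus motif by
      # translating IUPAC codes into a regular expression.
      import re

      IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
               "R": "[AG]", "Y": "[CU]", "S": "[GC]", "W": "[AU]",
               "K": "[GU]", "M": "[AC]", "N": "[ACGU]"}

      def scan(sequence, motif):
          pattern = "".join(IUPAC[ch] for ch in motif)
          # Lookahead so overlapping occurrences are also reported.
          return [(m.start() + 1, m.group(1))
                  for m in re.finditer(f"(?=({pattern}))", sequence)]

      print(scan("ACUGCAUGCAUGUU", "UGCAUG"))  # [(3, 'UGCAUG'), (7, 'UGCAUG')]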

  8. The Halophile protein database.

    PubMed

    Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj

    2014-01-01

    Halophilic archaea/bacteria adapt to different salt concentrations, namely extreme, moderate and low. These types of adaptation may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in the adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. The database is a comprehensive, manually curated, non-redundant catalogue of proteins. It currently contains the properties of 59 897 proteins extracted from 21 different strains of halophilic archaea/bacteria. Database URL: http://webapp.cabgrid.res.in/protein/ © The Author(s) 2014. Published by Oxford University Press.
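
    Several of the listed physicochemical properties can be reproduced for any sequence with Biopython's ProtParam module; the tool choice and example sequence are assumptions for illustration, as the record does not name the software used to build HProtDB.

      # Sketch: computing the kinds of physicochemical properties
      # catalogued in HProtDB, using Biopython's ProtParam.
      from Bio.SeqUtils.ProtParam import ProteinAnalysis

      seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example
      pa = ProteinAnalysis(seq)

      print("molecular weight :", round(pa.molecular_weight(), 1))
      print("theoretical pI   :", round(pa.isoelectric_point(), 2))
      print("instability index:", round(pa.instability_index(), 2))
      print("GRAVY            :", round(pa.gravy(), 3))
      print("aromaticity      :", round(pa.aromaticity(), 3))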

  9. Impact of Genomics Platform and Statistical Filtering on Transcriptional Benchmark Doses (BMD) and Multiple Approaches for Selection of Chemical Point of Departure (PoD)

    PubMed Central

    Webster, A. Francina; Chepelev, Nikolai; Gagné, Rémi; Kuo, Byron; Recio, Leslie; Williams, Andrew; Yauk, Carole L.

    2015-01-01

    Many regulatory agencies are exploring ways to integrate toxicogenomic data into their chemical risk assessments. The major challenge lies in determining how to distill the complex data produced by high-content, multi-dose gene expression studies into quantitative information. It has been proposed that benchmark dose (BMD) values derived from toxicogenomics data be used as point of departure (PoD) values in chemical risk assessments. However, there is limited information regarding which genomics platforms are most suitable and how to select appropriate PoD values. In this study, we compared BMD values modeled from RNA sequencing-, microarray-, and qPCR-derived gene expression data from a single study, and explored multiple approaches for selecting a single PoD from these data. The strategies evaluated include several that do not require prior mechanistic knowledge of the compound for selection of the PoD, thus providing approaches for assessing data-poor chemicals. We used RNA extracted from the livers of female mice exposed to non-carcinogenic (0 and 2 mg/kg/day; mkd) and carcinogenic (4 and 8 mkd) doses of furan for 21 days. We show that transcriptional BMD values were consistent across technologies and highly predictive of the two-year cancer bioassay-based PoD. We also demonstrate that filtering data based on statistically significant changes in gene expression prior to BMD modeling creates more conservative BMD values. Taken together, this case study on mice exposed to furan demonstrates that high-content toxicogenomics studies produce robust data for BMD modeling that are minimally affected by inter-technology variability and highly predictive of cancer-based PoDs. PMID:26313361
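    The BMD idea behind the abstract can be made concrete with a toy dose-response fit. The sketch below uses synthetic data (not the study's data or its modeling software): it fits an exponential model M(d) = a*exp(b*d) and solves for the dose producing a 10% change relative to control, i.e. BMD = ln(1.1)/b.

        import numpy as np
        from scipy.optimize import curve_fit

        # Synthetic dose-response data (doses in mkd, arbitrary expression units)
        doses = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
        resp = np.array([1.00, 1.06, 1.11, 1.26, 1.55])

        def exp_model(d, a, b):
            # Exponential dose-response model M(d) = a * exp(b * d)
            return a * np.exp(b * d)

        (a, b), _ = curve_fit(exp_model, doses, resp, p0=(1.0, 0.05))

        # Benchmark response of a 10% change from control: a*exp(b*BMD) = 1.1*a
        bmd = np.log(1.1) / b
        print(f"a={a:.3f}, b={b:.4f}, BMD(10%)={bmd:.2f} mkd")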

  10. A new relational database structure and online interface for the HITRAN database

    NASA Astrophysics Data System (ADS)

    Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan

    2013-11-01

    A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database to ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
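    The benefit of normalizing line-transition data into linked tables can be sketched self-containedly. The schema and column names below are illustrative only (not the actual HITRANonline schema): molecule metadata lives in one table, transitions reference it by key, and a single join answers a typical line-list query.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
        CREATE TABLE molecule (
            id      INTEGER PRIMARY KEY,
            formula TEXT NOT NULL
        );
        CREATE TABLE transition (
            id          INTEGER PRIMARY KEY,
            molecule_id INTEGER NOT NULL REFERENCES molecule(id),
            nu          REAL NOT NULL,   -- wavenumber, cm^-1
            sw          REAL NOT NULL    -- line intensity
        );
        """)
        con.executemany("INSERT INTO molecule VALUES (?, ?)",
                        [(1, "H2O"), (2, "CO2")])
        con.executemany("INSERT INTO transition VALUES (?, ?, ?, ?)",
                        [(1, 1, 1554.35, 2.0e-21), (2, 2, 667.38, 8.0e-20),
                         (3, 2, 2349.14, 3.5e-18)])   # illustrative values

        # All CO2 lines in a wavenumber window, strongest first
        rows = con.execute("""
            SELECT m.formula, t.nu, t.sw
            FROM transition t JOIN molecule m ON m.id = t.molecule_id
            WHERE m.formula = 'CO2' AND t.nu BETWEEN 500 AND 2500
            ORDER BY t.sw DESC
        """).fetchall()
        print(rows)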

  11. Materials Databases Infrastructure Constructed by First Principles Calculations: A Review

    DOE PAGES

    Lin, Lianshan

    2015-10-13

    First Principles calculations, especially those based on high-throughput density functional theory (DFT), have been widely accepted as the major tools in atom-scale materials design. The emerging supercomputers, along with powerful First Principles calculations, have accumulated hundreds of thousands of crystal and compound records. The exponential growth of computational materials information urges the development of materials databases, which not only provide unlimited storage for the daily increasing data but also keep data storage, management, query, presentation and manipulation efficient. This review covers the most cutting-edge materials databases in materials design and their hot applications, such as in fuel cells. By comparing the advantages and drawbacks of these high-throughput First Principles materials databases, an optimized computational framework can be identified to fit the needs of fuel cell applications. The further development of high-throughput DFT materials databases, which in essence accelerates materials innovation, is discussed in the summary as well.

  12. National Transportation Atlas Databases : 1999

    DOT National Transportation Integrated Search

    1999-01-01

    The National Transportation Atlas Databases -- 1999 (NTAD99) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals, and re...

  13. National Transportation Atlas Databases : 2001

    DOT National Transportation Integrated Search

    2001-01-01

    The National Transportation Atlas Databases-2001 (NTAD-2001) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals and related...

  14. National Transportation Atlas Databases : 1996

    DOT National Transportation Integrated Search

    1996-01-01

    The National Transportation Atlas Databases -- 1996 (NTAD96) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals, and re...

  15. National Transportation Atlas Databases : 2000

    DOT National Transportation Integrated Search

    2000-01-01

    The National Transportation Atlas Databases-2000 (NTAD-2000) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals and related...

  16. National Transportation Atlas Databases : 1997

    DOT National Transportation Integrated Search

    1997-01-01

    The National Transportation Atlas Databases -- 1997 (NTAD97) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals, and re...

  17. FRED, a Front End for Databases.

    ERIC Educational Resources Information Center

    Crystal, Maurice I.; Jakobson, Gabriel E.

    1982-01-01

    FRED (a Front End for Databases) was conceived to alleviate data access difficulties posed by the heterogeneous nature of online databases. A hardware/software layer interposed between users and databases, it consists of three subsystems: user-interface, database-interface, and knowledge base. Architectural alternatives for this database machine…

  18. Osteoporosis therapies: evidence from health-care databases and observational population studies.

    PubMed

    Silverman, Stuart L

    2010-11-01

    Osteoporosis is a well-recognized disease with severe consequences if left untreated. Randomized controlled trials are the most rigorous method for determining the efficacy and safety of therapies. Nevertheless, randomized controlled trials underrepresent the real-world patient population and are costly in both time and money. Modern technology has enabled researchers to use information gathered from large health-care or medical-claims databases to assess the practical utilization of available therapies in appropriate patients. Observational database studies lack randomization but, if carefully designed and successfully completed, can provide valuable information that complements results obtained from randomized controlled trials and extends our knowledge to real-world clinical patients. Randomized controlled trials comparing fracture outcomes among osteoporosis therapies are difficult to perform. In this regard, large observational database studies could be useful in identifying clinically important differences among therapeutic options. Database studies can also provide important information with regard to osteoporosis prevalence, health economics, and compliance and persistence with treatment. This article describes the strengths and limitations of both randomized controlled trials and observational database studies, discusses considerations for observational study design, and reviews a wealth of information generated by database studies in the field of osteoporosis.

  19. Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes.

    PubMed

    Winsor, Geoffrey L; Van Rossum, Thea; Lo, Raymond; Khaira, Bhavjinder; Whiteside, Matthew D; Hancock, Robert E W; Brinkman, Fiona S L

    2009-01-01

    Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and its ability to cause life-threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome. In order to facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database capabilities to include all Pseudomonas species, and have developed or incorporated methods to facilitate high-quality comparative genomics. The database contains a robust assessment of orthologs, a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. A choice of simple and more flexible user-friendly Boolean search features allows researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean-searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high-quality, annotated genome resource for the research community and is available under an open source license.

  20. Respiratory cancer database: An open access database of respiratory cancer gene and miRNA.

    PubMed

    Choubey, Jyotsna; Choudhari, Jyoti Kant; Patel, Ashish; Verma, Mukesh Kumar

    2017-01-01

    Respiratory cancer database (RespCanDB) is a genomic and proteomic database of cancers of the respiratory organs. It also includes information on medicinal plants used for the treatment of various respiratory cancers, with the structures of their active constituents, as well as pharmacological and chemical information on drugs associated with various respiratory cancers. Data in RespCanDB have been manually collected from published research articles and from other databases. Data have been integrated using MySQL, a relational database management system. MySQL manages all data in the back end and provides commands to retrieve and store data in the database. The web interface of the database has been built in ASP. RespCanDB is expected to contribute to the scientific community's understanding of respiratory cancer biology, as well as to the development of new ways of diagnosing and treating respiratory cancer. Currently, the database contains oncogenomic information on lung cancer, laryngeal cancer, and nasopharyngeal cancer. Data for other cancers, such as oral and tracheal cancers, will be added in the near future. The URL of RespCanDB is http://ridb.subdic-bioinformatics-nitrr.in/.

  1. NASA Records Database

    NASA Technical Reports Server (NTRS)

    Callac, Christopher; Lunsford, Michelle

    2005-01-01

    The NASA Records Database, comprising a Web-based application program and a database, is used to administer an archive of paper records at Stennis Space Center. The system begins with an electronic form, into which a user enters information about records that the user is sending to the archive. The form is smart: it provides instructions for entering information correctly and prompts the user to enter all required information. Once complete, the form is digitally signed and submitted to the database. The system determines which storage locations are not in use, assigns the user's boxes of records to some of them, and enters these assignments in the database. Thereafter, the software tracks the boxes and can be used to locate them. By use of the search capabilities of the software, specific records can be sought by box storage locations, accession numbers, record dates, submitting organizations, or details of the records themselves. Boxes can be marked with such statuses as checked out, lost, transferred, and destroyed. The system can generate reports showing boxes awaiting destruction or transfer. When boxes are transferred to the National Archives and Records Administration (NARA), the system can automatically fill out NARA records-transfer forms. Currently, several other NASA Centers are considering deploying the NASA Records Database to help automate their records archives.

  2. The Amma-Sat Database

    NASA Astrophysics Data System (ADS)

    Ramage, K.; Desbois, M.; Eymard, L.

    2004-12-01

    The African Monsoon Multidisciplinary Analysis (AMMA) project is a French initiative that aims at identifying and analysing in detail the multidisciplinary and multi-scale processes that lead to a better understanding of the physical mechanisms linked to the African Monsoon. The main components of the African Monsoon are: atmospheric dynamics, the continental water cycle, atmospheric chemistry, and oceanic and continental surface conditions. Satellites contribute to various objectives of the project, both for process analysis and for large-scale, long-term studies: some series of satellites (METEOSAT, NOAA, ...) have been flown for more than 20 years, ensuring good-quality monitoring of some of the West African atmosphere and surface characteristics. Moreover, several recent missions and several upcoming projects will strongly improve and complement this survey. The AMMA project offers an opportunity to develop the exploitation of satellite data and to foster collaboration between specialist and non-specialist users. For this purpose, databases are being developed to collect all past and future satellite data related to the African Monsoon. It will then be possible to compare different types of data at different resolutions, and to validate satellite data with in situ measurements or numerical simulations. The main goal of the AMMA-SAT database is to offer easy access to satellite data to the AMMA scientific community. The database contains geophysical products estimated from operational or research algorithms and covering the different components of the AMMA project. Nevertheless, the choice has been made to group data within pertinent scales rather than within their themes. For this purpose, five regions of interest were defined to extract the data: an area covering the tropical Atlantic and Africa for large-scale studies, an area covering West Africa for mesoscale studies, and three local areas surrounding sites of in situ observations. Within each of these regions satellite data are projected on

  3. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access)   These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  4. University Real Estate Development Database: A Database-Driven Internet Research Tool

    ERIC Educational Resources Information Center

    Wiewel, Wim; Kunst, Kara

    2008-01-01

    The University Real Estate Development Database is an Internet resource developed by the University of Baltimore for the Lincoln Institute of Land Policy, containing over six hundred cases of university expansion outside of traditional campus boundaries. The University Real Estate Development database is a searchable collection of real estate…

  5. A case study for a digital seabed database: Bohai Sea engineering geology database

    NASA Astrophysics Data System (ADS)

    Tianyun, Su; Shikui, Zhai; Baohua, Liu; Ruicai, Liang; Yanpeng, Zheng; Yong, Wang

    2006-07-01

    This paper discusses the design of an ORACLE-based Bohai Sea engineering geology database, covering requirements analysis, conceptual structure design, logical structure design, physical structure design and security design. In the study, we used the object-oriented Unified Modeling Language (UML) to model the conceptual structure of the database and used the powerful data-management functions that the object-relational database ORACLE provides to organize and manage the storage space and improve its security performance. By this means, the database can provide rapid and highly effective performance in data storage, maintenance and query, satisfying the application requirements of the Bohai Sea Oilfield Paradigm Area Information System.

  6. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    PubMed

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops.

  7. HIV Structural Database

    National Institute of Standards and Technology Data Gateway

    SRD 102 HIV Structural Database (Web, free access)   The HIV Protease Structural Database is an archive of experimentally determined 3-D structures of Human Immunodeficiency Virus 1 (HIV-1), Human Immunodeficiency Virus 2 (HIV-2) and Simian Immunodeficiency Virus (SIV) Proteases and their complexes with inhibitors or products of substrate cleavage.

  8. Gene and protein nomenclature in public databases

    PubMed Central

    Fundel, Katrin; Zimmer, Ralf

    2006-01-01

    Background Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require knowledge of all the names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, as well as against a lexicon of common English words and domain-related non-gene terms, and we compared the different data sources in terms of the size of the extracted dictionaries and the overlap of synonyms between them. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries

  9. A mobile trauma database with charge capture.

    PubMed

    Moulton, Steve; Myung, Dan; Chary, Aron; Chen, Joshua; Agarwal, Suresh; Emhoff, Tim; Burke, Peter; Hirsch, Erwin

    2005-11-01

    Charge capture plays an important role in every surgical practice. We have developed and merged a custom mobile database (DB) system with our trauma registry (TRACS) to better understand our billing methods, revenue generators, and areas for improved revenue capture. The mobile database runs on handheld devices using the Windows Compact Edition platform. The front end was written in C# and the back end is SQL. The mobile database operates as a thick client; it includes active and inactive patient lists, billing screens, hot pick lists, and Current Procedural Terminology and International Classification of Diseases, Ninth Revision code sets. Microsoft Internet Information Server provides secure data-transaction services between the back end and the client stored on each device. Traditional, handwritten billing information for three of five adult trauma surgeons was averaged over a 5-month period. Electronic billing information was then collected over a 3-month period using handheld devices and the subject software application. One surgeon used the software for all 3 months, and two surgeons used it for the latter 2 months of the electronic data collection period. This electronic billing information was combined with TRACS data to determine the clinical characteristics of the trauma patients who were and were not captured using the mobile database. Total charges increased by 135%, 148%, and 228% for each of the three trauma surgeons who used the mobile DB application. The majority of additional charges were for evaluation and management services. Patients who were captured and billed at the point of care using the mobile DB had higher Injury Severity Scores, were more likely to undergo an operative procedure, and had longer lengths of stay compared with those who were not captured. Total charges more than doubled using a mobile database to bill at the point of care. A subsequent comparison of TRACS data with billing information revealed a large amount of uncaptured patient...

  10. Key features for ATA / ATR database design in missile systems

    NASA Astrophysics Data System (ADS)

    Özertem, Kemal Arda

    2017-05-01

    Automatic target acquisition (ATA) and automatic target recognition (ATR) are two vital tasks for missile systems, and having a robust detection and recognition algorithm is crucial for overall system performance. In order to have a robust target detection and recognition algorithm, an extensive image database is required. Automatic target recognition algorithms use the image database in the training and testing steps of the algorithm. This directly affects recognition performance, since training accuracy is driven by the quality of the image database. In addition, the performance of an automatic target detection algorithm can be measured effectively by using an image database. There are two main ways to design an ATA / ATR database. The first and easier way is to use a scene generator. A scene generator can model objects by considering their material information, the atmospheric conditions, the detector type and the terrain. Designing an image database using a scene generator is inexpensive, and it allows creating many different scenarios quickly and easily. However, the major drawback of using a scene generator is its low fidelity, since the images are created virtually. The second and more difficult way is to design the database using real-world images. Designing an image database with real-world images is far more costly and time-consuming; however, it offers high fidelity, which is critical for missile algorithms. In this paper, critical concepts in ATA / ATR database design with real-world images are discussed. Each concept is discussed from the perspectives of ATA and ATR separately. For the implementation stage, some possible solutions and trade-offs for creating the database are proposed, and all proposed approaches are compared to each other with regard to their pros and cons.

  11. Hydrogen Leak Detection Sensor Database

    NASA Technical Reports Server (NTRS)

    Baker, Barton D.

    2010-01-01

    This slide presentation reviews the characteristics of the Hydrogen Sensor database. The database is the result of NASA's continuing interest in and improvement of its ability to detect and assess gas leaks in space applications. The database specifics and a snapshot of an entry in the database are reviewed. Attempts were made to determine the applicability of each of the 65 sensors for ground and/or vehicle use.

  12. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies.

    PubMed

    Cserhati, Matyas F; Pandey, Sanjit; Beaudoin, James J; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33,017,407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. © The Author(s) 2015. Published by Oxford University Press.

  13. Alignment of high-throughput sequencing data inside in-memory databases.

    PubMed

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In the era of high-throughput DNA sequencing techniques, performant analysis of DNA sequences is of great importance. Computer-supported DNA analysis is still an intensely time-consuming task. In this paper we explore the potential of a new in-memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL, to compare execution time and memory management. To ensure that the results are comparable, MySQL was run in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures for exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, the performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which means that there is high potential within the new in-memory concepts, leading to further developments of DNA analysis procedures in the future.
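    Exact read matching of the kind benchmarked here can be emulated in memory. The sketch below is illustrative only (it is neither the authors' stored procedures nor BWA): it indexes every k-mer position in a reference string, then seeds each read on its first k-mer and verifies the full read.

        from collections import defaultdict

        def build_index(reference, k):
            """Map every k-mer in the reference to its start positions."""
            index = defaultdict(list)
            for i in range(len(reference) - k + 1):
                index[reference[i:i + k]].append(i)
            return index

        def exact_match(read, reference, index, k):
            """Seed on the read's first k-mer, then verify the full read."""
            hits = []
            for pos in index.get(read[:k], []):
                if reference[pos:pos + len(read)] == read:
                    hits.append(pos)
            return hits

        reference = "ACGTACGTTAGCACGTACG"   # toy stand-in for GRCh37
        k = 4
        index = build_index(reference, k)
        print(exact_match("ACGTA", reference, index, k))   # -> [0, 12]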

  14. Landscape features, standards, and semantics in U.S. national topographic mapping databases

    USGS Publications Warehouse

    Varanka, Dalia

    2009-01-01

    The objective of this paper is to examine the contrast between local, field-surveyed topographical representation and feature representation in digital, centralized databases, and to clarify their ontological implications. The semantics of these two approaches are contrasted by examining the categorization of features by the subject domains inherent to national topographic mapping. A comparison of five USGS topographic mapping domain and feature lists indicates that multiple semantic meanings and ontology rules were applied to the initial digital database but were lost as databases became more centralized at national scales, and common semantics were replaced by technological terms.

  15. Evaluation of relational and NoSQL database architectures to manage genomic annotations.

    PubMed

    Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard

    2016-12-01

    While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences. Copyright © 2016 Elsevier Inc. All rights reserved.
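    The storage-model difference the authors benchmark can be miniaturized: the same dbSNP-style annotation can be held as normalized, typed columns (relational) or as one schemaless JSON document per variant (document-oriented). The sketch below is illustrative only, using sqlite3 and json from the Python standard library rather than MySQL, MongoDB, or PostgreSQL, and the coordinates shown are not authoritative.

        import json
        import sqlite3

        con = sqlite3.connect(":memory:")

        # Relational layout: fixed, typed columns
        con.execute("CREATE TABLE snp (rsid TEXT PRIMARY KEY, chrom TEXT, pos INTEGER, gene TEXT)")
        con.execute("INSERT INTO snp VALUES ('rs699', '1', 230710048, 'AGT')")  # illustrative values

        # Document layout: one schemaless JSON document per variant
        con.execute("CREATE TABLE snp_doc (rsid TEXT PRIMARY KEY, doc TEXT)")
        doc = {"chrom": "1", "pos": 230710048, "gene": "AGT",
               "clinical": ["hypertension"]}   # nested field needs no schema change
        con.execute("INSERT INTO snp_doc VALUES ('rs699', ?)", (json.dumps(doc),))

        # Relational query: typed columns, filtered in SQL
        print(con.execute("SELECT gene FROM snp WHERE rsid = 'rs699'").fetchone())

        # Document query: fetch the whole document, interpret in the application
        raw = con.execute("SELECT doc FROM snp_doc WHERE rsid = 'rs699'").fetchone()[0]
        print(json.loads(raw)["clinical"])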

  16. ITS-90 Thermocouple Database

    National Institute of Standards and Technology Data Gateway

    SRD 60 NIST ITS-90 Thermocouple Database (Web, free access)   Web version of Standard Reference Database 60 and NIST Monograph 175. The database gives temperature -- electromotive force (emf) reference functions and tables for the letter-designated thermocouple types B, E, J, K, N, R, S and T. These reference functions have been adopted as standards by the American Society for Testing and Materials (ASTM) and the International Electrotechnical Commission (IEC).

  17. Use of Graph Database for the Integration of Heterogeneous Biological Data.

    PubMed

    Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young

    2017-03-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
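    The join-versus-traversal contrast can be made concrete with a tiny heterogeneous graph. In the sketch below, a plain-Python adjacency list stands in for Neo4j, and the edges are hypothetical examples rather than data from the paper; a two-hop question ("which drugs reach a disease through a shared target gene?") is answered by following edges instead of composing joins.

        from collections import defaultdict

        # Adjacency list: node -> list of (relationship, neighbor)
        graph = defaultdict(list)

        def add_edge(src, rel, dst):
            graph[src].append((rel, dst))
            graph[dst].append((rel, src))   # treat relationships as bidirectional here

        # Hypothetical drug-target and gene-disease edges
        add_edge("aspirin", "TARGETS", "PTGS2")
        add_edge("celecoxib", "TARGETS", "PTGS2")
        add_edge("PTGS2", "ASSOCIATED_WITH", "inflammation")

        def drugs_for_disease(disease):
            """Two-hop traversal: disease -> gene -> drug."""
            drugs = set()
            for rel1, gene in graph[disease]:
                if rel1 != "ASSOCIATED_WITH":
                    continue
                for rel2, drug in graph[gene]:
                    if rel2 == "TARGETS":
                        drugs.add(drug)
            return sorted(drugs)

        print(drugs_for_disease("inflammation"))   # -> ['aspirin', 'celecoxib']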

  18. Use of Graph Database for the Integration of Heterogeneous Biological Data

    PubMed Central

    Yoon, Byoung-Ha; Kim, Seon-Kyu

    2017-01-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946

  19. Electron Effective-Attenuation-Length Database

    National Institute of Standards and Technology Data Gateway

    SRD 82 NIST Electron Effective-Attenuation-Length Database (PC database, no charge)   This database provides values of electron effective attenuation lengths (EALs) in solid elements and compounds at selected electron energies between 50 eV and 2,000 eV. The database was designed mainly to provide EALs (to account for effects of elastic-eletron scattering) for applications in surface analysis by Auger-electron spectroscopy (AES) and X-ray photoelectron spectroscopy (XPS).

  20. Systems toxicology of chemically induced liver and kidney injuries: histopathology‐associated gene co‐expression modules

    PubMed Central

    Te, Jerez A.; AbdulHameed, Mohamed Diwan M.

    2016-01-01

    Abstract Organ injuries caused by environmental chemical exposures or use of pharmaceutical drugs pose a serious health risk that may be difficult to assess because of a lack of non‐invasive diagnostic tests. Mapping chemical injuries to organ‐specific histopathology outcomes via biomarkers will provide a foundation for designing precise and robust diagnostic tests. We identified co‐expressed genes (modules) specific to injury endpoints using the Open Toxicogenomics Project‐Genomics Assisted Toxicity Evaluation System (TG‐GATEs) – a toxicogenomics database containing organ‐specific gene expression data matched to dose‐ and time‐dependent chemical exposures and adverse histopathology assessments in Sprague–Dawley rats. We proposed a protocol for selecting gene modules associated with chemical‐induced injuries that classify 11 liver and eight kidney histopathology endpoints based on dose‐dependent activation of the identified modules. We showed that the activation of the modules for a particular chemical exposure condition, i.e., chemical‐time‐dose combination, correlated with the severity of histopathological damage in a dose‐dependent manner. Furthermore, the modules could distinguish different types of injuries caused by chemical exposures as well as determine whether the injury module activation was specific to the tissue of origin (liver and kidney). The generated modules provide a link between toxic chemical exposures, different molecular initiating events among underlying molecular pathways and resultant organ damage. Published 2016. This article is a U.S. Government work and is in the public domain in the USA. Journal of Applied Toxicology published by John Wiley & Sons, Ltd. PMID:26725466
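    Module activation of the kind described can be scored simply: z-score each gene across conditions, then average the z-scores of a module's member genes per condition. The sketch below uses a synthetic expression matrix and an illustrative threshold; it is not the authors' protocol.

        import numpy as np

        rng = np.random.default_rng(0)

        # Synthetic expression matrix: rows = genes, columns = chemical-time-dose conditions
        genes = ["g%d" % i for i in range(6)]
        expr = rng.normal(size=(6, 4))
        expr[:3, 3] += 2.0                 # genes g0..g2 respond in condition 3

        # z-score each gene across conditions
        z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)

        module = ["g0", "g1", "g2"]        # hypothetical injury-associated module
        rows = [genes.index(g) for g in module]

        # Module activation score per condition: mean z of member genes
        score = z[rows].mean(axis=0)
        print(np.round(score, 2))
        print("injury-like conditions:", np.where(score > 1.0)[0])   # threshold is illustrative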

  1. BioImaging Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    David Nix, Lisa Simirenko

    2006-10-25

    The BioImaging Database (BID) is a relational database developed to store the data and metadata for 3D gene expression in early Drosophila embryo development at the cellular level. The schema was written to be used with the MySQL DBMS but, with minor modifications, can be used on any SQL-compliant relational DBMS.

  2. Visualization and manipulating the image of a formal data structure (FDS)-based database

    NASA Astrophysics Data System (ADS)

    Verdiesen, Franc; de Hoop, Sylvia; Molenaar, Martien

    1994-08-01

    A vector map is a terrain representation with a vector-structured geometry. Molenaar formulated an object-oriented formal data structure (FDS) for 3D single-valued vector maps. This FDS has been implemented in a database (Oracle). In this study we describe a methodology for visualizing an FDS-based database and manipulating the image. A data set retrieved by querying the database is converted into an import file for a drawing application. An objective of this study is that an end user can alter and add terrain objects in the image. The drawing application creates an export file that is compared with the import file. Differences between these files result in updates to the database, which involve consistency checks. In this study Autocad is used for visualizing and manipulating the image of the data set. A computer program has been written for the data exchange and conversion between Oracle and Autocad. The data structure of the FDS is compared to the data structure of Autocad, and the FDS data are converted into the equivalent Autocad structure.

  3. Microbial Properties Database Editor Tutorial

    EPA Science Inventory

    A Microbial Properties Database Editor (MPDBE) has been developed to help consolidate microbial-relevant data to populate a microbial database and support a database editor by which an authorized user can modify physico-microbial properties related to microbial indicators and pat...

  4. Smart Location Database - Service

    EPA Pesticide Factsheets

    The Smart Location Database (SLD) summarizes over 80 demographic, built environment, transit service, and destination accessibility attributes for every census block group in the United States. Future updates to the SLD will include additional attributes which summarize the relative location efficiency of a block group when compared to other block groups within the same metropolitan region. EPA also plans to periodically update attributes and add new attributes to reflect latest available data. A log of SLD updates is included in the SLD User Guide. See the user guide for a full description of data sources, data currency, and known limitations: https://edg.epa.gov/data/Public/OP/SLD/SLD_userguide.pdf

  5. Smart Location Database - Download

    EPA Pesticide Factsheets

    The Smart Location Database (SLD) summarizes over 80 demographic, built environment, transit service, and destination accessibility attributes for every census block group in the United States. Future updates to the SLD will include additional attributes which summarize the relative location efficiency of a block group when compared to other block groups within the same metropolitan region. EPA also plans to periodically update attributes and add new attributes to reflect latest available data. A log of SLD updates is included in the SLD User Guide. See the user guide for a full description of data sources, data currency, and known limitations: https://edg.epa.gov/data/Public/OP/SLD/SLD_userguide.pdf

  6. Accelerating Information Retrieval from Profile Hidden Markov Model Databases.

    PubMed

    Tamimi, Ahmad; Ashhab, Yaqoub; Tamimi, Hashem

    2016-01-01

    Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is an increasing interest to improve the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching efficiency have been focusing on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases, as well as the increasing demand for using batch query searching approach, are strong motivations that call for further enhancement of information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate the current profile-HMM homology searching approaches. The method works by cluster-based remodeling of the database to reduce the search space, rather than focusing on the alignment algorithms. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities. A representative for each cluster was assigned. To enhance sensitivity, we proposed an extended step that allows overlapping among clusters. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall values were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41%, and an average recall of 96%. Our results demonstrate that representation of profile-HMMs using a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.
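    The cluster-based search-space reduction can be sketched independently of HMM details by treating each profile as a feature vector. In the toy version below, cosine similarity stands in for profile-profile alignment scores and all names and data are hypothetical: a query is compared to cluster representatives first, and only the winning cluster is scanned in full.

        import numpy as np

        rng = np.random.default_rng(1)

        # Toy "profiles" as feature vectors, pre-grouped into two clusters
        clusters = {
            "c1": rng.normal(loc=0.0, size=(50, 16)),
            "c2": rng.normal(loc=3.0, size=(50, 16)),
        }
        # Representative = centroid of each cluster
        reps = {name: mat.mean(axis=0) for name, mat in clusters.items()}

        def cosine(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

        def search(query):
            """Compare against representatives, then scan only the winning cluster."""
            best_cluster = max(reps, key=lambda name: cosine(query, reps[name]))
            sims = [cosine(query, row) for row in clusters[best_cluster]]
            return best_cluster, int(np.argmax(sims))

        query = rng.normal(loc=3.0, size=16)   # should land in cluster c2
        print(search(query))                   # scans 2 + 50 profiles instead of 100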

  7. [Benefits of large healthcare databases for drug risk research].

    PubMed

    Garbe, Edeltraut; Pigeot, Iris

    2015-08-01

    Large electronic healthcare databases have become an important worldwide data resource for drug safety research after approval. Signal generation methods and drug safety studies based on these data facilitate the prospective monitoring of drug safety after approval, as has recently been required by EU law and the German Medicines Act. Despite its large size, a single healthcare database may include too few patients for the study of drugs with very small exposed populations or for the investigation of very rare drug risks. For that reason, in the United States, efforts have been made to work on models that provide the linkage of data from different electronic healthcare databases for monitoring the safety of medicines after authorization, in (i) the Sentinel Initiative and (ii) the Observational Medical Outcomes Partnership (OMOP). In July 2014, the pilot project Mini-Sentinel included a total of 178 million people from 18 different US databases. The merging of the data is based on a distributed data network with a common data model. In the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP) there has been no comparable merging of data from different databases; however, first experiences have been gained in various EU drug safety projects. In Germany, the data of the statutory health insurance providers constitute the most important resource for establishing a large healthcare database. Their use for this purpose has so far been severely restricted by the Code of Social Law (Section 75, Book 10). Therefore, a reform of this section is absolutely necessary.

  8. Nuclear Science References Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pritychenko, B., E-mail: pritychenko@bnl.gov; Běták, E.; Singh, B.

    2014-06-15

    The Nuclear Science References (NSR) database, together with its associated Web interface, is the world's only comprehensive source of easily accessible low- and intermediate-energy nuclear physics bibliographic information, covering more than 210,000 articles since the beginning of nuclear science. The weekly-updated NSR database provides essential support for nuclear data evaluation, compilation and research activities. The principles of the database and Web application development and maintenance are described. Examples of nuclear structure, reaction and decay applications are specifically included. The complete NSR database is freely available at the websites of the National Nuclear Data Center (http://www.nndc.bnl.gov/nsr) and the International Atomic Energy Agency (http://www-nds.iaea.org/nsr).

  9. Microbial properties database editor tutorial

    USDA-ARS?s Scientific Manuscript database

    A Microbial Properties Database Editor (MPDBE) has been developed to help consolidate microbialrelevant data to populate a microbial database and support a database editor by which an authorized user can modify physico-microbial properties related to microbial indicators and pathogens. Physical prop...

  10. bioDBnet - Biological Database Network

    Cancer.gov

    bioDBnet is a comprehensive resource of most of the biological databases available from different sites like NCBI, Uniprot, EMBL, Ensembl, Affymetrix. It provides a queryable interface to all the databases available, converts identifiers from one database into another and generates comprehensive reports.

  11. Mugshot Identification Database (MID)

    National Institute of Standards and Technology Data Gateway

    NIST Mugshot Identification Database (MID) (Web, free access)   NIST Special Database 18 is being distributed for use in development and testing of automated mugshot identification systems. The database consists of three CD-ROMs, containing a total of 3248 images of variable size using lossless compression. A newer version of the compression/decompression software on the CDROM can be found at the website http://www.nist.gov/itl/iad/ig/nigos.cfm as part of the NBIS package.

  12. Analysis of a virtual memory model for maintaining database views

    NASA Technical Reports Server (NTRS)

    Kinsley, Kathryn C.; Hughes, Charles E.

    1992-01-01

    This paper presents an analytical model for predicting the performance of a new support strategy for database views. This strategy, called the virtual method, is compared with traditional methods for supporting views. The analytical model's predictions of improved performance by the virtual method are then validated by comparing these results with those achieved in an experimental implementation.

  13. Signals of bleeding among direct-acting oral anticoagulant users compared to those among warfarin users: analyses of the post-marketing FDA Adverse Event Reporting System (FAERS) database, 2010-2015.

    PubMed

    Alshammari, Thamir M; Ata, Sondus I; Mahmoud, Mansour Adam; Alhawassi, Tariq M; Aljadhey, Hisham S

    2018-01-01

    To analyze and compare the signals of bleeding from the use of direct-acting oral anticoagulants (DOACs) in the US Food and Drug Administration Adverse Event Reporting System (FAERS) database over 5 years. Reports of bleeding and of events with related terms submitted to the FAERS between October 2010 and September 2015 were retrieved and then analyzed using the reporting odds ratio (ROR). The signals of bleeding associated with DOAC use were compared with the signals of bleeding associated with warfarin use utilizing the FAERS database. A total of 1,518 reports linked dabigatran to bleeding, accounting for 2.7% of all dabigatran-related reports, whereas 93 reports linked rivaroxaban to bleeding, which accounted for 4.4% of all rivaroxaban-related reports. The concurrent proportion of bleeding-related reports for warfarin was 3.6%, with a total of 654 reports. The association of bleeding and of related terms with the use of all three medications was significant, albeit with different degrees of association. The ROR was 12.30 (95% confidence interval [CI] 11.65-12.97) for dabigatran, 15.61 (95% CI 14.42-16.90) for warfarin, and 18.86 (95% CI 15.31-23.23) for rivaroxaban. The signals of bleeding varied among the DOACs; the bleeding signal was higher for rivaroxaban and lower for dabigatran compared to that for warfarin.
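    The ROR used here comes from a 2x2 table of reports: a = target drug with the event, b = target drug without it, c = all other drugs with the event, d = all other drugs without it. Then ROR = (a/b)/(c/d), with a 95% CI of exp(ln ROR ± 1.96·sqrt(1/a + 1/b + 1/c + 1/d)). A minimal sketch with made-up counts (not the paper's data):

        import math

        def ror(a, b, c, d):
            """Reporting odds ratio (a/b)/(c/d) with a 95% confidence interval."""
            estimate = (a / b) / (c / d)
            se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
            lo = math.exp(math.log(estimate) - 1.96 * se)
            hi = math.exp(math.log(estimate) + 1.96 * se)
            return estimate, lo, hi

        # Made-up report counts: a/b = drug with/without event, c/d = all other drugs
        print(tuple(round(x, 2) for x in ror(a=150, b=5000, c=900, d=60000)))   # -> (2.0, 1.68, 2.38)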

  14. Doors for memory: A searchable database.

    PubMed

    Baddeley, Alan D; Hitch, Graham J; Quinlan, Philip T; Bowes, Lindsey; Stone, Rob

    2016-11-01

    The study of human long-term memory has for over 50 years been dominated by research on words. This is partly due to lack of suitable nonverbal materials. Experience in developing a clinical test suggested that door scenes can provide an ecologically relevant and sensitive alternative to the faces and geometrical figures traditionally used to study visual memory. In pursuing this line of research, we have accumulated over 2000 door scenes providing a database that is categorized on a range of variables including building type, colour, age, condition, glazing, and a range of other physical characteristics. We describe an illustrative study of recognition memory for 100 doors tested by yes/no, two-alternative, or four-alternative forced-choice paradigms. These stimuli, together with the full categorized database, are available through a dedicated website. We suggest that door scenes provide an ecologically relevant and participant-friendly source of material for studying the comparatively neglected field of visual long-term memory.

  15. Shuttle Hypervelocity Impact Database

    NASA Technical Reports Server (NTRS)

    Hyde, James L.; Christiansen, Eric L.; Lear, Dana M.

    2011-01-01

    With three missions outstanding, the Shuttle Hypervelocity Impact Database has nearly 3000 entries. The data is divided into tables for crew module windows, payload bay door radiators and thermal protection system regions, with window impacts comprising just over half the records. In general, the database provides dimensions of hypervelocity impact damage, a component-level location (i.e., window number or radiator panel number) and the orbiter mission when the impact occurred. Additional detail on the type of particle that produced the damage site is provided when sampling data and definitive analysis results are available. Details and insights on the contents of the database, including examples of descriptive statistics, will be provided. Post-flight impact damage inspection and sampling techniques that were employed during the different observation campaigns will also be discussed. Potential enhancements to the database structure and availability of the data for other researchers will be addressed in the Future Work section. A related database of returned surfaces from the International Space Station will also be introduced.

  16. The Génolevures database.

    PubMed

    Martin, Tiphaine; Sherman, David J; Durrens, Pascal

    2011-01-01

    The Génolevures online database (URL: http://www.genolevures.org) stores and provides the data and results obtained by the Génolevures Consortium through several campaigns of genome annotation of the yeasts in the Saccharomycotina subphylum (hemiascomycetes). This database is dedicated to large-scale comparison of these genomes, storing not only the different chromosomal elements detected in the sequences, but also the logical relations between them. The database is divided into a public part, accessible to anyone through Internet, and a private part where the Consortium members make genome annotations with our Magus annotation system; this system is used to annotate several related genomes in parallel. The public database is widely consulted and offers structured data, organized using a REST web site architecture that allows for automated requests. The implementation of the database, as well as its associated tools and methods, is evolving to cope with the influx of genome sequences produced by Next Generation Sequencing (NGS). Copyright © 2011 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  17. Environment/Health/Safety (EHS): Databases

    Science.gov Websites

    Hazard Documents Database; Biosafety Authorization System; CATS (Corrective Action Tracking System) (for findings 12/2005 to present); Chemical Management System; Electrical Safety; Ergonomics Database (for new ...); Lessons Learned / Best Practices; REMS (Radiation Exposure Monitoring System); SJHA Database (Subcontractor Job ...)

  18. WEB-BASED DATABASE ON RENEWAL TECHNOLOGIES ...

    EPA Pesticide Factsheets

    As U.S. utilities continue to shore up their aging infrastructure, renewal needs now represent over 43% of annual expenditures compared to new construction for drinking water distribution and wastewater collection systems (Underground Construction [UC], 2016). An increased understanding of renewal options will ultimately assist drinking water utilities in reducing water loss and help wastewater utilities to address infiltration and inflow issues in a cost-effective manner. It will also help to extend the service lives of both drinking water and wastewater mains. This research effort involved collecting case studies on the use of various trenchless pipeline renewal methods and providing the information in an online searchable database. The overall objective was to further support technology transfer and information sharing regarding emerging and innovative renewal technologies for water and wastewater mains. The result of this research is a Web-based, searchable database that utility personnel can use to obtain technology performance and cost data, as well as case study references. The renewal case studies include: technologies used; the conditions under which the technology was implemented; costs; lessons learned; and utility contact information. The online database also features a data mining tool for automated review of the technologies selected and cost data. Based on a review of the case study results and industry data, several findings are presented on tren...

  19. Documentation of the U.S. Geological Survey Stress and Sediment Mobility Database

    USGS Publications Warehouse

    Dalyander, P. Soupy; Butman, Bradford; Sherwood, Christopher R.; Signell, Richard P.

    2012-01-01

    The U.S. Geological Survey Sea Floor Stress and Sediment Mobility Database contains estimates of bottom stress and sediment mobility for the U.S. continental shelf. This U.S. Geological Survey database provides information that is needed to characterize sea floor ecosystems and evaluate areas for human use. The estimates contained in the database are designed to spatially and seasonally resolve the general characteristics of bottom stress over the U.S. continental shelf and to estimate sea floor mobility by comparing critical stress thresholds based on observed sediment texture data to the modeled stress. This report describes the methods used to make the bottom stress and mobility estimates, statistics used to characterize stress and mobility, data validation procedures, and the metadata for each dataset and provides information on how to access the database online.
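    Sediment mobility as described reduces to comparing a bottom-stress time series against a grain-size-dependent critical stress. The sketch below uses a synthetic stress series and an assumed critical stress value (it is not the USGS code) and reports the fraction of time the bed is mobile:

        import numpy as np

        rng = np.random.default_rng(42)

        # Synthetic hourly bottom shear stress (Pa) over roughly one month
        stress = np.abs(rng.normal(loc=0.08, scale=0.05, size=24 * 30))

        # Assumed critical shear stress for the local sediment texture (Pa)
        tau_crit = 0.12

        mobile = stress > tau_crit
        print(f"bed mobile {100 * mobile.mean():.1f}% of the time")
        print(f"peak/critical stress ratio: {stress.max() / tau_crit:.2f}")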

  20. Should we search Chinese biomedical databases when performing systematic reviews?

    PubMed

    Cohen, Jérémie F; Korevaar, Daniël A; Wang, Junfeng; Spijker, René; Bossuyt, Patrick M

    2015-03-06

    Chinese biomedical databases contain a large number of publications available to systematic reviewers, but it is unclear whether they are used for synthesizing the available evidence. We report a case of two systematic reviews on the accuracy of anti-cyclic citrullinated peptide for diagnosing rheumatoid arthritis. In one of these, the authors did not search Chinese databases; in the other, they did. We additionally assessed the extent to which Cochrane reviewers have searched Chinese databases in a systematic overview of the Cochrane Library (inception to 2014). The two diagnostic reviews included a total of 269 unique studies, but only 4 studies were included in both reviews. The first review included five studies published in the Chinese language (out of 151) while the second included 114 (out of 118). The summary accuracy estimates from the two reviews were comparable. Only 243 of the published 8,680 Cochrane reviews (less than 3%) searched one or more of the five major Chinese databases. These Chinese databases index about 2,500 journals, of which less than 6% are also indexed in MEDLINE. All 243 Cochrane reviews evaluated an intervention, 179 (74%) had at least one author with a Chinese affiliation; 118 (49%) addressed a topic in complementary or alternative medicine. Although searching Chinese databases may lead to the identification of a large amount of additional clinical evidence, Cochrane reviewers have rarely included them in their search strategy. We encourage future initiatives to evaluate more systematically the relevance of searching Chinese databases, as well as collaborative efforts to allow better incorporation of Chinese resources in systematic reviews.

  1. Database tomography for commercial application

    NASA Technical Reports Server (NTRS)

    Kostoff, Ronald N.; Eberhart, Henry J.

    1994-01-01

    Database tomography is a method for extracting themes and their relationships from text. The algorithms employed begin with word frequency and word proximity analysis and build upon these results. Here, "database" means any text information that can be stored on a computer: medical or police records, patents, journals, papers, and so on. Database tomography features a full-text, user-interactive technique enabling the user to identify areas of interest, establish relationships, and map trends for a deeper understanding of an area of interest. Database tomography concepts and applications have been reported in journals and presented at conferences. One important feature of the database tomography algorithm is that it can be used on a database of any size and helps the user grasp the volume of content therein. While employing the process to identify research opportunities, it became obvious that this promising technology has potential applications in business, science, engineering, law, and academe. Examples include evaluating marketing trends, strategies, relationships, and associations. The database tomography process would also be a powerful component in competitive intelligence, national security intelligence, and patent analysis. User interest and involvement cannot be overemphasized.
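
    The two starting steps named above, word frequency analysis and word proximity analysis, can be illustrated with a minimal sketch. This is a generic illustration in Python, not the authors' actual algorithm; the tokenizer and the co-occurrence window size are assumptions:

```python
# Sketch of the two starting steps of database tomography as described
# in the abstract: word frequency counts and word proximity
# (co-occurrence within a token window). Generic illustration only.
from collections import Counter
import re

def tokenize(text):
    """Lowercase and split a document into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(docs):
    """Count word occurrences across all documents."""
    return Counter(w for doc in docs for w in tokenize(doc))

def proximity_pairs(docs, window=5):
    """Count unordered word pairs co-occurring within the next window-1 tokens."""
    pairs = Counter()
    for doc in docs:
        toks = tokenize(doc)
        for i, w in enumerate(toks):
            for v in toks[i + 1 : i + window]:
                if v != w:
                    pairs[tuple(sorted((w, v)))] += 1
    return pairs

docs = ["database tomography extracts themes from text",
        "themes and their relationships emerge from word proximity"]
print(word_frequencies(docs).most_common(3))
print(proximity_pairs(docs).most_common(3))
```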

  2. Native Health Research Database

    MedlinePlus

    Search portal for the Native Health Database, offering basic and advanced search and a tutorial video on searching the database.

  3. Clinical Databases for Chest Physicians.

    PubMed

    Courtwright, Andrew M; Gabriel, Peter E

    2018-04-01

    A clinical database is a repository of patient medical and sociodemographic information focused on one or more specific health conditions or exposures. Although clinical databases may be used for research purposes, their primary goal is to collect and track patient data for quality improvement, quality assurance, and/or actual clinical management. This article provides an introduction and practical advice on the development of small-scale clinical databases for chest physicians and practice groups. Through example projects, we discuss the pros and cons of available technical platforms, including Microsoft Excel and Access, relational database management systems such as Oracle and PostgreSQL, and Research Electronic Data Capture (REDCap). We consider approaches to deciding the base unit of data collection, creating consensus around variable definitions, and structuring routine clinical care to complement database aims. We conclude with an overview of regulatory and security considerations for clinical databases. Copyright © 2018 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.
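
    To make the design decisions the article discusses concrete (choice of base unit of data collection, condition-specific variables), here is a minimal sketch using SQLite in Python; the tables, fields, and values are invented examples, not the article's recommended schema:

```python
# Minimal sketch of a small-scale clinical database, using SQLite for
# illustration. Tables, fields, and values are invented examples.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id   INTEGER PRIMARY KEY,
    mrn          TEXT UNIQUE NOT NULL,   -- medical record number
    birth_year   INTEGER
);
CREATE TABLE encounter (
    encounter_id INTEGER PRIMARY KEY,
    patient_id   INTEGER NOT NULL REFERENCES patient(patient_id),
    visit_date   TEXT NOT NULL,          -- ISO-8601 date
    fev1_pct     REAL                    -- example condition-specific variable
);
""")
conn.execute("INSERT INTO patient (mrn, birth_year) VALUES (?, ?)",
             ("A1001", 1958))
conn.execute("INSERT INTO encounter (patient_id, visit_date, fev1_pct) "
             "VALUES (1, '2018-03-02', 64.5)")
for row in conn.execute("SELECT mrn, visit_date, fev1_pct "
                        "FROM encounter JOIN patient USING (patient_id)"):
    print(row)
```

    Choosing the encounter rather than the patient as the base unit of data collection here is exactly the kind of decision the article asks readers to settle before building.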

  4. IPD—the Immuno Polymorphism Database

    PubMed Central

    Robinson, James; Halliwell, Jason A.; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.

    2013-01-01

    The Immuno Polymorphism Database (IPD), http://www.ebi.ac.uk/ipd/ is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of killer-cell immunoglobulin-like receptors; IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data are currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project. PMID:23180793

  5. Functionally Graded Materials Database

    NASA Astrophysics Data System (ADS)

    Kisara, Katsuto; Konno, Tomomi; Niino, Masayuki

    2008-02-01

    The Functionally Graded Materials Database (hereinafter referred to as the FGMs Database) was opened to the public via the Internet in October 2002 and has since been managed by the Japan Aerospace Exploration Agency (JAXA). As of October 2006, the database included 1,703 research information entries along with data on 2,429 researchers, 509 institutions, and so on. Reading materials such as "Applicability of FGMs Technology to Space Plane" and "FGMs Application to Space Solar Power System (SSPS)" were prepared in FY 2004 and 2005, respectively. The English version of "FGMs Application to Space Solar Power System (SSPS)" is now under preparation. This paper explains the FGMs Database, describing the research information data, the sitemap, and how to use it. From the access analysis, user access results and users' interests are discussed.

  6. Exploring performance issues for a clinical database organized using an entity-attribute-value representation.

    PubMed

    Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L

    2000-01-01

    The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventional database schemas. Objective: To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Methods: Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Results: Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple simple SQL statements executed in batch were more efficient than single large SQL statements. Conclusions: This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
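
    To make the entity-centered versus attribute-centered distinction concrete, here is a minimal sketch in Python with SQLite contrasting a conventional table with an EAV table; the table and attribute names are illustrative, not the study's actual microbiology schema:

```python
# Sketch contrasting a conventional column-per-attribute table with an
# entity-attribute-value (EAV) table. Names and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conventional: one column per attribute.
CREATE TABLE culture_conv (
    culture_id INTEGER PRIMARY KEY,
    organism   TEXT,
    site       TEXT
);
-- EAV: one row per (entity, attribute, value) triple.
CREATE TABLE culture_eav (
    entity_id INTEGER,
    attribute TEXT,
    value     TEXT
);
""")
conn.execute("INSERT INTO culture_conv VALUES (1, 'E. coli', 'urine')")
conn.executemany("INSERT INTO culture_eav VALUES (?, ?, ?)",
                 [(1, "organism", "E. coli"), (1, "site", "urine")])

# Attribute-centered query ("all cultures growing E. coli"):
# the conventional form filters one (indexable) column...
print(conn.execute(
    "SELECT culture_id FROM culture_conv WHERE organism = 'E. coli'"
).fetchall())
# ...while the EAV form must filter on attribute AND value, and
# reconstructing a full entity row needs one self-join per attribute.
print(conn.execute("""
    SELECT entity_id FROM culture_eav
    WHERE attribute = 'organism' AND value = 'E. coli'
""").fetchall())
```

    The per-attribute self-joins (or pivots) needed to rebuild entity rows are the usual source of the attribute-centered slowdowns reported for EAV designs.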

  7. The Candidate Cancer Gene Database: a database of cancer driver genes from forward genetic screens in mice.

    PubMed

    Abbott, Kenneth L; Nyre, Erik T; Abrahante, Juan; Ho, Yen-Yi; Isaksson Vogel, Rachel; Starr, Timothy K

    2015-01-01

    Identification of cancer driver gene mutations is crucial for advancing cancer therapeutics. Due to the overwhelming number of passenger mutations in the human tumor genome, it is difficult to pinpoint causative driver genes. Using transposon mutagenesis in mice, many laboratories have conducted forward genetic screens and identified thousands of candidate driver genes that are highly relevant to human cancer. Unfortunately, this information is difficult to access and utilize because it is scattered across multiple publications using different mouse genome builds and strength metrics. To improve access to these findings and facilitate meta-analyses, we developed the Candidate Cancer Gene Database (CCGD, http://ccgd-starrlab.oit.umn.edu/). The CCGD is a manually curated database containing a unified description of all identified candidate driver genes and the genomic locations of transposon common insertion sites (CISs) from all currently published transposon-based screens. To demonstrate relevance to human cancer, we performed a modified gene set enrichment analysis using KEGG pathways and show that human cancer pathways are highly enriched in the database. We also used hierarchical clustering to identify pathways enriched in blood cancers compared to solid cancers. The CCGD is a novel resource available to scientists interested in the identification of genetic drivers of cancer. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
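
    For readers unfamiliar with how such an enrichment claim is tested, the sketch below shows a standard hypergeometric over-representation test in Python. It is a generic stand-in, not the authors' modified method, and all gene counts are invented:

```python
# Generic pathway over-representation test: is a KEGG pathway's gene set
# enriched among candidate driver genes? All counts are invented.
from scipy.stats import hypergeom

M = 20000   # genes in the genome (background)
n = 150     # genes annotated to one pathway
N = 1200    # candidate driver genes in the database
k = 35      # overlap: candidates that are also in the pathway

# P(overlap >= k) when drawing N genes at random from M
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"enrichment p-value: {p_value:.3g}")
```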

  8. Full-Text Databases in Medicine.

    ERIC Educational Resources Information Center

    Sievert, MaryEllen C.; And Others

    1995-01-01

    Describes types of full-text databases in medicine; discusses features for searching full-text journal databases available through online vendors; reviews research on full-text databases in medicine; and describes the MEDLINE/Full-Text Research Project at the University of Missouri (Columbia) which investigated precision, recall, and relevancy.…

  9. 47 CFR 15.713 - TV bands database.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    Section 15.713 TV bands database. (a) Purpose. The TV bands database serves the following functions: (1) To ... databases. (b) Information in the TV bands database. (1) Facilities already recorded in Commission databases ...

  10. Draft secure medical database standard.

    PubMed

    Pangalos, George

    2002-01-01

    Medical database security is a particularly important issue for all healthcare establishments. Medical information systems are intended to support a wide range of pertinent health issues today, for example: assuring the quality of care, supporting effective management of health services institutions, monitoring and containing the cost of care, implementing technology into care without violating social values, ensuring the equity and availability of care, and preserving humanity despite the proliferation of technology. In this context, medical database security aims primarily to support high availability, accuracy, and consistency of the stored data; medical professional secrecy and confidentiality; and protection of the privacy of the patient. These properties, though of a technical nature, basically require that the system be actually helpful for medical care and not harmful to patients. These latter properties require in turn not only that fundamental ethical principles are not violated by employing database systems, but that they are effectively enforced by technical means. This document reviews the existing and emerging work on the security of medical database systems. It presents in detail the problems and requirements of medical database security, and it addresses medical database security policies, secure design methodologies, and implementation techniques. It also describes the current legal framework and regulatory requirements for medical database security, and it examines the issue of medical database security guidelines in detail. The current national and international efforts in the area are studied, and an overview of research work in the area is given. The document also presents the most complete set of security guidelines known to us for the development and operation of medical database systems.

  11. High-throughput STR analysis for DNA database using direct PCR.

    PubMed

    Sim, Jeong Eun; Park, Su Jeong; Lee, Han Chul; Kim, Se-Yong; Kim, Jong Yeol; Lee, Seung Hwan

    2013-07-01

    Since the Korean criminal DNA database was launched in 2010, we have focused on establishing an automated DNA database profiling system that analyzes short tandem repeat loci in a high-throughput and cost-effective manner. We established a DNA database profiling system without DNA purification using a direct PCR buffer system. The quality of the direct PCR procedure was compared with that of the conventional PCR system under their respective optimized conditions. The results revealed not only perfect concordance but also an excellent PCR success rate, good electropherogram quality, and an optimal intra-/inter-locus peak height ratio. In particular, the proportion of DNA extractions required due to direct PCR failure could be minimized to <3%. In conclusion, the newly developed direct PCR system can be adopted for automated DNA database profiling systems to replace or supplement the conventional PCR system in a time- and cost-saving manner. © 2013 American Academy of Forensic Sciences. Published 2013. This article is a U.S. Government work and is in the public domain in the U.S.A.
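
    As a brief aside on the quality metrics named above, the intra-locus peak height ratio for a heterozygous STR locus is simply the shorter allele peak divided by the taller one. A minimal sketch; the peak heights and the cutoff are invented for illustration:

```python
# Intra-locus peak height ratio (heterozygote balance) for an STR locus.
# Peak heights and the flagging cutoff below are invented examples.
def peak_height_ratio(h1: float, h2: float) -> float:
    """Return the balance in [0, 1]; 1.0 is perfectly balanced."""
    lo, hi = sorted((h1, h2))
    return lo / hi

ratio = peak_height_ratio(1450, 1700)  # two allele peaks in RFU
print(f"PHR = {ratio:.2f}")  # flag the locus if, say, PHR < 0.6 (assumed cutoff)
```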

  12. FPD: A comprehensive phosphorylation database in fungi.

    PubMed

    Bai, Youhuang; Chen, Bin; Li, Mingzhu; Zhou, Yincong; Ren, Silin; Xu, Qin; Chen, Ming; Wang, Shihua

    2017-10-01

    Protein phosphorylation, one of the most classic post-translational modifications, plays a critical role in diverse cellular processes including the cell cycle, growth, and signal transduction pathways. However, the available information about phosphorylation in fungi is limited. Here, we provide a Fungi Phosphorylation Database (FPD) that comprises high-confidence in vivo phosphosites identified by MS-based proteomics in various fungal species. This comprehensive phosphorylation database contains 62,272 non-redundant phosphorylation sites in 11,222 proteins across eight organisms, including Aspergillus flavus, Aspergillus nidulans, Fusarium graminearum, Magnaporthe oryzae, Neurospora crassa, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Cryptococcus neoformans. A fungi-specific phosphothreonine motif and several conserved phosphorylation motifs were discovered by comparatively analysing the patterns of phosphorylation sites in plants, animals, and fungi. Copyright © 2017 British Mycological Society. Published by Elsevier Ltd. All rights reserved.
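
    Motif discovery of this kind usually starts by tallying which residues occur at each position flanking the phosphosites. A minimal sketch in Python; the sequences and site positions are toy examples, not FPD data:

```python
# Tally residues at each offset around phosphosites; position-specific
# over-represented residues suggest a motif. Toy data only.
from collections import Counter

sites = [("MKTAYSPLLR", 5), ("GGSTPQRNNA", 3)]  # (sequence, 0-based site)
counts = Counter()
for seq, pos in sites:
    for offset in range(-3, 4):        # +/-3 residue window
        i = pos + offset
        if 0 <= i < len(seq) and offset != 0:
            counts[(offset, seq[i])] += 1
print(counts.most_common(5))
```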

  13. Household Products Database: Pesticides

    MedlinePlus

    Consumer portal for pesticide household products, browsable by brand name, product type, manufacturer, and ingredient. Information is extracted from the Consumer Product Information Database ©2001-2018 by DeLima Associates. All rights reserved.

  14. Database systems for knowledge-based discovery.

    PubMed

    Jagarlapudi, Sarma A R P; Kishan, K V Radha

    2009-01-01

    Several database systems have been developed to provide valuable information in a structured format to users ranging from the bench chemist and biologist to the medical practitioner and pharmaceutical scientist. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database, where one can do compilation, searching, archiving, analysis, and finally knowledge derivation. Although data are of variable types, the tools used for database creation, searching, and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity, so that the structured databases containing vast data can be used in several areas of research. These databases are classified as reference-centric or compound-centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance their value for better drug design and discovery.

  15. New Zealand's National Landslide Database

    NASA Astrophysics Data System (ADS)

    Rosser, B.; Dellow, S.; Haubrook, S.; Glassey, P.

    2016-12-01

    Since 1780, landslides have caused an average of about 3 deaths a year in New Zealand and have cost the economy an average of at least NZ$250M/a (0.1% of GDP). To understand the risk posed by landslide hazards to society, a thorough knowledge of where, when and why different types of landslides occur is vital. The main objective for establishing the database was to provide a centralised, national-scale, publicly available database to collate landslide information that could be used for landslide hazard and risk assessment. Design of a national landslide database for New Zealand required consideration of both existing landslide data, stored in a variety of digital formats, and future data yet to be collected. Pre-existing databases were developed and populated with data reflecting the needs of the landslide or hazard project and the database structures of the time. Bringing these data into a single unified database required a new structure capable of storing and delivering data at a variety of scales and accuracies and with different attributes. A "unified data model" was developed to enable the database to hold old and new landslide data irrespective of scale and method of capture. The database contains information on landslide locations and, where available: 1) the timing of landslides and the events that may have triggered them; 2) the type of landslide movement; 3) the volume and area; 4) the source and debris tail; and 5) the impacts caused by the landslide. Information from a variety of sources, including aerial photographs (and other remotely sensed data), field reconnaissance and media accounts, has been collated and is presented for each landslide along with metadata describing the data sources and quality. There are currently nearly 19,000 landslide records in the database, including point locations, polygons of landslide source and deposit areas, and linear features. Several large datasets are awaiting upload which will bring the total number of landslides to
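
    A minimal sketch of what one record in such a unified data model might look like in Python; the field names and example values are assumptions for illustration, not the actual GNS Science schema:

```python
# Sketch of a unified landslide record able to hold entries captured at
# different scales and completeness. Field names are assumed examples.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LandslideRecord:
    record_id: str
    geometry: str                          # WKT point, polygon, or line
    date_observed: Optional[str] = None    # ISO date, if known
    trigger: Optional[str] = None          # e.g. "rainfall", "earthquake"
    movement_type: Optional[str] = None    # e.g. "rock fall", "debris flow"
    volume_m3: Optional[float] = None
    impacts: list = field(default_factory=list)
    sources: list = field(default_factory=list)  # provenance and quality metadata

rec = LandslideRecord("NZ-000123", "POINT(174.78 -41.29)", trigger="rainfall")
print(rec)
```

    Making everything beyond the location optional is what lets one structure absorb both sparse legacy records and richly attributed new ones.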

  16. Molecule database framework: a framework for creating database applications with chemical structure search capability

    PubMed Central

    2013-01-01

    Background: Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store these data and search them by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However, for the specific requirements of in-house databases and processes, no such solutions exist. Another issue is that commercial solutions carry the risk of vendor lock-in and may require an expensive license for a proprietary relational database management system. To speed up and simplify the development of applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls, so software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results: Molecule Database Framework is written in Java, and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • support for multi-component compounds (mixtures); • import and export of SD-files; • optional security (authorization). For chemical structure searching, Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method-level security. Furthermore, the design of the entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. Conclusions: By using a simple web application it was

  17. Molecule database framework: a framework for creating database applications with chemical structure search capability.

    PubMed

    Kiener, Joos

    2013-12-11

    Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store these data and search them by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However, for the specific requirements of in-house databases and processes, no such solutions exist. Another issue is that commercial solutions carry the risk of vendor lock-in and may require an expensive license for a proprietary relational database management system. To speed up and simplify the development of applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls, so software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Molecule Database Framework is written in Java, and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • support for multi-component compounds (mixtures); • import and export of SD-files; • optional security (authorization). For chemical structure searching, Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method-level security. Furthermore, the design of the entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. By using a simple web application it was shown that Molecule Database Framework
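
    The two records above describe structure search being reduced to method calls. As a rough stand-in for what such a call does underneath, here is a substructure search in plain Python using RDKit; the framework itself is Java wrapping the Bingo PostgreSQL cartridge, and the molecules, SMILES, and query below are invented examples:

```python
# Substructure search over a small in-memory molecule set with RDKit,
# as a stand-in illustration of the search a chemical cartridge performs.
from rdkit import Chem

molecules = {name: Chem.MolFromSmiles(smi) for name, smi in [
    ("ethanol", "CCO"),
    ("acetic acid", "CC(=O)O"),
    ("benzene", "c1ccccc1"),
]}
query = Chem.MolFromSmarts("C=O")  # carbonyl substructure

hits = [name for name, mol in molecules.items() if mol.HasSubstructMatch(query)]
print(hits)  # ['acetic acid']
```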

  18. ThermoData Engine Database

    National Institute of Standards and Technology Data Gateway

    SRD 103a NIST ThermoData Engine Database (PC database for purchase)   ThermoData Engine is the first product fully implementing all major principles of the concept of dynamic data evaluation formulated at NIST/TRC.

  19. The plant phenological online database (PPODB): an online database for long-term phenological data.

    PubMed

    Dierenbach, Jonas; Badeck, Franz-W; Schaber, Jörg

    2013-09-01

    We present an online database that provides unrestricted and free access to over 16 million plant phenological observations from over 8,000 stations in Central Europe between the years 1880 and 2009. Unique features are (1) a flexible and unrestricted access to a full-fledged database, allowing for a wide range of individual queries and data retrieval, (2) historical data for Germany before 1951 ranging back to 1880, and (3) more than 480 curated long-term time series covering more than 100 years for individual phenological phases and plants combined over Natural Regions in Germany. Time series for single stations or Natural Regions can be accessed through a user-friendly graphical geo-referenced interface. The joint databases made available with the plant phenological database PPODB render accessible an important data source for further analyses of long-term changes in phenology. The database can be accessed via www.ppodb.de.

  20. JDD, Inc. Database

    NASA Technical Reports Server (NTRS)

    Miller, David A., Jr.

    2004-01-01

    JDD, Inc. is a maintenance and custodial contracting company whose mission is to provide its clients in the private and government sectors "quality construction, construction management and cleaning services in the most efficient and cost effective manners" (JDD, Inc. Mission Statement). The company provides facilities support for Fort Riley in Fort Riley, Kansas, and the NASA John H. Glenn Research Center at Lewis Field here in Cleveland, Ohio. JDD, Inc. is owned and operated by James Vaughn, who started as a painter at NASA Glenn and has been working here for the past seventeen years. This summer I worked under Devan Anderson, the safety manager for JDD, Inc. in the Logistics and Technical Information Division (LTID) at Glenn Research Center. The LTID provides all transportation, secretarial, and security needs and manages the contracts for these various services at the center. As a safety manager, my mentor ensures Occupational Safety and Health Administration (OSHA) compliance for all JDD, Inc. employees and handles all other issues related to job safety (Environmental Protection Agency issues, workers' compensation, and safety and health training). My summer assignment was not considered "groundbreaking research" like the work of many other summer interns, but it is just as important and beneficial to JDD, Inc. I initially created a database using Microsoft Excel to classify and categorize data pertaining to the numerous safety training certification courses instructed by our safety manager during the course of the fiscal year. This early portion of the database consisted only of data from the training certification courses (training field index, employees who were present at these training courses, and who were absent). Once I completed this phase of the database, I decided to expand it and add as many dimensions to it as possible. Throughout the last seven weeks, I have been compiling more data from day-to-day operations and been adding the