Sample records for bioinformatics database warehouse

  1. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315
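
    The cross-database query behind the 36% figure above illustrates what the warehouse approach buys: once ENZYME-style activity records and UniProt/GenBank-style sequence records share one schema, the question becomes a single SQL join. The sketch below is a minimal, hypothetical example; the table and column names (enzyme_activity, protein, protein_ec) are illustrative assumptions, not the actual BioWarehouse schema.

      import sqlite3

      # Minimal sketch of a cross-database query made possible once ENZYME-style
      # activity records and UniProt-style sequence records share one schema.
      # Table and column names here are illustrative, not the real BioWarehouse schema.
      con = sqlite3.connect(":memory:")
      con.executescript("""
          CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY, name TEXT);
          CREATE TABLE protein        (protein_id INTEGER PRIMARY KEY, sequence TEXT);
          CREATE TABLE protein_ec     (protein_id INTEGER, ec_number TEXT);

          INSERT INTO enzyme_activity VALUES ('1.1.1.1', 'alcohol dehydrogenase'),
                                             ('4.2.1.17', 'enoyl-CoA hydratase'),
                                             ('2.7.7.99', 'orphan activity');
          INSERT INTO protein    VALUES (1, 'MSTA...'), (2, 'MKLV...');
          INSERT INTO protein_ec VALUES (1, '1.1.1.1'), (2, '4.2.1.17');
      """)

      # EC numbers with no protein sequence anywhere in the integrated warehouse.
      orphans = con.execute("""
          SELECT ea.ec_number
          FROM enzyme_activity AS ea
          LEFT JOIN protein_ec AS pe ON pe.ec_number = ea.ec_number
          WHERE pe.protein_id IS NULL
      """).fetchall()

      total = con.execute("SELECT COUNT(*) FROM enzyme_activity").fetchone()[0]
      print(f"{len(orphans)} of {total} EC numbers lack a sequence "
            f"({100 * len(orphans) / total:.0f}%)")   # toy data: 1 of 3 (33%)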

  2. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

  3. Atlas - a data warehouse for integrative bioinformatics.

    PubMed

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

    2005-02-21

    We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/

  4. Atlas – a data warehouse for integrative bioinformatics

    PubMed Central

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis

    2005-01-01

    Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: PMID:15723693

  5. DWARF – a data warehouse system for analyzing protein families

    PubMed Central

    Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

    2006-01-01

    Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows researchers to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification into homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801

  6. School District Evaluation: Database Warehouse Support.

    ERIC Educational Resources Information Center

    Adcock, Eugene P.; Haseltine, Reginald

    The Prince George's County (Maryland) school system has developed a database warehouse system as an evaluation data support tool for fulfilling the system's information demands. This paper describes the Research and Evaluation Assimilation Database (READ) warehouse support system and considers the requirements for data used in evaluation and how…

  7. An ICT infrastructure to integrate clinical and molecular data in oncology research

    PubMed Central

    2012-01-01

    Background The ONCO-i2b2 platform is a bioinformatics tool designed to integrate clinical and research data and support translational research in oncology. It is implemented by the University of Pavia and the IRCCS Fondazione Maugeri hospital (FSM), and grounded on the software developed by the Informatics for Integrating Biology and the Bedside (i2b2) research center. I2b2 has delivered an open source suite based on a data warehouse, which is efficiently interrogated to find sets of interesting patients through a query tool interface. Methods Onco-i2b2 integrates data coming from multiple sources and allows the users to jointly query them. I2b2 data are then stored in a data warehouse, where facts are hierarchically structured as ontologies. Onco-i2b2 gathers data from the FSM pathology unit (PU) database and from the hospital biobank and merges them with the clinical information from the hospital information system. Our main effort was to provide a robust integrated research environment, giving particular emphasis to the integration process and facing the following challenges: biospecimen sample privacy and anonymization; synchronization of the biobank database with the i2b2 data warehouse through a series of Extract, Transform, Load (ETL) operations; and development and integration of a Natural Language Processing (NLP) module to retrieve coded information, such as SNOMED terms and malignant tumor (TNM) classifications, and clinical test results from unstructured medical records. Furthermore, we have developed an internal SNOMED ontology based on the NCBO BioPortal web services. Results Onco-i2b2 manages data of more than 6,500 patients with a breast cancer diagnosis collected between 2001 and 2011 (over 390 of them have at least one biological sample in the cancer biobank), more than 47,000 visits and 96,000 observations over 960 medical concepts. Conclusions Onco-i2b2 is a concrete example of how an integrated Information and Communication Technology architecture can be implemented to support translational research. The next steps of our project will involve extending its capabilities by implementing new plug-ins devoted to bioinformatics data analysis as well as a temporal query module. PMID:22536972
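
    A minimal sketch of one ETL step of the kind described above: biobank rows are pseudonymized and loaded as facts into an i2b2-style observation table. The salt, the source rows, the concept codes and the exact column layout are illustrative assumptions, not the actual ONCO-i2b2 configuration.

      import hashlib
      import sqlite3
      from datetime import date

      # Illustrative ETL step: pseudonymize biobank rows and load them as facts.
      # Column names follow the i2b2 star-schema convention only loosely; the salt,
      # source rows and concept codes are invented for this example.
      SALT = "local-secret-salt"

      def pseudonymize(patient_id: str) -> str:
          """One-way pseudonym so facts can be linked without exposing identity."""
          return hashlib.sha256((SALT + patient_id).encode()).hexdigest()[:16]

      biobank_rows = [  # what an extract from the pathology-unit/biobank DB might look like
          {"patient_id": "FSM-000123", "sample": "BR-2011-045", "diagnosis_code": "SNOMED:254837009"},
          {"patient_id": "FSM-000124", "sample": "BR-2011-046", "diagnosis_code": "SNOMED:399068003"},
      ]

      con = sqlite3.connect(":memory:")
      con.execute("""CREATE TABLE observation_fact (
                         patient_num TEXT, concept_cd TEXT, start_date TEXT)""")

      for row in biobank_rows:   # Transform + Load
          con.execute("INSERT INTO observation_fact VALUES (?, ?, ?)",
                      (pseudonymize(row["patient_id"]),
                       row["diagnosis_code"],
                       date.today().isoformat()))

      print(con.execute("SELECT * FROM observation_fact").fetchall())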

  8. Clinical Bioinformatics: challenges and opportunities

    PubMed Central

    2012-01-01

    Background Network Tools and Applications in Biology (NETTAB) Workshops are a series of meetings focused on the most promising and innovative ICT tools and their usefulness in Bioinformatics. The NETTAB 2011 workshop, held in Pavia, Italy, in October 2011, was aimed at presenting some of the most relevant methods, tools and infrastructures that are nowadays available for Clinical Bioinformatics (CBI), the research field that deals with clinical applications of bioinformatics. Methods In this editorial, the viewpoints and opinions of three world leaders in CBI, who were invited to participate in a panel discussion of the NETTAB workshop on the next challenges and future opportunities of this field, are reported. These include the development of data warehouses and ICT infrastructures for data sharing, the definition of standards for sharing phenotypic data and the implementation of novel tools for efficient search computing solutions. Results Some of the most important design features of a CBI-ICT infrastructure are presented, including data warehousing, modularity and flexibility, open-source development, semantic interoperability, and integrated search and retrieval of -omics information. Conclusions Clinical Bioinformatics goals are ambitious. Many factors, including the availability of high-throughput "-omics" technologies and equipment, the widespread availability of clinical data warehouses and the noteworthy increase in data storage and computational power of the most recent ICT systems, justify research and efforts in this domain, which promises to be a crucial leveraging factor for biomedical research. PMID:23095472

  9. Bioinformatics in translational drug discovery.

    PubMed

    Wooller, Sarah K; Benstead-Hume, Graeme; Chen, Xiangrong; Ali, Yusuf; Pearl, Frances M G

    2017-08-31

    Bioinformatics approaches are becoming ever more essential in translational drug discovery both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse 'big data' that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications. © 2017 The Author(s).

  10. Improve Performance of Data Warehouse by Query Cache

    NASA Astrophysics Data System (ADS)

    Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod

    2010-11-01

    The primary goal of a data warehouse is to free the information locked up in the operational database so that decision makers and business analysts can perform queries, analysis and planning regardless of data changes in the operational database. Because the number of queries is large, there is in certain cases a reasonable probability that the same query is submitted by one or more users at different times. Each time a query is executed, all of the warehouse data is analyzed to generate the result of that query. In this paper we study how a query cache improves the performance of a data warehouse and examine the common problems faced by data warehouse administrators in this context: minimizing response time and improving overall query efficiency, particularly when the data warehouse is updated at regular intervals.
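
    A minimal sketch of the idea, assuming illustrative names throughout: results are keyed on a normalized query string and the whole cache is invalidated whenever the warehouse is refreshed, since that is the moment cached answers go stale.

      import hashlib

      class QueryCache:
          """Toy warehouse query cache: identical queries are answered from memory
          until the warehouse is refreshed, at which point everything is invalidated."""

          def __init__(self, run_query):
              self.run_query = run_query      # callable that actually hits the warehouse
              self._cache = {}

          @staticmethod
          def _key(sql: str) -> str:
              # Toy normalization: collapse whitespace and case so trivially
              # different query texts share one cache entry.
              return hashlib.md5(" ".join(sql.lower().split()).encode()).hexdigest()

          def execute(self, sql: str):
              key = self._key(sql)
              if key not in self._cache:                 # miss: run against the warehouse
                  self._cache[key] = self.run_query(sql)
              return self._cache[key]

          def invalidate(self):
              """Call this after each periodic warehouse load/update."""
              self._cache.clear()

      # Usage sketch with a stand-in for the real warehouse backend.
      calls = []
      cache = QueryCache(lambda sql: calls.append(sql) or f"result of {sql!r}")
      cache.execute("SELECT region, SUM(sales) FROM facts GROUP BY region")
      cache.execute("select region, sum(sales) from facts group by region")  # served from cache
      print(len(calls))   # 1 -> the warehouse was queried only once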

  11. [Technical improvement of cohort constitution in administrative health databases: Providing a tool for integration and standardization of data applicable in the French National Health Insurance Database (SNIIRAM)].

    PubMed

    Ferdynus, C; Huiart, L

    2016-09-01

    Administrative health databases such as the French National Health Insurance Database - SNIIRAM - are a major tool for answering numerous public health research questions. However, the use of such data requires complex and time-consuming data management. Our objective was to develop and make available a tool to optimize cohort constitution within administrative health databases. We developed a process to extract, transform and load (ETL) data from various heterogeneous sources into a standardized data warehouse. This data warehouse is architected as a star schema corresponding to an i2b2 star schema model. We then evaluated the performance of this ETL using data from a pharmacoepidemiology research project conducted in the SNIIRAM database. The ETL we developed comprises a set of functionalities for creating SAS scripts. Data can be integrated into a standardized data warehouse. As part of the performance assessment of this ETL, we achieved integration of a dataset from the SNIIRAM comprising more than 900 million lines in less than three hours using a desktop computer. This enables patient selection from the standardized data warehouse within seconds of the request. The ETL described in this paper provides a tool which is effective and compatible with all administrative health databases, without requiring complex database servers. This tool should simplify cohort constitution in health databases; the standardization of warehouse data facilitates collaborative work between research teams. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
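
    The payoff of loading into a star schema is that cohort selection becomes a single query over the fact table. The sketch below is illustrative only; the table layout and concept codes are assumptions for the example, not the actual SNIIRAM/i2b2 configuration.

      import sqlite3

      # Sketch of why the star schema pays off: once dispensings and diagnoses are
      # facts in one table, cohort selection is a single (self-join) query.
      con = sqlite3.connect(":memory:")
      con.execute("CREATE TABLE observation_fact (patient_num INT, concept_cd TEXT, start_date TEXT)")
      con.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", [
          (1, "ATC:C10AA05", "2015-02-01"),   # drug dispensing (exposure)
          (1, "ICD10:I21",   "2015-06-10"),   # myocardial infarction (outcome)
          (2, "ATC:C10AA05", "2015-03-15"),
          (3, "ICD10:I21",   "2015-04-02"),
      ])

      # Patients exposed to the drug of interest who later have the outcome code.
      cohort = con.execute("""
          SELECT DISTINCT e.patient_num
          FROM observation_fact AS e
          JOIN observation_fact AS o
            ON o.patient_num = e.patient_num AND o.start_date > e.start_date
          WHERE e.concept_cd = 'ATC:C10AA05' AND o.concept_cd = 'ICD10:I21'
      """).fetchall()
      print(cohort)   # [(1,)]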

  12. Design of a Multi Dimensional Database for the Archimed DataWarehouse.

    PubMed

    Bréant, Claudine; Thurler, Gérald; Borst, François; Geissbuhler, Antoine

    2005-01-01

    The Archimed data warehouse project started in 1993 at the Geneva University Hospital. It has progressively integrated seven data marts (or domains of activity) archiving medical data such as Admission/Discharge/Transfer (ADT) data, laboratory results, radiology exams, diagnoses, and procedure codes. The objective of the Archimed data warehouse is to facilitate access to an integrated and coherent view of patient medical data in order to support analytical activities such as medical statistics, clinical studies, retrieval of similar cases and data mining processes. This paper discusses three principal design aspects relative to the conception of the database of the data warehouse: 1) the granularity of the database, which refers to the level of detail or summarization of data, 2) the database model and architecture, describing how data will be presented to end users and how new data is integrated, 3) the life cycle of the database, in order to ensure long term scalability of the environment. Both the organization of patient medical data using a standardized elementary fact representation and the use of the multidimensional model have proved to be powerful design tools to integrate data coming from the multiple heterogeneous database systems that are part of the transactional Hospital Information System (HIS). Concurrently, building the data warehouse in an incremental way has helped to control the evolution of the data content. These three design aspects bring clarity and performance regarding data access. They also provide long term scalability to the system and resilience to further changes that may occur in source systems feeding the data warehouse.

  13. Empowering Mayo Clinic Individualized Medicine with Genomic Data Warehousing

    PubMed Central

    Horton, Iain; Lin, Yaxiong; Reed, Gay; Wiepert, Mathieu

    2017-01-01

    Individualized medicine enables better diagnoses and treatment decisions for patients and promotes research in understanding the molecular underpinnings of disease. Linking individual patient’s genomic and molecular information with their clinical phenotypes is crucial to these efforts. To address this need, the Center for Individualized Medicine at Mayo Clinic has implemented a genomic data warehouse and a workflow management system to bring data from institutional electronic health records and genomic sequencing data from both clinical and research bioinformatics sources into the warehouse. The system is the foundation for Mayo Clinic to build a suite of tools and interfaces to support various clinical and research use cases. The genomic data warehouse is positioned to play a key role in enhancing the research capabilities and advancing individualized patient care at Mayo Clinic. PMID:28829408

  14. Empowering Mayo Clinic Individualized Medicine with Genomic Data Warehousing.

    PubMed

    Horton, Iain; Lin, Yaxiong; Reed, Gay; Wiepert, Mathieu; Hart, Steven

    2017-08-22

    Individualized medicine enables better diagnoses and treatment decisions for patients and promotes research in understanding the molecular underpinnings of disease. Linking individual patient's genomic and molecular information with their clinical phenotypes is crucial to these efforts. To address this need, the Center for Individualized Medicine at Mayo Clinic has implemented a genomic data warehouse and a workflow management system to bring data from institutional electronic health records and genomic sequencing data from both clinical and research bioinformatics sources into the warehouse. The system is the foundation for Mayo Clinic to build a suite of tools and interfaces to support various clinical and research use cases. The genomic data warehouse is positioned to play a key role in enhancing the research capabilities and advancing individualized patient care at Mayo Clinic.

  15. Metadata to Support Data Warehouse Evolution

    NASA Astrophysics Data System (ADS)

    Solodovnikova, Darja

    The focus of this chapter is the metadata necessary to support data warehouse evolution. We present a data warehouse framework that is able to track the evolution process and adapt data warehouse schemata and the data extraction, transformation, and loading (ETL) processes. We discuss a significant part of the framework, the metadata repository, which stores information about the data warehouse, its logical and physical schemata and their versions. We propose a physical implementation of a multiversion data warehouse in a relational DBMS. For each modification of a data warehouse schema, we outline the changes that need to be made to the repository metadata and in the database.
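
    A minimal sketch of what such a metadata repository might record, with invented table names: each schema modification is logged against a schema version so that historical loads and queries can be interpreted relative to the version in force at the time.

      import sqlite3
      from datetime import datetime

      # Toy metadata repository: every warehouse schema change is recorded as part
      # of a new schema version. Table names and columns are illustrative only.
      repo = sqlite3.connect(":memory:")
      repo.executescript("""
          CREATE TABLE schema_version (version INT PRIMARY KEY, valid_from TEXT);
          CREATE TABLE schema_change  (version INT, object_name TEXT,
                                       change_type TEXT, details TEXT);
      """)

      def register_change(version, object_name, change_type, details):
          """Record a warehouse schema modification and open a new schema version."""
          repo.execute("INSERT OR IGNORE INTO schema_version VALUES (?, ?)",
                       (version, datetime.utcnow().isoformat()))
          repo.execute("INSERT INTO schema_change VALUES (?, ?, ?, ?)",
                       (version, object_name, change_type, details))

      register_change(2, "sales_fact", "ADD_COLUMN", "discount REAL DEFAULT 0")
      register_change(3, "customer_dim", "RENAME_COLUMN", "cust_name -> customer_name")

      # A query tool can now ask which schema version a historical load used.
      for row in repo.execute("SELECT * FROM schema_change ORDER BY version"):
          print(row)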

  16. Envirofacts Data Warehouse

    EPA Pesticide Factsheets

    The Envirofacts Data Warehouse contains information from select EPA environmental program office databases and provides access to information about environmental activities that may affect air, water, and land anywhere in the United States. The Envirofacts Warehouse supports its own web-enabled tools as well as a host of other EPA applications.

  17. CARDIO-i2b2: integrating arrhythmogenic disease data in i2b2.

    PubMed

    Segagni, Daniele; Tibollo, Valentina; Dagliati, Arianna; Napolitano, Carlo; G Priori, Silvia; Bellazzi, Riccardo

    2012-01-01

    The CARDIO-i2b2 project is an initiative to customize the i2b2 bioinformatics tool with the aim of integrating clinical and research data in order to support translational research in cardiology. In this work we describe the implementation and the customization of i2b2 to manage the data of arrhythmogenic disease patients collected at the Fondazione Salvatore Maugeri of Pavia in a joint project with the NYU Langone Medical Center (New York, USA). The i2b2 clinical research chart data warehouse is populated with the data obtained from the research database called TRIAD. The research infrastructure is extended by the development of new plug-ins for the i2b2 web client application able to properly select and export phenotypic data and to perform data analysis.

  18. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, Ronald C.

    Bioinformatics researchers are increasingly confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date.
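
    The MapReduce style mentioned above can be illustrated with a toy sequencing task: counting 4-mers across reads. With Hadoop Streaming the map and reduce functions would be separate scripts reading stdin; in this standalone sketch the framework's shuffle/sort step is simulated in-process, and the reads are invented.

      from itertools import groupby

      # MapReduce sketch applied to a toy next-generation sequencing task:
      # counting 4-mers across reads.

      def map_kmers(read, k=4):
          """Map: emit (k-mer, 1) for every k-mer in a read."""
          for i in range(len(read) - k + 1):
              yield read[i:i + k], 1

      def reduce_counts(kmer, counts):
          """Reduce: sum the partial counts for one k-mer."""
          return kmer, sum(counts)

      reads = ["GATTACAGATT", "ACAGATTACA", "TTTTGATTAC"]

      # Map phase
      pairs = [pair for read in reads for pair in map_kmers(read)]
      # Shuffle/sort phase (what the framework does between map and reduce)
      pairs.sort(key=lambda kv: kv[0])
      # Reduce phase
      totals = [reduce_counts(k, (c for _, c in group))
                for k, group in groupby(pairs, key=lambda kv: kv[0])]

      print(sorted(totals, key=lambda kv: -kv[1])[:3])   # most frequent 4-mers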

  19. 75 FR 3919 - Privacy Act of 1974; as Amended; Notice To Amend an Existing System of Records

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-25

    ... Manager (ORSO). Data from the current TSIS (Unix-based) version are integrated into a data warehouse with... (where provided). The TSIS data warehouse database is only available to the system developer, and... available from the warehouse data do not contain Personally Identifiable Information (PII). Of the few...

  20. Microsoft Enterprise Consortium: A Resource for Teaching Data Warehouse, Business Intelligence and Database Management Systems

    ERIC Educational Resources Information Center

    Kreie, Jennifer; Hashemi, Shohreh

    2012-01-01

    Data is a vital resource for businesses; therefore, it is important for businesses to manage and use their data effectively. Because of this, businesses value college graduates with an understanding of and hands-on experience working with databases, data warehouses and data analysis theories and tools. Faculty in many business disciplines try to…

  1. The development of health care data warehouses to support data mining.

    PubMed

    Lyman, Jason A; Scully, Kenneth; Harrison, James H

    2008-03-01

    Clinical data warehouses offer tremendous benefits as a foundation for data mining. By serving as a source for comprehensive clinical and demographic information on large patient populations, they streamline knowledge discovery efforts by providing standard and efficient mechanisms to replace time-consuming and expensive original data collection, organization, and processing. Building effective data warehouses requires knowledge of and attention to key issues in database design, data acquisition and processing, and data access and security. In this article, the authors provide an operational and technical definition of data warehouses, present examples of data mining projects enabled by existing data warehouses, and describe key issues and challenges related to warehouse development and implementation.

  2. Mass Storage Performance Information System

    NASA Technical Reports Server (NTRS)

    Scheuermann, Peter

    2000-01-01

    The purpose of this task is to develop a data warehouse to enable system administrators and their managers to gather information by querying the data logs of the MDSDS. Currently detailed logs capture the activity of the MDSDS internal to the different systems. The elements to be included in the data warehouse are requirements analysis, data cleansing, database design, database population, hardware/software acquisition, data transformation, query and report generation, and data mining.

  3. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731
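
    A hypothetical sketch of the mixed-query/alternative-query idea: a query phrased over the entities layer is compiled into one concrete query per integrated database able to answer it. The entity-to-table mapping below is invented for illustration and is not GenoQuery's actual metadata.

      # Invented entity-layer metadata: for each abstract entity, the databases
      # that can answer it and the concrete SQL fragments to use there.
      ENTITY_MAP = {
          "gene_with_function": {
              "genbank_local": ("g.locus_tag",
                                "gb_gene g JOIN gb_annotation a ON a.gene_id = g.id",
                                "a.product LIKE :func"),
              "uniprot_local": ("e.accession",
                                "up_entry e JOIN up_keyword k ON k.entry_id = e.id",
                                "k.keyword LIKE :func"),
          }
      }

      def alternative_queries(entity, func):
          """Yield (database, sql, params) for every source able to answer the query."""
          for db, (col, frm, where) in ENTITY_MAP[entity].items():
              yield db, f"SELECT DISTINCT {col} FROM {frm} WHERE {where}", {"func": f"%{func}%"}

      # In practice each alternative query would be run against its database and
      # the results merged; here we just print the generated SQL.
      for db, sql, params in alternative_queries("gene_with_function", "hydrolase"):
          print(db, "->", sql, params)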

  4. Study on resources and environmental data integration towards data warehouse construction covering trans-boundary area of China, Russia and Mongolia

    NASA Astrophysics Data System (ADS)

    Wang, J.; Song, J.; Gao, M.; Zhu, L.

    2014-02-01

    The trans-boundary area between northern China, Mongolia and eastern Siberia of Russia is a continuous geographical region in northeastern Asia. Many common issues in this region need to be addressed on the basis of a uniform resources and environmental data warehouse. Drawing on the practice of a joint scientific expedition, the paper presents a data integration solution comprising three steps: drafting data collection standards and specifications, data reorganization and processing, and data warehouse design and development. A series of data collection standards and specifications covering more than 10 domains was drawn up first. Following the uniform standard, 20 regional-scale resources and environmental survey databases and 11 in-situ observation databases were reorganized and integrated. The North East Asia Resources and Environmental Data Warehouse was designed with four layers: a resources layer, a core business logic layer, an internet interoperation layer, and a web portal layer. A data warehouse prototype has been developed and initially deployed, and all of the integrated data for this area can be accessed online.

  5. Warehouse location and freight attraction in the greater El Paso region.

    DOT National Transportation Integrated Search

    2013-12-01

    This project analyzes current and future warehouse and distribution center locations in the El Paso-Juarez region along the U.S.-Mexico border. This research has developed a comprehensive database to aid in the decision support process for ide...

  6. GeneLab Analysis Working Group Kick-Off Meeting

    NASA Technical Reports Server (NTRS)

    Costes, Sylvain V.

    2018-01-01

    Meeting agenda: goals for the GeneLab AWG and the GL vision; review of the GeneLab AWG charter; timeline and milestones for 2018; logistics (monthly meetings, workshop, internship, ASGSR); introduction of team leads and the goals of each group; introduction of all members; Q/A. Topics: a three-tier client strategy to democratize data; physiological changes, pathway enrichment, differential expression, normalization, processing metadata, reproducibility; and data federation/integration with heterogeneous external bioinformatics databases. The GLDS currently serves over 100 omics investigations to the biomedical community via open access. In order to expand the scope of metadata record searches via the GLDS, we designed a metadata warehouse that collects and updates metadata records from external systems housing similar data. To demonstrate the capabilities of federated search and retrieval of these data, we imported metadata records from three open-access data systems into the GLDS metadata warehouse: NCBI's Gene Expression Omnibus (GEO), EBI's PRoteomics IDEntifications (PRIDE) repository, and the Metagenomics Analysis server (MG-RAST). Each of these systems defines metadata for omics data sets differently. One solution to bridge such differences is to employ a common object model (COM) to which each system's representation of metadata can be mapped. Warehoused metadata records are then transformed during ETL to this single, common representation. Queries generated via the GLDS are then executed against the warehouse, and matching records are shown in the COM representation (Fig. 1). While this approach is relatively straightforward to implement, the volume of data in the omics domain presents challenges in dealing with latency and currency of records. A further goal has been federated search for, and retrieval of, these kinds of data across other open-access systems, so that users are able to conduct biological meta-investigations using data from a variety of sources. Such meta-investigations are key to corroborating findings from many kinds of assays and translating them into systems biology knowledge and, eventually, therapeutics.
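
    A minimal sketch of the common-object-model mapping described above, with invented field names on both sides: each source record is transformed into one shared representation before it enters the metadata warehouse, after which a single federated query works across sources.

      # Toy common object model (COM) for study metadata; field names and the
      # source-record layouts are invented for illustration.
      COM_FIELDS = ("accession", "title", "organism", "assay_type", "source")

      def from_geo(rec):
          return {"accession": rec["geo_accession"], "title": rec["title"],
                  "organism": rec["taxon"], "assay_type": "transcription profiling",
                  "source": "GEO"}

      def from_pride(rec):
          return {"accession": rec["project_accession"], "title": rec["project_title"],
                  "organism": rec["species"], "assay_type": "mass spectrometry",
                  "source": "PRIDE"}

      geo_record   = {"geo_accession": "GSE00001", "title": "Spaceflight muscle study",
                      "taxon": "Mus musculus"}
      pride_record = {"project_accession": "PXD000001", "project_title": "Liver proteome",
                      "species": "Mus musculus"}

      # ETL: map each source record onto the COM before warehousing it.
      warehouse = [from_geo(geo_record), from_pride(pride_record)]

      # A single federated query now works across both sources.
      print([r["accession"] for r in warehouse if r["organism"] == "Mus musculus"])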

  7. Moby and Moby 2: creatures of the deep (web).

    PubMed

    Vandervalk, Ben P; McCarthy, E Luke; Wilkinson, Mark D

    2009-03-01

    Facile and meaningful integration of data from disparate resources is the 'holy grail' of bioinformatics. Some resources have begun to address this problem by providing their data using Semantic Web standards, specifically the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Unfortunately, adoption of Semantic Web standards has been slow overall, and even in cases where the standards are being utilized, interconnectivity between resources is rare. In response, we have seen the emergence of centralized 'semantic warehouses' that collect public data from third parties, integrate it, translate it into OWL/RDF and provide it to the community as a unified and queryable resource. One limitation of the warehouse approach is that queries are confined to the resources that have been selected for inclusion. A related problem, perhaps of greater concern, is that the majority of bioinformatics data exists in the 'Deep Web'-that is, the data does not exist until an application or analytical tool is invoked, and therefore does not have a predictable Web address. The inability to utilize Uniform Resource Identifiers (URIs) to address this data is a barrier to its accessibility via URI-centric Semantic Web technologies. Here we examine 'The State of the Union' for the adoption of Semantic Web standards in the health care and life sciences domain by key bioinformatics resources, explore the nature and connectivity of several community-driven semantic warehousing projects, and report on our own progress with the CardioSHARE/Moby-2 project, which aims to make the resources of the Deep Web transparently accessible through SPARQL queries.
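
    A small sketch of the kind of SPARQL query such semantic warehouses answer once sources are expressed in RDF. The tiny graph and the vocabulary IRIs below are placeholders standing in for warehoused third-party data; the rdflib library is used so the query runs locally rather than against a remote endpoint.

      from rdflib import Graph

      # Placeholder RDF standing in for integrated, warehoused third-party data.
      g = Graph()
      g.parse(format="turtle", data="""
          @prefix ex: <https://example.org/vocab/> .
          ex:SCN5A ex:associatedWith ex:BrugadaSyndrome .
          ex:MYH7  ex:associatedWith ex:Cardiomyopathy .
          ex:BrugadaSyndrome ex:affects ex:CardiacMuscle .
          ex:Cardiomyopathy  ex:affects ex:CardiacMuscle .
      """)

      # SPARQL query of the kind a semantic warehouse (or SHARE-style resolver)
      # would answer: genes associated with diseases affecting cardiac muscle.
      query = """
          PREFIX ex: <https://example.org/vocab/>
          SELECT ?gene ?disease WHERE {
              ?gene    ex:associatedWith ?disease .
              ?disease ex:affects        ex:CardiacMuscle .
          }
      """
      for gene, disease in g.query(query):
          print(gene, "->", disease)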

  8. The strategies WDK: a graphical search interface and web development kit for functional genomics databases

    PubMed Central

    Fischer, Steve; Aurrecoechea, Cristina; Brunk, Brian P.; Gao, Xin; Harb, Omar S.; Kraemer, Eileen T.; Pennington, Cary; Treatman, Charles; Kissinger, Jessica C.; Roos, David S.; Stoeckert, Christian J.

    2011-01-01

    Web sites associated with the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) have recently introduced a graphical user interface, the Strategies WDK, intended to make advanced searching and set and interval operations easy and accessible to all users. With a design guided by usability studies, the system helps motivate researchers to perform dynamic computational experiments and explore relationships across data sets. For example, PlasmoDB users seeking novel therapeutic targets may wish to locate putative enzymes that distinguish pathogens from their hosts, and that are expressed during appropriate developmental stages. When a researcher runs one of the approximately 100 searches available on the site, the search is presented as a first step in a strategy. The strategy is extended by running additional searches, which are combined with set operators (union, intersect or minus), or genomic interval operators (overlap, contains). A graphical display uses Venn diagrams to make the strategy’s flow obvious. The interface facilitates interactive adjustment of the component searches with changes propagating forward through the strategy. Users may save their strategies, creating protocols that can be shared with colleagues. The strategy system has now been deployed on all EuPathDB databases, and successfully deployed by other projects. The Strategies WDK uses a configurable MVC architecture that is compatible with most genomics and biological warehouse databases, and is available for download at code.google.com/p/strategies-wdk. Database URL: www.eupathdb.org PMID:21705364
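
    A brief sketch of the two kinds of operators a strategy chains together: set operators (union, intersect, minus) over result ID sets, and a genomic interval operator (overlap) over coordinates. The gene IDs and coordinates are invented for illustration and do not come from PlasmoDB.

      # Step results as ID sets; values are invented for the example.
      enzymes_in_pathogen = {"PF3D7_0417200", "PF3D7_1012400", "PF3D7_0810800"}
      enzymes_in_host     = {"PF3D7_0810800"}                    # shared with host -> exclude
      expressed_in_stage  = {"PF3D7_0417200", "PF3D7_1343000"}

      # Step 1 minus Step 2, then intersect with Step 3 (the Venn-diagram view).
      candidates = (enzymes_in_pathogen - enzymes_in_host) & expressed_in_stage
      print(candidates)                                           # {'PF3D7_0417200'}

      def overlaps(a, b):
          """Genomic interval operator: True if (chrom, start, end) regions overlap."""
          return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

      gene_loci = {"PF3D7_0417200": ("chr4", 120_000, 123_500)}
      qtl_region = ("chr4", 100_000, 150_000)
      print([g for g in candidates if overlaps(gene_loci[g], qtl_region)])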

  9. Database Resources of the BIG Data Center in 2018

    PubMed Central

    Xu, Xingjian; Hao, Lili; Zhu, Junwei; Tang, Bixia; Zhou, Qing; Song, Fuhai; Chen, Tingting; Zhang, Sisi; Dong, Lili; Lan, Li; Wang, Yanqing; Sang, Jian; Hao, Lili; Liang, Fang; Cao, Jiabao; Liu, Fang; Liu, Lin; Wang, Fan; Ma, Yingke; Xu, Xingjian; Zhang, Lijuan; Chen, Meili; Tian, Dongmei; Li, Cuiping; Dong, Lili; Du, Zhenglin; Yuan, Na; Zeng, Jingyao; Zhang, Zhewen; Wang, Jinyue; Shi, Shuo; Zhang, Yadong; Pan, Mengyu; Tang, Bixia; Zou, Dong; Song, Shuhui; Sang, Jian; Xia, Lin; Wang, Zhennan; Li, Man; Cao, Jiabao; Niu, Guangyi; Zhang, Yang; Sheng, Xin; Lu, Mingming; Wang, Qi; Xiao, Jingfa; Zou, Dong; Wang, Fan; Hao, Lili; Liang, Fang; Li, Mengwei; Sun, Shixiang; Zou, Dong; Li, Rujiao; Yu, Chunlei; Wang, Guangyu; Sang, Jian; Liu, Lin; Li, Mengwei; Li, Man; Niu, Guangyi; Cao, Jiabao; Sun, Shixiang; Xia, Lin; Yin, Hongyan; Zou, Dong; Xu, Xingjian; Ma, Lina; Chen, Huanxin; Sun, Yubin; Yu, Lei; Zhai, Shuang; Sun, Mingyuan; Zhang, Zhang; Zhao, Wenming; Xiao, Jingfa; Bao, Yiming; Song, Shuhui; Hao, Lili; Li, Rujiao; Ma, Lina; Sang, Jian; Wang, Yanqing; Tang, Bixia; Zou, Dong; Wang, Fan

    2018-01-01

    Abstract The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. PMID:29036542

  10. A novel cross-disciplinary multi-institute approach to translational cancer research: lessons learned from Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC).

    PubMed

    Patel, Ashokkumar A; Gilbertson, John R; Showe, Louise C; London, Jack W; Ross, Eric; Ochs, Michael F; Carver, Joseph; Lazarus, Andrea; Parwani, Anil V; Dhir, Rajiv; Beck, J Robert; Liebman, Michael; Garcia, Fernando U; Prichard, Jeff; Wilkerson, Myra; Herberman, Ronald B; Becich, Michael J

    2007-06-08

    The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC, http://www.pcabc.upmc.edu) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance that was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective of this was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing a statewide data model (1) for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring the opportunity for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, for the purpose of developing collaborative research programs across cancer research institutions in Pennsylvania. The Consortium's intention was to establish a virtual repository of many clinical specimens residing in various centers across the state, in order to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers. The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations, and plans to expand to partners outside the State. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners - all working together to accomplish a common mission. The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collections for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This was the foundational work that has led to the development of a centralized data warehouse that has met each of the institutions' IRB/HIPAA standards. Currently, this "virtual biorepository" has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. The clinical annotation of tissue samples is either done manually over the internet or semi-automated batch modes through mapping of local data elements with PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer and 1543 cases (including 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. Furthermore, the group has also developed a central web-based tool that allows investigators to share their translational (genomics/proteomics) experiment data on research evaluating potential biomarkers via a central location on the Consortium's web site. The technological achievements and the statewide informatics infrastructure that have been established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer. 
Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy.

  11. A Novel Cross-Disciplinary Multi-Institute Approach to Translational Cancer Research: Lessons Learned from Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC)

    PubMed Central

    Patel, Ashokkumar A.; Gilbertson, John R.; Showe, Louise C.; London, Jack W.; Ross, Eric; Ochs, Michael F.; Carver, Joseph; Lazarus, Andrea; Parwani, Anil V.; Dhir, Rajiv; Beck, J. Robert; Liebman, Michael; Garcia, Fernando U.; Prichard, Jeff; Wilkerson, Myra; Herberman, Ronald B.; Becich, Michael J.

    2007-01-01

    Background: The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC, http://www.pcabc.upmc.edu) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance that was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective of this was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing a statewide data model (1) for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring the opportunity for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, for the purpose of developing collaborative research programs across cancer research institutions in Pennsylvania. The Consortium’s intention was to establish a virtual repository of many clinical specimens residing in various centers across the state, in order to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers. Methods: The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations, and plans to expand to partners outside the State. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners - all working together to accomplish a common mission. The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collections for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This was the foundational work that has led to the development of a centralized data warehouse that has met each of the institutions’ IRB/HIPAA standards. Results: Currently, this “virtual biorepository” has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. The clinical annotation of tissue samples is either done manually over the internet or semi-automated batch modes through mapping of local data elements with PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer and 1543 cases (including 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. Furthermore, the group has also developed a central web-based tool that allows investigators to share their translational (genomics/proteomics) experiment data on research evaluating potential biomarkers via a central location on the Consortium’s web site. Conclusions: The technological achievements and the statewide informatics infrastructure that have been established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer. 
Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy. PMID:19455246

  12. Experiences Using a Meta-Data Based Integration Infrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Critchlow, T.; Masick, R.; Slezak, T.

    1999-07-08

    A data warehouse that presents data from many of the genomics community data sources in a consistent, intuitive fashion has long been a goal of bioinformatics. Unfortunately, it is one of the goals that has not yet been achieved. One of the major problems encountered by previous attempts has been the high cost of creating and maintaining a warehouse in a dynamic environment. In this abstract we have outlined a meta-data based approach to integrating data sources that begins to address this problem. We have used this infrastructure to successfully integrate new sources into an existing warehouse in substantially less time than would have traditionally been required--and the resulting mediators are more maintainable than the traditionally defined ones would have been. In the final paper, we will describe in greater detail both our architecture and our experiences using this framework. In particular, we will outline the new, XML based representation of the meta-data, describe how the mediator generator works, and highlight other potential uses for the meta-data.

  13. XWeB: The XML Warehouse Benchmark

    NASA Astrophysics Data System (ADS)

    Mahboubi, Hadj; Darmont, Jérôme

    With the emergence of XML as a standard for representing business data, new decision support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision support benchmark TPC-H. It is mainly composed of a test data warehouse that is based on a unified reference model for XML warehouses and that features XML-specific structures, and of its associated XQuery decision support workload. XWeB's usage is illustrated by experiments on several XML database management systems.

  14. Internet-based data warehousing

    NASA Astrophysics Data System (ADS)

    Boreisha, Yurii

    2001-10-01

    In this paper, we consider the process of data warehouse creation and population using the latest Internet and database access technologies. The logical three-tier model is applied. This approach allows an enterprise schema to be developed by analyzing the various processes in the organization and extracting the relevant entities and relationships from them. Integration with local schemas and population of the data warehouse are done through the corresponding user, business, and data services components. The hierarchy of these components is used to hide the complexity of the online analytical processing functionality from data warehouse users.

  15. Influenza research database: an integrated bioinformatics resource for influenza virus research

    USDA-ARS?s Scientific Manuscript database

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics, an...

  16. Efficient data management tools for the heterogeneous big data warehouse

    NASA Astrophysics Data System (ADS)

    Alekseev, A. A.; Osipova, V. V.; Ivanov, M. A.; Klimentov, A.; Grigorieva, N. V.; Nalamwar, H. S.

    2016-09-01

    The traditional RDBMS is built around normalized data structures and has served well for decades, but the technology is not optimal for data processing and analysis in data-intensive fields like social networks, the oil and gas industry, experiments at the Large Hadron Collider, etc. Several challenges have recently been raised concerning the scalability of data-warehouse-like workloads against the transactional schema, in particular for the analysis of archived data or the aggregation of data for summary and accounting purposes. The paper evaluates new database technologies like HBase, Cassandra, and MongoDB, commonly referred to as NoSQL databases, for handling messy, varied and large amounts of data. The evaluation considers the performance, throughput and scalability of the above technologies for several scientific and industrial use-cases. This paper outlines the technologies and architectures needed for processing Big Data, as well as a description of the back-end application that implements data migration from an RDBMS to a NoSQL data warehouse, the NoSQL database organization, and how it could be useful for further data analytics.
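
    A minimal sketch of the migration step described above, under the assumption that a MongoDB instance is reachable on localhost: normalized rows are read from a relational source, denormalized into self-contained documents and bulk-inserted with pymongo. The schema, names and data are illustrative only.

      import sqlite3
      from pymongo import MongoClient

      # Relational source standing in for the transactional schema.
      src = sqlite3.connect(":memory:")
      src.executescript("""
          CREATE TABLE run   (run_id INT PRIMARY KEY, detector TEXT);
          CREATE TABLE event (event_id INT, run_id INT, energy REAL);
          INSERT INTO run   VALUES (1, 'detector-A');
          INSERT INTO event VALUES (10, 1, 13.2), (11, 1, 7.8);
      """)

      docs = [
          {   # one document per run, with its events embedded (denormalized)
              "run_id": run_id,
              "detector": detector,
              "events": [{"event_id": e, "energy": en}
                         for e, en in src.execute(
                             "SELECT event_id, energy FROM event WHERE run_id = ?", (run_id,))],
          }
          for run_id, detector in src.execute("SELECT run_id, detector FROM run")
      ]

      # Assumes MongoDB is running on the default local port.
      client = MongoClient("mongodb://localhost:27017")
      client["warehouse_demo"]["runs"].insert_many(docs)
      print(client["warehouse_demo"]["runs"].count_documents({"events.energy": {"$gt": 10}}))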

  17. Missing "Links" in Bioinformatics Education: Expanding Students' Conceptions of Bioinformatics Using a Biodiversity Database of Living and Fossil Reef Corals

    ERIC Educational Resources Information Center

    Nehm, Ross H.; Budd, Ann F.

    2006-01-01

    NMITA is a reef coral biodiversity database that we use to introduce students to the expansive realm of bioinformatics beyond genetics. We introduce a series of lessons that have students use this database, thereby accessing real data that can be used to test hypotheses about biodiversity and evolution while targeting the "National Science …

  18. Biological data warehousing system for identifying transcriptional regulatory sites from gene expressions of microarray data.

    PubMed

    Tsou, Ann-Ping; Sun, Yi-Ming; Liu, Chia-Lin; Huang, Hsien-Da; Horng, Jorng-Tzong; Tsai, Meng-Feng; Liu, Baw-Juine

    2006-07-01

    Identification of transcriptional regulatory sites plays an important role in the investigation of gene regulation. For this purpose, we designed and implemented a data warehouse to integrate multiple heterogeneous biological data sources with data types such as text-file, XML, image, MySQL database model, and Oracle database model. The utility of the biological data warehouse in predicting transcriptional regulatory sites of coregulated genes was explored using a synexpression group derived from a microarray study. Both the binding sites of known transcription factors and predicted over-represented (OR) oligonucleotides were demonstrated for the gene group. The potential biological roles of both known nucleotides and one OR nucleotide were demonstrated using bioassays. Therefore, the results from the wet-lab experiments reinforce the power and utility of the data warehouse as an approach to the genome-wide search for important transcription regulatory elements that are the key to many complex biological systems.
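
    A toy sketch of the over-represented-oligonucleotide step: count short motifs in the promoters of the co-expressed genes and compare against a background set. The simple frequency ratio used here stands in for the statistical scoring a real pipeline would apply; the sequences and the pseudo-frequency are invented.

      from collections import Counter

      def kmer_freqs(seqs, k=6):
          """Relative frequency of every k-mer across a set of promoter sequences."""
          counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
          total = sum(counts.values())
          return {kmer: c / total for kmer, c in counts.items()}

      coexpressed_promoters = ["TTGACGTCATTT", "AAGACGTCATGG", "CCGACGTCATAA"]
      background_promoters  = ["TTTTTTTTTTTT", "ACGTACGTACGT", "GGGCCCAAATTT"]

      fg, bg = kmer_freqs(coexpressed_promoters), kmer_freqs(background_promoters)
      # Simple enrichment ratio; unseen background k-mers get a small pseudo-frequency.
      enrichment = {k: fg[k] / bg.get(k, 1 / 1000) for k in fg}

      for kmer, score in sorted(enrichment.items(), key=lambda kv: -kv[1])[:3]:
          print(kmer, round(score, 1))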

  19. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    PubMed Central

    2010-01-01

    Background Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. Description An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Conclusions Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms. PMID:21210976
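
    The MapReduce style mentioned above is easy to illustrate without a cluster: a map step emits key/value pairs, the framework groups them by key, and a reduce step aggregates each group. Below is a minimal, purely illustrative Python sketch counting 3-mers across sequencing reads; a real job would run as Hadoop Streaming scripts or Mapper/Reducer classes on a cluster.

      # Minimal illustration of the MapReduce programming style, counting 3-mers
      # across sequencing reads. Pure-Python stand-in, not actual Hadoop code.
      from collections import defaultdict

      reads = ["GATTACA", "ATTAC", "TTACAG"]

      def map_phase(read):
          # Emit (k-mer, 1) pairs for every 3-mer in the read.
          for i in range(len(read) - 2):
              yield read[i:i + 3], 1

      # Shuffle: group emitted values by key (done by the framework in Hadoop).
      grouped = defaultdict(list)
      for read in reads:
          for kmer, count in map_phase(read):
              grouped[kmer].append(count)

      def reduce_phase(kmer, counts):
          # Sum all counts for one key.
          return kmer, sum(counts)

      results = dict(reduce_phase(k, v) for k, v in grouped.items())
      print(results)   # e.g. {'GAT': 1, 'ATT': 2, 'TTA': 3, ...}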

  20. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.

    PubMed

    Taylor, Ronald C

    2010-12-21

    Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms.

  1. Database Resources of the BIG Data Center in 2018.

    PubMed

    2018-01-04

    The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. A decade of Web Server updates at the Bioinformatics Links Directory: 2003-2012.

    PubMed

    Brazas, Michelle D; Yim, David; Yeung, Winston; Ouellette, B F Francis

    2012-07-01

    The 2012 Bioinformatics Links Directory update marks the 10th special Web Server issue from Nucleic Acids Research. Beginning with content from their 2003 publication, the Bioinformatics Links Directory in collaboration with Nucleic Acids Research has compiled and published a comprehensive list of freely accessible, online tools, databases and resource materials for the bioinformatics and life science research communities. The past decade has exhibited significant growth and change in the types of tools, databases and resources being put forth, reflecting both technology changes and the nature of research over that time. With the addition of 90 web server tools and 12 updates from the July 2012 Web Server issue of Nucleic Acids Research, the Bioinformatics Links Directory at http://bioinformatics.ca/links_directory/ now contains an impressive 134 resources, 455 databases and 1205 web server tools, mirroring the continued activity and efforts of our field.

  3. Protein Bioinformatics Databases and Resources

    PubMed Central

    Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.

    2017-01-01

    Many publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231

  4. CHOmine: an integrated data warehouse for CHO systems biology and modeling

    PubMed Central

    Hanscho, Michael; Ruckerbauer, David E.; Zanghellini, Jürgen; Borth, Nicole

    2017-01-01

    The last decade has seen a surge in published genome-scale information for Chinese hamster ovary (CHO) cells, which are the main production vehicles for therapeutic proteins. While a single access point is available at www.CHOgenome.org, the primary data are distributed over several databases at different institutions. Research is currently hampered by a plethora of gene names and IDs that vary between published draft genomes and databases, making systems biology analyses cumbersome and elaborate. Here we present CHOmine, an integrative data warehouse connecting data from various databases and linking out to others. Furthermore, we introduce CHOmodel, a web-based resource that provides access to recently published CHO cell line-specific metabolic reconstructions. Both resources allow users to query CHO-relevant data and find interconnections between different types of data, and thus provide a simple, standardized entry point to the world of CHO systems biology. Database URL: http://www.chogenome.org PMID:28605771

  5. Extracting patterns of database and software usage from the bioinformatics literature

    PubMed Central

    Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L.; Stevens, Robert

    2014-01-01

    Motivation: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. Results: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. Availability and implementation: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/ Contact: robert.stevens@manchester.ac.uk PMID:25161253
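
    The co-occurrence networks described here can be sketched in a few lines: two resources are linked whenever they are mentioned in the same paper, with edge weights counting co-mentions. The per-paper mention lists below are invented for illustration, not taken from the study's dataset.

      # Hedged sketch of resource co-occurrence network construction.
      from collections import Counter
      from itertools import combinations

      papers = [
          ["BLAST", "ClustalW", "PhyML"],
          ["BLAST", "MUSCLE", "RAxML"],
          ["MUSCLE", "RAxML"],
      ]

      edge_weights = Counter()
      for mentions in papers:
          for a, b in combinations(sorted(set(mentions)), 2):
              edge_weights[(a, b)] += 1      # weight = number of co-occurrences

      for (a, b), w in edge_weights.most_common():
          print(f"{a} -- {b}: {w}")
      # The weighted edge list could then be loaded into networkx or Gephi for
      # snapshot analysis of community practice over time.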

  6. Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics

    PubMed Central

    Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.

    2012-01-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
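
    CouchDB exposes a schema-free, document-oriented HTTP API, which is what makes it convenient for utilities like those described. A hedged sketch follows, assuming a local CouchDB instance at localhost:5984; the database name, document content and credentials are placeholders.

      # Illustrative sketch of CouchDB's document-oriented HTTP API (not code
      # from the paper). All names and credentials below are placeholders.
      import requests

      BASE = "http://localhost:5984"
      AUTH = ("admin", "password")          # placeholder credentials

      requests.put(f"{BASE}/genesmash_demo", auth=AUTH)    # create a database

      doc = {"gene": "CFTR", "chromosome": "7", "annotations": ["ion channel"]}
      requests.put(f"{BASE}/genesmash_demo/CFTR", json=doc, auth=AUTH)   # store a JSON document

      resp = requests.get(f"{BASE}/genesmash_demo/CFTR", auth=AUTH)      # retrieve it
      print(resp.json())    # schema-free document, no relational tables required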

  7. FaceWarehouse: a 3D facial expression database for visual computing.

    PubMed

    Cao, Chen; Weng, Yanlin; Zhou, Shun; Tong, Yiying; Zhou, Kun

    2014-03-01

    We present FaceWarehouse, a database of 3D facial expressions for visual computing applications. We use Kinect, an off-the-shelf RGBD camera, to capture 150 individuals aged 7-80 from various ethnic backgrounds. For each person, we captured the RGBD data of her different expressions, including the neutral expression and 19 other expressions such as mouth-opening, smile, kiss, etc. For every RGBD raw data record, a set of facial feature points on the color image such as eye corners, mouth contour, and the nose tip are automatically localized, and manually adjusted if better accuracy is required. We then deform a template facial mesh to fit the depth data as closely as possible while matching the feature points on the color image to their corresponding points on the mesh. Starting from these fitted face meshes, we construct a set of individual-specific expression blendshapes for each person. These meshes with consistent topology are assembled as a rank-3 tensor to build a bilinear face model with two attributes: identity and expression. Compared with previous 3D facial databases, for every person in our database, there is a much richer matching collection of expressions, enabling depiction of most human facial actions. We demonstrate the potential of FaceWarehouse for visual computing with four applications: facial image manipulation, face component transfer, real-time performance-based facial image animation, and facial animation retargeting from video to image.

  8. The land management and operations database (LMOD)

    USDA-ARS?s Scientific Manuscript database

    This paper presents the design, implementation, deployment, and application of the Land Management and Operations Database (LMOD). LMOD is the single authoritative source for reference land management and operation reference data within the USDA enterprise data warehouse. LMOD supports modeling appl...

  9. System and method for integrating and accessing multiple data sources within a data warehouse architecture

    DOEpatents

    Musick, Charles R [Castro Valley, CA; Critchlow, Terence [Livermore, CA; Ganesh, Madhaven [San Jose, CA; Slezak, Tom [Livermore, CA; Fidelis, Krzysztof [Brentwood, CA

    2006-12-19

    A system and method is disclosed for integrating and accessing multiple data sources within a data warehouse architecture. The metadata formed by the present method provide a way to declaratively present domain specific knowledge, obtained by analyzing data sources, in a consistent and useable way. Four types of information are represented by the metadata: abstract concepts, databases, transformations and mappings. A mediator generator automatically generates data management computer code based on the metadata. The resulting code defines a translation library and a mediator class. The translation library provides a data representation for domain specific knowledge represented in a data warehouse, including "get" and "set" methods for attributes that call transformation methods and derive a value of an attribute if it is missing. The mediator class defines methods that take "distinguished" high-level objects as input and traverse their data structures and enter information into the data warehouse.
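
    The translation-library idea, attribute accessors that derive a value when it is missing from the warehouse, can be sketched as follows. This is an illustrative Python stand-in, not the patented code; class and attribute names are hypothetical.

      # Hedged sketch of "get"/"set" accessors that derive a missing attribute.
      class GeneRecord:
          def __init__(self, symbol, sequence=None, length=None):
              self._data = {"symbol": symbol, "sequence": sequence, "length": length}

          def get_length(self):
              # "Get" method that calls a derivation when the attribute is missing.
              if self._data["length"] is None and self._data["sequence"] is not None:
                  self._data["length"] = len(self._data["sequence"])
              return self._data["length"]

          def set_length(self, value):
              self._data["length"] = value

      record = GeneRecord("abcB", sequence="ATGGCGTTA")
      print(record.get_length())   # derived from the sequence -> 9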

  10. Databases Are Not Toasters: A Framework for Comparing Data Warehouse Appliances

    NASA Astrophysics Data System (ADS)

    Trajman, Omer; Crolotte, Alain; Steinhoff, David; Nambiar, Raghunath Othayoth; Poess, Meikel

    The success of Business Intelligence (BI) applications depends on two factors: the ability to analyze data ever more quickly and the ability to handle ever-increasing volumes of data. Data Warehouse (DW) and Data Mart (DM) installations that support BI applications have historically been built using traditional architectures, either designed from the ground up or based on customized reference system designs. The advent of Data Warehouse Appliances (DA) brings packaged software and hardware solutions that address performance and scalability requirements for certain market segments. The differences between DAs and custom installations make direct comparisons between them impractical and suggest the need for a targeted DA benchmark. In this paper we review data warehouse appliances by surveying thirteen products offered today. We assess the common characteristics among them and propose a classification for DA offerings. We hope our results will help define a useful benchmark for DAs.

  11. Relax with CouchDB--into the non-relational DBMS era of bioinformatics.

    PubMed

    Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R

    2012-07-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. DBMap: a TreeMap-based framework for data navigation and visualization of brain research registry

    NASA Astrophysics Data System (ADS)

    Zhang, Ming; Zhang, Hong; Tjandra, Donny; Wong, Stephen T. C.

    2003-05-01

    The purpose of this study is to investigate and apply a new, intuitive and space-conscious visualization framework to facilitate efficient data presentation and exploration of large-scale data warehouses. We have implemented the DBMap framework for the UCSF Brain Research Registry. Such a utility helps medical specialists and clinical researchers better explore and evaluate the many attributes organized in the brain research registry. The current UCSF Brain Research Registry consists of a federation of disease-oriented database modules, including Epilepsy, Brain Tumor, Intracerebral Hemorrhage, and CJD (Creutzfeldt-Jakob disease). These database modules organize large volumes of imaging and non-imaging data to support Web-based clinical research. While the data warehouse supports general information retrieval and analysis, it lacks an effective way to visualize and present the voluminous and complex data stored. This study investigates whether the TreeMap algorithm can be adapted to display and navigate a categorical biomedical data warehouse or registry. TreeMap is a space-constrained graphical representation of large hierarchical data sets, mapped to a matrix of rectangles whose size and color represent database fields of interest. It allows a large amount of numerical and categorical information to be displayed in the limited real estate of a computer screen with an intuitive user interface. The paper describes DBMap, the proposed data visualization framework for large biomedical databases. Built upon XML, Java and JDBC technologies, the prototype system includes a set of software modules that reside in the application server tier and interface with the back-end database tier and the front-end Web tier of the brain registry.
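
    The TreeMap idea can be sketched compactly: recursively slice a rectangle so that each item's area is proportional to the database field of interest. A minimal slice-and-dice layout in Python follows; the registry counts are invented for illustration.

      # Minimal slice-and-dice TreeMap layout: each leaf weight becomes a
      # rectangle whose area is proportional to the chosen field.
      def treemap(items, x, y, w, h, horizontal=True):
          """items: list of (label, weight); returns (label, x, y, w, h) rectangles."""
          total = sum(weight for _, weight in items)
          rects, offset = [], 0.0
          for label, weight in items:
              frac = weight / total
              if horizontal:                      # slice the rectangle left-to-right
                  rects.append((label, x + offset, y, w * frac, h))
                  offset += w * frac
              else:                               # or top-to-bottom on the next level
                  rects.append((label, x, y + offset, w, h * frac))
                  offset += h * frac
          return rects

      registry = [("Epilepsy", 420), ("Brain Tumor", 310), ("ICH", 150), ("CJD", 40)]
      for label, rx, ry, rw, rh in treemap(registry, 0, 0, 100, 60):
          print(f"{label:12s} x={rx:5.1f} y={ry:4.1f} w={rw:5.1f} h={rh:4.1f}")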

  13. Biomedical informatics: development of a comprehensive data warehouse for clinical and genomic breast cancer research.

    PubMed

    Hu, Hai; Brzeski, Henry; Hutchins, Joe; Ramaraj, Mohan; Qu, Long; Xiong, Richard; Kalathil, Surendran; Kato, Rand; Tenkillaya, Santhosh; Carney, Jerry; Redd, Rosann; Arkalgudvenkata, Sheshkumar; Shahzad, Kashif; Scott, Richard; Cheng, Hui; Meadow, Stephen; McMichael, John; Sheu, Shwu-Lin; Rosendale, David; Kvecher, Leonid; Ahern, Stephen; Yang, Song; Zhang, Yonghong; Jordan, Rick; Somiari, Stella B; Hooke, Jeffrey; Shriver, Craig D; Somiari, Richard I; Liebman, Michael N

    2004-10-01

    The Windber Research Institute is an integrated high-throughput research center employing clinical, genomic and proteomic platforms to produce terabyte levels of data. We use biomedical informatics technologies to integrate all of these operations. This report includes information on a multi-year, multi-phase hybrid data warehouse project currently under development at the Institute. The purpose of the warehouse is to host terabyte levels of internally generated experimental data as well as data from public sources. We have previously reported on the phase I development, which integrated limited internal data sources and selected public databases. Currently, we are completing phase II development, which integrates our internal automated data sources and develops visualization tools to query across these data types. This paper summarizes our clinical and experimental operations, the data warehouse development, and the challenges we have faced. In phase III we plan to federate additional manual internal and public data sources and then to develop and adapt more data analysis and mining tools. We expect that the final implementation of the data warehouse will greatly facilitate biomedical informatics research.

  14. SPECIES DATABASES AND THE BIOINFORMATICS REVOLUTION.

    EPA Science Inventory

    Biological databases are having a growth spurt. Much of this results from research in genetics and biodiversity, coupled with fast-paced developments in information technology. The revolution in bioinformatics, defined by Sugden and Pennisi (2000) as the "tools and techniques for...

  15. The Online Bioinformatics Resources Collection at the University of Pittsburgh Health Sciences Library System--a one-stop gateway to online bioinformatics databases and software tools.

    PubMed

    Chen, Yi-Bu; Chattopadhyay, Ansuman; Bergen, Phillip; Gadd, Cynthia; Tannery, Nancy

    2007-01-01

    To bridge the gap between the rising information needs of biological and medical researchers and the rapidly growing number of online bioinformatics resources, we have created the Online Bioinformatics Resources Collection (OBRC) at the Health Sciences Library System (HSLS) at the University of Pittsburgh. The OBRC, containing 1542 major online bioinformatics databases and software tools, was constructed using the HSLS content management system built on the Zope Web application server. To enhance the output of search results, we further implemented the Vivísimo Clustering Engine, which automatically organizes the search results into categories created dynamically based on the textual information of the retrieved records. As the largest online collection of its kind and the only one with advanced search results clustering, OBRC is aimed at becoming a one-stop guided information gateway to the major bioinformatics databases and software tools on the Web. OBRC is available at the University of Pittsburgh's HSLS Web site (http://www.hsls.pitt.edu/guides/genetics/obrc).

  16. Mining a Web Citation Database for Author Co-Citation Analysis.

    ERIC Educational Resources Information Center

    He, Yulan; Hui, Siu Cheung

    2002-01-01

    Proposes a mining process to automate author co-citation analysis based on the Web Citation Database, a data warehouse for storing citation indices of Web publications. Describes the use of agglomerative hierarchical clustering for author clustering and multidimensional scaling for displaying author cluster maps, and explains PubSearch, a…
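
    A hedged sketch of the two techniques named here, hierarchical clustering of authors followed by multidimensional scaling for a 2-D cluster map, is given below; the small author-distance matrix is invented for illustration.

      # Agglomerative clustering + MDS on a toy author co-citation distance matrix.
      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage
      from scipy.spatial.distance import squareform
      from sklearn.manifold import MDS

      authors = ["Author A", "Author B", "Author C", "Author D"]
      distance = np.array([[0.0, 0.2, 0.8, 0.9],
                           [0.2, 0.0, 0.7, 0.8],
                           [0.8, 0.7, 0.0, 0.3],
                           [0.9, 0.8, 0.3, 0.0]])   # 1 - co-citation similarity

      labels = fcluster(linkage(squareform(distance), method="average"),
                        t=2, criterion="maxclust")          # two author clusters

      coords = MDS(n_components=2, dissimilarity="precomputed",
                   random_state=0).fit_transform(distance)  # 2-D cluster map

      for name, cluster, (px, py) in zip(authors, labels, coords):
          print(f"{name}: cluster {cluster}, position ({px:.2f}, {py:.2f})")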

  17. GlycoRDF: an ontology to standardize glycomics data in RDF

    PubMed Central

    Ranzinger, Rene; Aoki-Kinoshita, Kiyoko F.; Campbell, Matthew P.; Kawano, Shin; Lütteke, Thomas; Okuda, Shujiro; Shinmachi, Daisuke; Shikanai, Toshihide; Sawaki, Hiromichi; Toukach, Philip; Matsubara, Masaaki; Yamada, Issaku; Narimatsu, Hisashi

    2015-01-01

    Motivation: Over the last decades several glycomics-based bioinformatics resources and databases have been created and released to the public. Unfortunately, there is no common standard in the representation of the stored information or a common machine-readable interface allowing bioinformatics groups to easily extract and cross-reference the stored information. Results: An international group of bioinformatics experts in the field of glycomics have worked together to create a standard Resource Description Framework (RDF) representation for glycomics data, focused on glycan sequences and related biological source, publications and experimental data. This RDF standard is defined by the GlycoRDF ontology and will be used by database providers to generate common machine-readable exports of the data stored in their databases. Availability and implementation: The ontology, supporting documentation and source code used by database providers to generate standardized RDF are available online (http://www.glycoinfo.org/GlycoRDF/). Contact: rene@ccrc.uga.edu or kkiyoko@soka.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388145
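
    A hedged sketch of what such an RDF export looks like in practice, using rdflib, is shown below; the namespace and property names are placeholders rather than actual GlycoRDF ontology terms.

      # Representing a glycan record as RDF triples with rdflib (placeholder terms).
      from rdflib import Graph, Literal, Namespace, URIRef

      EX = Namespace("http://example.org/glyco#")   # placeholder namespace
      g = Graph()
      g.bind("ex", EX)

      glycan = URIRef("http://example.org/glyco/G00047")
      g.add((glycan, EX.hasSequence, Literal("Gal(b1-4)GlcNAc")))
      g.add((glycan, EX.fromTaxon, Literal("Homo sapiens")))
      g.add((glycan, EX.reportedIn, URIRef("https://pubmed.ncbi.nlm.nih.gov/25388145/")))

      print(g.serialize(format="turtle"))   # machine-readable export of the record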

  18. Automated Data Aggregation for Time-Series Analysis: Study Case on Anaesthesia Data Warehouse.

    PubMed

    Lamer, Antoine; Jeanne, Mathieu; Ficheur, Grégoire; Marcilly, Romaric

    2016-01-01

    Data stored in operational databases are not directly reusable. Aggregation modules are necessary to facilitate secondary use: they decrease the volume of data while increasing the amount of available information. In this paper, we present four automated aggregation engines integrated into an anaesthesia data warehouse. Four clinical questions illustrate the use of these engines for various improvements in quality of care: duration of procedure, drug administration, assessment of hypotension and its related treatment.
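
    A minimal sketch of one such aggregation engine follows, assuming a high-frequency mean-arterial-pressure signal: collapse it to one-minute means and flag hypotensive minutes. The data and threshold are illustrative, not taken from the paper.

      # Time-series aggregation: second-level signal -> per-minute summary rows.
      import numpy as np
      import pandas as pd

      index = pd.date_range("2016-01-01 08:00", periods=600, freq="s")
      map_signal = pd.Series(70 + 10 * np.sin(np.arange(600) / 60.0), index=index)

      per_minute = map_signal.resample("1min").mean()      # fewer rows, more meaning
      hypotension = per_minute < 65                        # derived clinical flag

      summary = pd.DataFrame({"mean_MAP": per_minute, "hypotensive": hypotension})
      print(summary.head())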

  19. Simple re-instantiation of small databases using cloud computing.

    PubMed

    Tan, Tin Wee; Xie, Chao; De Silva, Mark; Lim, Kuan Siong; Patro, C Pawan K; Lim, Shen Jean; Govindarajan, Kunde Ramamoorthy; Tong, Joo Chuan; Choo, Khar Heng; Ranganathan, Shoba; Khan, Asif M

    2013-01-01

    Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress. We describe a Web-accessible system, available online at http://biodb100.apbionet.org, for archival and future on demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (http://www.bioslax.com), preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases. Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear.

  20. Simple re-instantiation of small databases using cloud computing

    PubMed Central

    2013-01-01

    Background Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress. Results We describe a Web-accessible system, available online at http://biodb100.apbionet.org, for archival and future on demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (http://www.bioslax.com), preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases. Conclusions Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear. PMID:24564380

  1. Warehouses information system design and development

    NASA Astrophysics Data System (ADS)

    Darajatun, R. A.; Sukanta

    2017-12-01

    Efficient materials and goods handling is fundamental for companies to ensure the smooth running of their warehouses; efficiency and organization within every aspect of the business are essential in order to gain a competitive advantage. The purpose of this research is the design and development of a Kanban-based inventory storage and delivery system. The application aims to make inventory stock checks more efficient and effective. Users can easily record finished goods received from the production department and warehouse, as well as transactions with customers and suppliers. The master data are designed to be as complete as possible so that the application can be used across a variety of warehouse logistics processes. The application is developed in Java as a Java Web application, with MySQL as the database. The system was developed using the Waterfall methodology, which proceeds through the stages of analysis, system design, implementation, integration, and operation and maintenance. Data were collected through observation, interviews, and a literature review.
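
    A minimal sketch of the kind of stock movement such an application records is shown below, with SQLite standing in for the MySQL database and a hypothetical table layout.

      # Record stock movements and compute on-hand quantities (illustrative schema).
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("""CREATE TABLE stock_moves (
                         item      TEXT,
                         direction TEXT,      -- 'IN' from production/supplier, 'OUT' to customer
                         quantity  INTEGER,
                         source    TEXT)""")

      conn.executemany("INSERT INTO stock_moves VALUES (?, ?, ?, ?)", [
          ("Bracket-A", "IN",  500, "production"),
          ("Bracket-A", "OUT", 120, "customer"),
          ("Bracket-A", "IN",  200, "supplier"),
      ])

      # Current stock per item = receipts minus issues, the core of a stock check.
      query = """SELECT item,
                        SUM(CASE WHEN direction = 'IN' THEN quantity ELSE -quantity END) AS on_hand
                 FROM stock_moves GROUP BY item"""
      for item, on_hand in conn.execute(query):
          print(item, on_hand)    # Bracket-A 580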

  2. Storage and retrieval of medical images from data warehouses

    NASA Astrophysics Data System (ADS)

    Tikekar, Rahul V.; Fotouhi, Farshad A.; Ragan, Don P.

    1995-11-01

    As applications continue to become more sophisticated, the demand for storage continues to rise, and many businesses are looking toward data warehousing technology to satisfy their storage needs. A warehouse is different from a conventional database and hence deserves a different approach to storing data that might be retrieved at a later point in time. In this paper we look at the problem of storing and retrieving medical image data from a warehouse. We regard the warehouse as a pyramid with fast storage devices at the top and slower storage devices at the bottom. Our approach is to store the most needed information abstracts at the top of the pyramid and more detailed, storage-consuming data toward the bottom. This information is linked for browsing purposes. Similarly, during retrieval the user is given a sample representation of the detailed data with an option to browse, and, as required, more and more detail is made available.

  3. Developing a standardized healthcare cost data warehouse.

    PubMed

    Visscher, Sue L; Naessens, James M; Yawn, Barbara P; Reinalda, Megan S; Anderson, Stephanie S; Borah, Bijan J

    2017-06-12

    Research addressing value in healthcare requires a measure of cost. While there are many sources and types of cost data, each has strengths and weaknesses. Many researchers appear to create study-specific cost datasets, but the explanations of their costing methodologies are not always clear, causing their results to be difficult to interpret. Our solution, described in this paper, was to use widely accepted costing methodologies to create a service-level, standardized healthcare cost data warehouse from an institutional perspective that includes all professional and hospital-billed services for our patients. The warehouse is based on a National Institutes of Health-funded research infrastructure containing the linked health records and medical care administrative data of two healthcare providers and their affiliated hospitals. Since all patients are identified in the data warehouse, their costs can be linked to other systems and databases, such as electronic health records, tumor registries, and disease or treatment registries. We describe the two institutions' administrative source data; the reference files, which include Medicare fee schedules and cost reports; the process of creating standardized costs; and the warehouse structure. The costing algorithm can create inflation-adjusted standardized costs at the service-line level for defined study cohorts on request. The resulting standardized costs contained in the data warehouse can be used to create detailed, bottom-up analyses of professional and facility costs of procedures, medical conditions, and patient care cycles without revealing business-sensitive information. After its creation, a standardized cost data warehouse is relatively easy to maintain and can be expanded to include data from other providers. Individual investigators who may not have sufficient knowledge of administrative data do not have to create their own standardized costs on a project-by-project basis, because our data warehouse generates standardized costs for defined cohorts upon request.

  4. Preliminary Study of Bioinformatics Patents and Their Classifications Registered in the KIPRIS Database.

    PubMed

    Park, Hyun-Seok

    2012-12-01

    Whereas a vast amount of new information on bioinformatics is made available to the public through patents, only a small set of patents are cited in academic papers. A detailed analysis of registered bioinformatics patents, using the existing patent search system, can provide valuable information links between science and technology. However, it is extremely difficult to select keywords to capture bioinformatics patents, reflecting the convergence of several underlying technologies. No single word or even several words are sufficient to identify such patents. The analysis of patent subclasses can provide valuable information. In this paper, I did a preliminary study of the current status of bioinformatics patents and their International Patent Classification (IPC) groups registered in the Korea Intellectual Property Rights Information Service (KIPRIS) database.

  5. GLAD: a system for developing and deploying large-scale bioinformatics grid.

    PubMed

    Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

    2005-03-01

    Grid computing is used to solve large-scale bioinformatics problems involving gigabyte-scale databases by distributing the computation across multiple platforms. Until now, in developing bioinformatics grid applications, it has been extremely tedious to design and implement the component algorithms and parallelization techniques for different classes of problems, and to access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware that exploits task-based parallelism. Two benchmark bioinformatics applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

  6. What Is Spatio-Temporal Data Warehousing?

    NASA Astrophysics Data System (ADS)

    Vaisman, Alejandro; Zimányi, Esteban

    In recent years, extending OLAP (On-Line Analytical Processing) systems with spatial and temporal features has attracted the attention of the GIS (Geographic Information Systems) and database communities. However, there is no commonly agreed definition of what a spatio-temporal data warehouse is and what functionality such a data warehouse should support. Further, the solutions proposed in the literature vary considerably in the kind of data that can be represented as well as the kind of queries that can be expressed. In this paper we present a conceptual framework for defining spatio-temporal data warehouses using an extensible data type system. We also define a taxonomy of query classes of increasing expressive power, and show how to express such queries using an extension of the tuple relational calculus with aggregate functions.

  7. CHOmine: an integrated data warehouse for CHO systems biology and modeling.

    PubMed

    Gerstl, Matthias P; Hanscho, Michael; Ruckerbauer, David E; Zanghellini, Jürgen; Borth, Nicole

    2017-01-01

    The last decade has seen a surge in published genome-scale information for Chinese hamster ovary (CHO) cells, which are the main production vehicles for therapeutic proteins. While a single access point is available at www.CHOgenome.org, the primary data are distributed over several databases at different institutions. Research is currently hampered by a plethora of gene names and IDs that vary between published draft genomes and databases, making systems biology analyses cumbersome and elaborate. Here we present CHOmine, an integrative data warehouse connecting data from various databases and linking out to others. Furthermore, we introduce CHOmodel, a web-based resource that provides access to recently published CHO cell line-specific metabolic reconstructions. Both resources allow users to query CHO-relevant data and find interconnections between different types of data, and thus provide a simple, standardized entry point to the world of CHO systems biology. http://www.chogenome.org. © The Author(s) 2017. Published by Oxford University Press.

  8. Evolving Strategies for the Incorporation of Bioinformatics within the Undergraduate Cell Biology Curriculum

    ERIC Educational Resources Information Center

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in…

  9. Biological Databases for Behavioral Neurobiology

    PubMed Central

    Baker, Erich J.

    2014-01-01

    Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases and on how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics, using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119

  10. The OAuth 2.0 Web Authorization Protocol for the Internet Addiction Bioinformatics (IABio) Database.

    PubMed

    Choi, Jeongseok; Kim, Jaekwon; Lee, Dong Kyun; Jang, Kwang Soo; Kim, Dai-Jin; Choi, In Young

    2016-03-01

    Internet addiction (IA) has become a widespread and problematic phenomenon as smart devices pervade society. Moreover, internet gaming disorder leads to increases in social expenditures for both individuals and nations alike. Although the prevention and treatment of IA are becoming more important, the diagnosis of IA remains problematic. Understanding the neurobiological mechanisms of behavioral addictions is essential for the development of specific and effective treatments. Although there are many databases related to other addictions, a database for IA has not yet been developed. In addition, bioinformatics databases, especially genetic databases, require a high level of security and should be designed based on medical information standards. In this respect, our study proposes the OAuth standard protocol for database access authorization. The proposed IA Bioinformatics (IABio) database system is based on internet user authentication, following medical information standards guidelines, and uses OAuth 2.0 as its access control technology. This study designed and developed the system requirements and configuration. The OAuth 2.0 protocol is expected to secure personal medical information and to be applied to genomic research on IA.
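
    A hedged sketch of the OAuth 2.0 flow in question is shown below, using the standard client-credentials token request; the endpoints, client identifiers and scope are placeholders, not IABio's actual configuration.

      # Obtain an OAuth 2.0 access token, then call a protected resource with it.
      import requests

      TOKEN_URL = "https://iabio.example.org/oauth/token"             # placeholder
      API_URL = "https://iabio.example.org/api/genotypes/rs1800497"   # placeholder

      token_resp = requests.post(TOKEN_URL, data={
          "grant_type": "client_credentials",
          "client_id": "research-app",
          "client_secret": "REPLACE_ME",
          "scope": "read:genomics",
      })
      access_token = token_resp.json()["access_token"]

      # The bearer token authorizes subsequent requests to protected resources.
      data = requests.get(API_URL, headers={"Authorization": f"Bearer {access_token}"})
      print(data.status_code)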

  11. BioStar: an online question & answer resource for the bioinformatics community

    USDA-ARS?s Scientific Manuscript database

    Although the era of big data has produced many bioinformatics tools and databases, using them effectively often requires specialized knowledge. Many groups lack bioinformatics expertise, and frequently find that software documentation is inadequate and local colleagues may be overburdened or unfamil...

  12. Evolving Strategies for the Incorporation of Bioinformatics Within the Undergraduate Cell Biology Curriculum

    PubMed Central

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489
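
    A hedged sketch of the kind of simple sequence analysis tool the students built follows (the course used Perl; Python is shown here): GC content and reverse complement of a DNA sequence.

      # Minimal sequence analysis utilities: GC content and reverse complement.
      COMPLEMENT = str.maketrans("ACGT", "TGCA")

      def gc_content(seq):
          seq = seq.upper()
          return (seq.count("G") + seq.count("C")) / len(seq)

      def reverse_complement(seq):
          return seq.upper().translate(COMPLEMENT)[::-1]

      seq = "ATGGCGTACCTTGA"
      print(f"GC content: {gc_content(seq):.2%}")
      print(f"Reverse complement: {reverse_complement(seq)}")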

  13. Snow model analysis.

    DOT National Transportation Integrated Search

    2014-01-01

    This study developed a new snow model and a database which warehouses geometric, weather and traffic data on New Jersey highways. The complexity of the model development lies in considering variable road width, different spreading/plowing pattern...

  14. The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface.

    PubMed

    Chen, Josephine; Zhao, Po; Massaro, Donald; Clerch, Linda B; Almon, Richard R; DuBois, Debra C; Jusko, William J; Hoffman, Eric P

    2004-01-01

    Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform, have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provide access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp).
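
    A hedged sketch of the kind of time-series query SGQT answers, extracting one transcript's profile across time points from a long-format expression table, is shown below; the values are invented, not PEPR data.

      # Pull one transcript's time course, then pivot to a spreadsheet-like view.
      import pandas as pd

      expr = pd.DataFrame({
          "probe_set": ["MyoD1_at"] * 3 + ["Myog_at"] * 3,
          "day":       [0, 3, 7, 0, 3, 7],
          "signal":    [110.0, 640.0, 310.0, 95.0, 720.0, 260.0],
      })

      profile = (expr[expr["probe_set"] == "MyoD1_at"]
                 .sort_values("day")
                 .set_index("day")["signal"])
      print(profile)           # values ready for a time-course graph or spreadsheet

      # A pivot gives the spreadsheet view across all transcripts of interest:
      print(expr.pivot(index="day", columns="probe_set", values="signal"))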

  15. The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface

    PubMed Central

    Chen, Josephine; Zhao, Po; Massaro, Donald; Clerch, Linda B.; Almon, Richard R.; DuBois, Debra C.; Jusko, William J.; Hoffman, Eric P.

    2004-01-01

    Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform, have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provide access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp). PMID:14681485

  16. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.

    PubMed

    Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru

    2016-09-29

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.

  17. Benchmarking distributed data warehouse solutions for storing genomic variant information

    PubMed Central

    Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.

    2017-01-01

    Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not yet been sufficiently explored in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with a large generated set of genomic variants and phenotypic data. Next, we have benchmarked the performance of a number of combinations of distributed storage and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
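
    A hedged sketch of one analytical query of the benchmarked kind, run with Spark SQL over a Parquet-backed variant table, is shown below; the file path and column names are hypothetical rather than the schema used in the paper.

      # Aggregate query over a Parquet-backed variant table with Spark SQL.
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("variant-dwh-demo").getOrCreate()

      variants = spark.read.parquet("/data/warehouse/variants.parquet")
      variants.createOrReplaceTempView("variants")

      # Count rare, predicted-deleterious variants per gene across all samples.
      result = spark.sql("""
          SELECT gene, COUNT(*) AS n_variants
          FROM variants
          WHERE allele_freq < 0.01 AND effect = 'deleterious'
          GROUP BY gene
          ORDER BY n_variants DESC
          LIMIT 20
      """)
      result.show()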

  18. SuperTarget goes quantitative: update on drug–target interactions

    PubMed Central

    Hecker, Nikolai; Ahmed, Jessica; von Eichborn, Joachim; Dunkel, Mathias; Macha, Karel; Eckert, Andreas; Gilson, Michael K.; Bourne, Philip E.; Preissner, Robert

    2012-01-01

    There are at least two good reasons for the on-going interest in drug–target interactions: first, drug-effects can only be fully understood by considering a complex network of interactions to multiple targets (so-called off-target effects) including metabolic and signaling pathways; second, it is crucial to consider drug-target-pathway relations for the identification of novel targets for drug development. To address this on-going need, we have developed a web-based data warehouse named SuperTarget, which integrates drug-related information associated with medical indications, adverse drug effects, drug metabolism, pathways and Gene Ontology (GO) terms for target proteins. At present, the updated database contains >6000 target proteins, which are annotated with >330 000 relations to 196 000 compounds (including approved drugs); the vast majority of interactions include binding affinities and pointers to the respective literature sources. The user interface provides tools for drug screening and target similarity inclusion. A query interface enables the user to pose complex queries, for example, to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target proteins within a certain affinity range. SuperTarget is available at http://bioinformatics.charite.de/supertarget. PMID:22067455

  19. Edaphostat: interactive ecological analysis of soil organism occurrences and preferences from the Edaphobase data warehouse

    PubMed Central

    Scholz-Starke, Björn; Burkhardt, Ulrich; Lesch, Stephan; Rick, Sebastian; Russell, David; Roß-Nickoll, Martina; Ottermanns, Richard

    2017-01-01

    The Edaphostat web application allows interactive and dynamic analyses of soil organism data stored in the Edaphobase data warehouse. It is part of the Edaphobase web application and can be accessed by any modern browser. The tool combines data from different sources (publications, field studies and museum collections) and allows species preferences along various environmental gradients (i.e. C/N ratio and pH) and classification systems (habitat type and soil type) to be analyzed. Database URL: Edaphostat is part of the Edaphobase Web Application available at https://portal.edaphobase.org PMID:29220469

  20. Designing a Clinical Data Warehouse Architecture to Support Quality Improvement Initiatives.

    PubMed

    Chelico, John D; Wilcox, Adam B; Vawdrey, David K; Kuperman, Gilad J

    2016-01-01

    Clinical data warehouses, initially directed towards clinical research or financial analyses, are evolving to support quality improvement efforts, and must now address the quality improvement life cycle. In addition, data that are needed for quality improvement often do not reside in a single database, requiring easier methods to query data across multiple disparate sources. We created a virtual data warehouse at NewYork Presbyterian Hospital that allowed us to bring together data from several source systems throughout the organization. We also created a framework to match the maturity of a data request in the quality improvement life cycle to the proper tools needed for each request. As projects progress through the Define, Measure, Analyze, Improve, Control stages of quality improvement, resources are properly matched to the data needs at each step. We describe the analysis and design that create a robust model for applying clinical data warehousing to quality improvement.

  1. Designing a Clinical Data Warehouse Architecture to Support Quality Improvement Initiatives

    PubMed Central

    Chelico, John D.; Wilcox, Adam B.; Vawdrey, David K.; Kuperman, Gilad J.

    2016-01-01

    Clinical data warehouses, initially directed towards clinical research or financial analyses, are evolving to support quality improvement efforts, and must now address the quality improvement life cycle. In addition, data that are needed for quality improvement often do not reside in a single database, requiring easier methods to query data across multiple disparate sources. We created a virtual data warehouse at NewYork Presbyterian Hospital that allowed us to bring together data from several source systems throughout the organization. We also created a framework to match the maturity of a data request in the quality improvement life cycle to the proper tools needed for each request. As projects progress through the Define, Measure, Analyze, Improve, Control stages of quality improvement, resources are properly matched to the data needs at each step. We describe the analysis and design that create a robust model for applying clinical data warehousing to quality improvement. PMID:28269833

  2. Exploring Cystic Fibrosis Using Bioinformatics Tools: A Module Designed for the Freshman Biology Course

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2011-01-01

    We incorporated a bioinformatics component into the freshman biology course that allows students to explore cystic fibrosis (CF), a common genetic disorder, using bioinformatics tools and skills. Students learn about CF through searching genetic databases, analyzing genetic sequences, and observing the three-dimensional structures of proteins…

  3. A Survey of Bioinformatics Database and Software Usage through Mining the Literature.

    PubMed

    Duck, Geraint; Nenadic, Goran; Filannino, Michele; Brass, Andy; Robertson, David L; Stevens, Robert

    2016-01-01

    Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever-expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage, with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being mentioned only once each. While these results highlight the dynamic and creative nature of bioinformatics research, they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.

  4. Querying and Computing with BioCyc Databases

    PubMed Central

    Krummenacker, Markus; Paley, Suzanne; Mueller, Lukas; Yan, Thomas; Karp, Peter D.

    2006-01-01

    Summary We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database. Availability The Pathway Tools software and 20 BioCyc DBs in Tiers 1 and 2 are freely available to academic users; fees apply to some types of commercial use. For download instructions see http://BioCyc.org/download.shtml PMID:15961440
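
    The SQL route is the most direct one for warehouse-style questions that span datasets. A minimal sketch of such a multi-database query is given below; the table and column names (Protein, DataSet) are hypothetical stand-ins rather than the actual BioWarehouse schema, it assumes a local warehouse file containing such tables, and sqlite3 substitutes here for the MySQL or Oracle back ends the toolkit targets.

        # Hypothetical cross-dataset query against a warehouse-style schema.
        import sqlite3

        conn = sqlite3.connect("warehouse.db")
        query = """
            SELECT p.name, d.name AS source_db
            FROM Protein AS p
            JOIN DataSet AS d ON p.dataset_id = d.id
            WHERE d.name IN ('BioCyc', 'UniProt')
        """
        # Each row pairs a protein name with the dataset it was loaded from.
        for protein_name, source_db in conn.execute(query):
            print(protein_name, source_db)
        conn.close()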

  5. GlycoRDF: an ontology to standardize glycomics data in RDF.

    PubMed

    Ranzinger, Rene; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P; Kawano, Shin; Lütteke, Thomas; Okuda, Shujiro; Shinmachi, Daisuke; Shikanai, Toshihide; Sawaki, Hiromichi; Toukach, Philip; Matsubara, Masaaki; Yamada, Issaku; Narimatsu, Hisashi

    2015-03-15

    Over the last decades several glycomics-based bioinformatics resources and databases have been created and released to the public. Unfortunately, there is no common standard in the representation of the stored information or a common machine-readable interface allowing bioinformatics groups to easily extract and cross-reference the stored information. An international group of bioinformatics experts in the field of glycomics have worked together to create a standard Resource Description Framework (RDF) representation for glycomics data, focused on glycan sequences and related biological source, publications and experimental data. This RDF standard is defined by the GlycoRDF ontology and will be used by database providers to generate common machine-readable exports of the data stored in their databases. The ontology, supporting documentation and source code used by database providers to generate standardized RDF are available online (http://www.glycoinfo.org/GlycoRDF/). © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
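
    As a rough illustration of how such an RDF export might be consumed, the sketch below loads a Turtle file with rdflib and runs a SPARQL query; the namespace URI and predicate names are invented placeholders, not the actual GlycoRDF ontology terms, and the export file name is assumed.

        # Toy consumer of a GlycoRDF-style export; predicates are placeholders.
        from rdflib import Graph

        g = Graph()
        g.parse("glycan_export.ttl", format="turtle")  # export from a glycomics database

        results = g.query("""
            PREFIX glyco: <http://example.org/glycordf#>
            SELECT ?glycan ?sequence ?source
            WHERE {
                ?glycan glyco:hasSequence ?sequence ;
                        glyco:reportedFrom ?source .
            }
        """)
        # Each row links a glycan to its sequence and reported biological source.
        for glycan, sequence, source in results:
            print(glycan, sequence, source)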

  6. Design and implementation of a library-based information service in molecular biology and genetics at the University of Pittsburgh

    PubMed Central

    Chattopadhyay, Ansuman; Tannery, Nancy Hrinya; Silverman, Deborah A. L.; Bergen, Phillip; Epstein, Barbara A.

    2006-01-01

    Setting: In summer 2002, the Health Sciences Library System (HSLS) at the University of Pittsburgh initiated an information service in molecular biology and genetics to assist researchers with identifying and utilizing bioinformatics tools. Program Components: This novel information service comprises hands-on training workshops and consultation on the use of bioinformatics tools. The HSLS also provides an electronic portal and networked access to public and commercial molecular biology databases and software packages. Evaluation Mechanisms: Researcher feedback gathered during the first three years of workshops and individual consultation indicate that the information service is meeting user needs. Next Steps/Future Directions: The service's workshop offerings will expand to include emerging bioinformatics topics. A frequently asked questions database is also being developed to reuse advice on complex bioinformatics questions. PMID:16888665

  7. Proteogenomics approaches for studying cancer biology and their potential in the identification of acute myeloid leukemia biomarkers.

    PubMed

    Hernandez-Valladares, Maria; Vaudel, Marc; Selheim, Frode; Berven, Frode; Bruserud, Øystein

    2017-08-01

    Mass spectrometry (MS)-based proteomics has become an indispensable tool for the characterization of the proteome and its post-translational modifications (PTMs). In addition to standard protein sequence databases, proteogenomics strategies search the spectral data against the theoretical spectra obtained from customized protein sequence databases. To date, there are no published proteogenomics studies on acute myeloid leukemia (AML) samples. Areas covered: Proteogenomics involves the understanding of genomic and proteomic data. The intersection of both datatypes requires advanced bioinformatics skills. A standard proteogenomics workflow that could be used for the study of AML samples is described. The generation of customized protein sequence databases as well as the bioinformatics tools and pipelines commonly used in proteogenomics are discussed in detail. Expert commentary: Drawing on evidence from recent cancer proteogenomics studies and taking into account the public availability of AML genomic data, the interpretation of present and future MS-based AML proteomic data using AML-specific protein sequence databases could discover new biological mechanisms and targets in AML. However, proteogenomics workflows, including their bioinformatics guidelines, can be challenging for the wider AML research community. It is expected that further automation and simplification of the bioinformatics procedures might attract AML investigators to adopt the proteogenomics strategy.

  8. Automated mapping of pharmacy orders from two electronic health record systems to RxNorm within the STRIDE clinical data warehouse.

    PubMed

    Hernandez, Penni; Podchiyska, Tanya; Weber, Susan; Ferris, Todd; Lowe, Henry

    2009-11-14

    The Stanford Translational Research Integrated Database Environment (STRIDE) clinical data warehouse integrates medication information from two Stanford hospitals that use different drug representation systems. To merge this pharmacy data into a single, standards-based model supporting research, we developed an algorithm to map HL7 pharmacy orders to RxNorm concepts. A formal evaluation of this algorithm on 1.5 million pharmacy orders showed that the system could accurately assign pharmacy orders in over 96% of cases. This paper describes the algorithm and discusses some of the causes of failures in mapping to RxNorm.
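
    The abstract does not spell out the mapping steps; as a loose illustration of the general idea (and not the published STRIDE algorithm), the toy sketch below normalises an order string and looks it up in a small term table, with placeholder identifiers standing in for real RxNorm concepts.

        # Toy drug-order-to-concept mapping via string normalisation and lookup.
        import re

        RXNORM_TERMS = {
            "acetaminophen 325 mg oral tablet": "RX0001",   # placeholder identifiers
            "lisinopril 10 mg oral tablet": "RX0002",
        }

        def normalise(order_text):
            text = order_text.lower()
            text = re.sub(r"\b(tab|tabs)\b", "tablet", text)   # expand a common abbreviation
            return re.sub(r"\s+", " ", text).strip()

        def map_order(order_text):
            return RXNORM_TERMS.get(normalise(order_text))

        print(map_order("Acetaminophen 325 MG Oral Tab"))   # -> RX0001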

  9. A Data Warehouse to Support Condition Based Maintenance (CBM)

    DTIC Science & Technology

    2005-05-01

    Application (VBA) code sequence to import the original MAST-generated CSV and then create a single output table in DBASE IV format. The DBASE IV format...database architecture (Oracle, Sybase, MS-SQL, etc.). This design includes table definitions, comments, specification of table attributes, primary and foreign...built queries and applications. Needs the application developers to construct data views. No SQL programming experience. b. Power Database User - knows

  10. JNDMS Task Authorization 2 Report

    DTIC Science & Technology

    2013-10-01

    uses Barnyard to store alarms from all DREnet Snort sensors in a MySQL database. Barnyard is an open source tool designed to work with Snort to take...Technology ITI Information Technology Infrastructure J2EE Java 2 Enterprise Edition JAR Java Archive. This is an archive file format defined by Java ...standards. JDBC Java Database Connectivity JDW JNDMS Data Warehouse JNDMS Joint Network and Defence Management System JNDMS Joint Network Defence and

  11. Selecting materialized views using random algorithm

    NASA Astrophysics Data System (ADS)

    Zhou, Lijuan; Hao, Zhongxiao; Liu, Chi

    2007-04-01

    The data warehouse is a repository of information collected from multiple, possibly heterogeneous, autonomous distributed databases. The information stored in the data warehouse takes the form of views, referred to as materialized views. The selection of materialized views is one of the most important decisions in designing a data warehouse: they are stored in the warehouse in order to answer on-line analytical processing queries efficiently, and the first issue for the user is query response time. In this paper, we therefore develop algorithms to select a set of views to materialize in the data warehouse so as to minimize the total view maintenance cost under the constraint of a given query response time; we call this the query-cost view-selection problem. First, the cost graph and cost model of the query-cost view-selection problem are presented. Second, methods for selecting materialized views using randomized algorithms are presented. A genetic algorithm is applied to the materialized view selection problem, but as the genetic process evolves, legal solutions become increasingly difficult to produce, so many candidate solutions are discarded and the time needed to generate solutions grows. We therefore present an improved algorithm that combines simulated annealing with the genetic algorithm to solve the query-cost view-selection problem, as sketched below. Finally, simulation experiments are used to test the effectiveness and efficiency of our algorithms. The experiments show that the given methods provide near-optimal solutions in limited time and work well in practical cases. Randomized algorithms will become invaluable tools for data warehouse evolution.
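
    The sketch below is a toy rendering of that hybrid idea, assuming invented costs and a single mutation operator: a genetic-style population search whose acceptance test borrows a simulated-annealing temperature schedule, so that occasionally worse or infeasible children are still accepted and the search keeps moving.

        # Toy hybrid search over candidate sets of materialized views.
        import math
        import random

        VIEWS = list(range(8))                            # candidate views 0..7
        MAINT = [random.uniform(1, 10) for _ in VIEWS]    # maintenance cost per view
        GAIN = [random.uniform(1, 10) for _ in VIEWS]     # query-time saving per view
        TIME_LIMIT = 25                                   # allowed total query response time

        def response_time(selection):
            # Unmaterialized views must be computed at query time.
            return sum(GAIN[v] for v in VIEWS if v not in selection)

        def maintenance_cost(selection):
            return sum(MAINT[v] for v in selection)

        def feasible(selection):
            return response_time(selection) <= TIME_LIMIT

        def mutate(selection):
            child = set(selection)
            child.symmetric_difference_update({random.choice(VIEWS)})  # flip one view in/out
            return child

        def anneal_accept(parent, child, temperature):
            # Accept worse or infeasible children with a probability that decays
            # as the temperature cools, which keeps the search from stalling.
            delta = maintenance_cost(child) - maintenance_cost(parent)
            if feasible(child) and delta <= 0:
                return True
            return random.random() < math.exp(-abs(delta) / max(temperature, 1e-6))

        def hybrid_search(generations=200):
            population = [set(random.sample(VIEWS, 4)) for _ in range(10)]
            temperature = 10.0
            best = min((s for s in population if feasible(s)),
                       key=maintenance_cost, default=population[0])
            for _ in range(generations):
                parent = random.choice(population)
                child = mutate(parent)
                if anneal_accept(parent, child, temperature):
                    population[random.randrange(len(population))] = child
                    if feasible(child) and maintenance_cost(child) < maintenance_cost(best):
                        best = child
                temperature *= 0.98                       # cooling schedule
            return best, maintenance_cost(best), response_time(best)

        print(hybrid_search())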

  12. Bioinformatics.

    PubMed

    Moore, Jason H

    2007-11-01

    Bioinformatics is an interdisciplinary field that blends computer science and biostatistics with biological and biomedical sciences such as biochemistry, cell biology, developmental biology, genetics, genomics, and physiology. An important goal of bioinformatics is to facilitate the management, analysis, and interpretation of data from biological experiments and observational studies. The goal of this review is to introduce some of the important concepts in bioinformatics that must be considered when planning and executing a modern biological research study. We review database resources as well as data mining software tools.

  13. ENFIN--A European network for integrative systems biology.

    PubMed

    Kahlem, Pascal; Clegg, Andrew; Reisinger, Florian; Xenarios, Ioannis; Hermjakob, Henning; Orengo, Christine; Birney, Ewan

    2009-11-01

    Integration of biological data of various types and the development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged in developing an adapted infrastructure to connect databases and platforms, enabling both the generation of new bioinformatics tools and the experimental validation of computational predictions. With the aim of bridging the gap between standard wet laboratories and bioinformatics, the ENFIN Network runs integrative research projects to bring the latest computational techniques to bear directly on questions dedicated to systems biology in the wet laboratory environment. The Network maintains close internal collaboration between experimental and computational research, enabling a permanent cycle of experimental validation and improvement of computational prediction methods. The computational work includes the development of a database infrastructure (EnCORE), bioinformatics analysis methods and a novel platform for protein function analysis, FuncNet.

  14. A comprehensive clinical research database based on CDISC ODM and i2b2.

    PubMed

    Meineke, Frank A; Stäubert, Sebastian; Löbe, Matthias; Winter, Alfred

    2014-01-01

    We present a working approach for a clinical research database as part of an archival information system. The CDISC ODM standard is the target format for clinical study data and research-relevant routine data, thus decoupling the data ingest process from the access layer. The presented research database is comprehensive as it covers the annotation, mapping and curation of poorly annotated source data. Besides a conventional relational database, the medical data warehouse i2b2 serves as the main frontend for end-users. The system we developed is suitable to support patient recruitment, cohort identification and quality assurance in daily routine.

  15. bioNerDS: exploring bioinformatics’ database and software use through literature mining

    PubMed Central

    2013-01-01

    Background Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While scholarly literature surveys can provide some insights, large-scale computer-based approaches to identify mentions of bioinformatics databases and software from primary literature would automate systematic cataloguing, facilitate the monitoring of usage, and provide the foundations for the recovery of computational methods for analysing biological data, with the long-term aim of identifying best/common practice in different areas of biology. Results We have developed bioNerDS, a named entity recogniser for the recovery of bioinformatics databases and software from primary literature. We identify such entities with an F-measure ranging from 63% to 91% at the mention level and 63-78% at the document level, depending on corpus. Not attaining a higher F-measure is mostly due to high ambiguity in resource naming, which is compounded by the on-going introduction of new resources. To demonstrate the software, we applied bioNerDS to full-text articles from BMC Bioinformatics and Genome Biology. General mention patterns reflect the remit of these journals, highlighting BMC Bioinformatics’s emphasis on new tools and Genome Biology’s greater emphasis on data analysis. The data also illustrate some shifts in resource usage: for example, the past decade has seen R and the Gene Ontology join BLAST and GenBank as the main components in bioinformatics processing. Conclusions We demonstrate the feasibility of automatically identifying resource names on a large scale from the scientific literature and show that the generated data can be used for exploration of bioinformatics database and software usage. For example, our results help to investigate the rate of change in resource usage and corroborate the suspicion that a vast majority of resources are created, but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/. PMID:23768135
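
    A toy version of the underlying task, dictionary matching of resource names in article text, is sketched below; bioNerDS itself uses a full named entity recogniser precisely because naive matching of names like "R" is highly ambiguous. The name list and article text are invented examples.

        # Toy dictionary-based counter of resource-name mentions in text.
        import re
        from collections import Counter

        RESOURCE_NAMES = ["BLAST", "GenBank", "Gene Ontology", "R", "SWISS-PROT"]

        def count_mentions(text):
            counts = Counter()
            for name in RESOURCE_NAMES:
                # Word-boundary match; single-letter names like "R" show why
                # naive matching is ambiguous and a trained recogniser is needed.
                pattern = r"\b" + re.escape(name) + r"\b"
                counts[name] = len(re.findall(pattern, text))
            return counts

        article = "We aligned reads with BLAST and annotated hits against GenBank."
        print(count_mentions(article))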

  16. What Information Does Your EHR Contain? Automatic Generation of a Clinical Metadata Warehouse (CMDW) to Support Identification and Data Access Within Distributed Clinical Research Networks.

    PubMed

    Bruland, Philipp; Doods, Justin; Storck, Michael; Dugas, Martin

    2017-01-01

    Data dictionaries provide structural meta-information about data definitions in health information technology (HIT) systems. In this regard, reusing healthcare data for secondary purposes offers several advantages (e.g., reduced documentation times or increased data quality). Prerequisites for data reuse are data quality, availability and identical meaning. In diverse projects, research data warehouses serve as core components between heterogeneous clinical databases and various research applications. Given the complexity (high number of data elements) and dynamics (regular updates) of electronic health record (EHR) data structures, we propose a clinical metadata warehouse (CMDW) based on a metadata registry standard. Metadata of two large hospitals were automatically inserted into two CMDWs containing 16,230 forms and 310,519 data elements. Automatic updates of metadata are possible, as are semantic annotations. A CMDW allows metadata discovery, data quality assessment and similarity analyses. Common data models for distributed research networks can be established based on similarity analyses.
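
    One of the simplest similarity analyses such a metadata warehouse enables is sketched below: comparing two EHR forms by the Jaccard overlap of their data-element names. The form contents are invented examples, not metadata from the cited hospitals.

        # Toy form-to-form similarity via Jaccard overlap of data-element names.
        def jaccard(a, b):
            a, b = set(a), set(b)
            return len(a & b) / len(a | b) if a | b else 0.0

        form_hospital_1 = {"patient_id", "systolic_bp", "diastolic_bp", "heart_rate"}
        form_hospital_2 = {"patient_id", "systolic_bp", "diastolic_bp", "temperature"}

        print(round(jaccard(form_hospital_1, form_hospital_2), 2))   # 0.6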

  17. Easily configured real-time CPOE Pick Off Tool supporting focused clinical research and quality improvement.

    PubMed

    Rosenbaum, Benjamin P; Silkin, Nikolay; Miller, Randolph A

    2014-01-01

    Real-time alerting systems typically warn providers about abnormal laboratory results or medication interactions. For more complex tasks, institutions create site-wide 'data warehouses' to support quality audits and longitudinal research. Sophisticated systems like i2b2 or Stanford's STRIDE utilize data warehouses to identify cohorts for research and quality monitoring. However, substantial resources are required to install and maintain such systems. For more modest goals, an organization desiring merely to identify patients with 'isolation' orders, or to determine patients' eligibility for clinical trials, may adopt a simpler, limited approach based on processing the output of one clinical system, and not a data warehouse. We describe a limited, order-entry-based, real-time 'pick off' tool, utilizing public domain software (PHP, MySQL). Through a web interface the tool assists users in constructing complex order-related queries and auto-generates corresponding database queries that can be executed at recurring intervals. We describe successful application of the tool for research and quality monitoring.
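
    The query auto-generation step can be pictured with the small sketch below, which turns a structured order-query specification, such as a user might assemble in the web form, into a parameterised SQL statement. The table and column names are hypothetical rather than the tool's actual schema, and the sketch is in Python although the tool itself is written in PHP.

        # Toy translation of a structured order-query spec into parameterised SQL.
        def build_order_query(spec):
            clauses, params = [], []
            if "order_name" in spec:
                clauses.append("order_name LIKE ?")
                params.append(f"%{spec['order_name']}%")
            if "status" in spec:
                clauses.append("status = ?")
                params.append(spec["status"])
            if "since" in spec:
                clauses.append("order_datetime >= ?")
                params.append(spec["since"])
            sql = "SELECT patient_id, order_name, order_datetime FROM cpoe_orders"
            if clauses:
                sql += " WHERE " + " AND ".join(clauses)
            return sql, params

        print(build_order_query({"order_name": "isolation", "since": "2014-01-01"}))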

  18. Bioinformatics Approaches to Classifying Allergens and Predicting Cross-Reactivity

    PubMed Central

    Schein, Catherine H.; Ivanciuc, Ovidiu; Braun, Werner

    2007-01-01

    The major advances in understanding why patients respond to several seemingly different stimuli have been through the isolation, sequencing and structural analysis of proteins that induce an IgE response. The most significant finding is that allergenic proteins from very different sources can have nearly identical sequences and structures, and that this similarity can account for clinically observed cross-reactivity. The increasing amount of information on the sequence, structure and IgE epitopes of allergens is now available in several databases and powerful bioinformatics search tools allow user access to relevant information. Here, we provide an overview of these databases and describe state-of-the art bioinformatics tools to identify the common proteins that may be at the root of multiple allergy syndromes. Progress has also been made in quantitatively defining characteristics that discriminate allergens from non-allergens. Search and software tools for this purpose have been developed and implemented in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/). SDAP contains information for over 800 allergens and extensive bibliographic references in a relational database with links to other publicly available databases. SDAP is freely available on the Web to clinicians and patients, and can be used to find structural and functional relations among known allergens and to identify potentially cross-reacting antigens. Here we illustrate how these bioinformatics tools can be used to group allergens, and to detect areas that may account for common patterns of IgE binding and cross-reactivity. Such results can be used to guide treatment regimens for allergy sufferers. PMID:17276876

  19. The Androgen Receptor Gene Mutations Database.

    PubMed

    Gottlieb, B; Lehvaslaiho, H; Beitel, L K; Lumbroso, R; Pinsky, L; Trifiro, M

    1998-01-01

    The current version of the androgen receptor (AR) gene mutations database is described. The total number of reported mutations has risen from 272 to 309 in the past year. We have expanded the database: (i) by giving each entry an accession number; (ii) by adding information on the length of polymorphic polyglutamine (polyGln) and polyglycine (polyGly) tracts in exon 1; (iii) by adding information on large gene deletions; (iv) by providing a direct link with a completely searchable database (courtesy EMBL-European Bioinformatics Institute). The addition of the exon 1 polymorphisms is discussed in light of their possible relevance as markers for predisposition to prostate or breast cancer. The database is also available on the internet (http://www.mcgill.ca/androgendb/), from the EMBL-European Bioinformatics Institute (ftp.ebi.ac.uk/pub/databases/androgen), or as a Macintosh FilemakerPro or Word file (MC33@musica.mcgill.ca).

  20. The Androgen Receptor Gene Mutations Database.

    PubMed Central

    Gottlieb, B; Lehvaslaiho, H; Beitel, L K; Lumbroso, R; Pinsky, L; Trifiro, M

    1998-01-01

    The current version of the androgen receptor (AR) gene mutations database is described. The total number of reported mutations has risen from 272 to 309 in the past year. We have expanded the database: (i) by giving each entry an accession number; (ii) by adding information on the length of polymorphic polyglutamine (polyGln) and polyglycine (polyGly) tracts in exon 1; (iii) by adding information on large gene deletions; (iv) by providing a direct link with a completely searchable database (courtesy EMBL-European Bioinformatics Institute). The addition of the exon 1 polymorphisms is discussed in light of their possible relevance as markers for predisposition to prostate or breast cancer. The database is also available on the internet (http://www.mcgill.ca/androgendb/), from the EMBL-European Bioinformatics Institute (ftp.ebi.ac.uk/pub/databases/androgen), or as a Macintosh FilemakerPro or Word file (MC33@musica.mcgill.ca). PMID:9399843

  1. FCDD: A Database for Fruit Crops Diseases.

    PubMed

    Chauhan, Rupal; Jasrai, Yogesh; Pandya, Himanshu; Chaudhari, Suman; Samota, Chand Mal

    2014-01-01

    Development of the Fruit Crops Diseases Database (FCDD) required a number of biotechnology and bioinformatics tools. The FCDD is a unique bioinformatics resource that compiles detailed information on 162 fruit crop diseases, covering disease type, causal organism, images, symptoms and their control. The FCDD also contains 171 phytochemicals from 25 fruits, their 2D images and their 20 possible sequences. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, textbooks and scientific journals. FCDD is fully searchable and supports extensive text search. The main focus of the FCDD is on providing information on fruit crop diseases, which will help in the discovery of potential drugs from one of the most common bioresources, fruits. The database was developed using MySQL. The database interface is developed in PHP, HTML and JAVA. FCDD is freely available at http://www.fruitcropsdd.com/.

  2. Data integration and warehousing: coordination between newborn screening and related public health programs.

    PubMed

    Therrell, Bradford L

    2003-01-01

    At birth, patient demographic and health information begin to accumulate in varied databases. There are often multiple sources of the same or similar data. New public health programs are often created without considering data linkages. Recently, newborn hearing screening (NHS) programs and immunization programs have virtually ignored the existence of newborn dried blood spot (DBS) screening databases containing similar demographic data, creating data duplication in their 'new' systems. Some progressive public health departments are developing data warehouses of basic, recurrent patient information, and linking these databases to other health program databases where programs and services can benefit from such linkages. Demographic data warehousing saves time (and money) by eliminating duplicative data entry and reducing the chances of data errors. While newborn screening data are usually the first data available, they should not be the only data source considered for early data linkage or for populating a data warehouse. Birth certificate information should also be considered along with other data sources for infants who may not have received newborn screening or who may have been born outside of the jurisdiction and not have birth certificate information locally available. A newborn screening serial number provides a convenient identification number for use in the DBS program and for linking with other systems. At a minimum, data linkages should exist between newborn dried blood spot screening, newborn hearing screening, immunizations, birth certificates and birth defect registries.

  3. Visual query tool for finding patient cohorts from a clinical data warehouse of the partners HealthCare system

    PubMed Central

    Murphy, SN; Barnett, GO; Chueh, HC

    2000-01-01

    The patient base of the Partners HealthCare System in Boston exceeds 1.8 million. Many of these patients are desirable for participation in research studies. To facilitate their discovery, we developed a data warehouse to contain clinical characteristics of these patients. The data warehouse contains diagnoses and procedures from administrative databases. The patients are indexed across institutions and their demographics provided by an Enterprise Master Patient Indexing service. Characteristics of the diagnoses and procedures such as associated providers, dates of service, inpatient/outpatient status, and other visit-related characteristics are also fed from the administrative systems. The targeted users of this system are research clinicians interested in finding patient cohorts for research studies. Their data requirements were analyzed and have been reported elsewhere. We did not expect the clinicians to become expert users of the system. Tools for querying healthcare data have traditionally been text based, although graphical interfaces have been pursued. In order to support the simple drag and drop visual model, as well as the identification and distribution of the patient data, a three-tier software architecture was developed. The user interface was developed in Visual Basic and distributed as an ActiveX object embedded in an HTML page. The middle layer was developed in Java and Microsoft COM. The queries are represented throughout their lifetime as XML objects, and the Microsoft SQL7 database is queried and managed in standard SQL. PMID:11080028

  4. Visual query tool for finding patient cohorts from a clinical data warehouse of the partners HealthCare system

    PubMed

    Murphy; Barnett; Chueh

    2000-01-01

    The patient base of the Partners HealthCare System in Boston exceeds 1.8 million. Many of these patients are desirable for participation in research studies. To facilitate their discovery, we developed a data warehouse to contain clinical characteristics of these patients. The data warehouse contains diagnoses and procedures from administrative databases. The patients are indexed across institutions and their demographics provided by an Enterprise Master Patient Indexing service. Characteristics of the diagnoses and procedures such as associated providers, dates of service, inpatient/outpatient status, and other visit-related characteristics are also fed from the administrative systems. The targeted users of this system are research clinicians interested in finding patient cohorts for research studies. Their data requirements were analyzed and have been reported elsewhere. We did not expect the clinicians to become expert users of the system. Tools for querying healthcare data have traditionally been text based, although graphical interfaces have been pursued. In order to support the simple drag and drop visual model, as well as the identification and distribution of the patient data, a three-tier software architecture was developed. The user interface was developed in Visual Basic and distributed as an ActiveX object embedded in an HTML page. The middle layer was developed in Java and Microsoft COM. The queries are represented throughout their lifetime as XML objects, and the Microsoft SQL7 database is queried and managed in standard SQL.
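
    The lifetime-as-XML idea can be illustrated with the toy sketch below, which carries two cohort criteria in a small XML document and renders them to SQL. The XML vocabulary, table names and join strategy are invented for the example rather than taken from the Partners system.

        # Toy rendering of an XML query object into a SQL cohort query.
        import xml.etree.ElementTree as ET

        query_xml = """
        <query>
          <criterion table="diagnoses" column="icd9" op="=" value="250.00"/>
          <criterion table="visits" column="setting" op="=" value="outpatient"/>
        </query>
        """

        def xml_to_sql(xml_text):
            root = ET.fromstring(xml_text)
            clauses = [
                f"{c.get('table')}.{c.get('column')} {c.get('op')} '{c.get('value')}'"
                for c in root.findall("criterion")
            ]
            return ("SELECT DISTINCT patients.patient_id FROM patients "
                    "JOIN diagnoses USING (patient_id) JOIN visits USING (patient_id) "
                    "WHERE " + " AND ".join(clauses))

        print(xml_to_sql(query_xml))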

  5. DICOM Data Warehouse: Part 2.

    PubMed

    Langer, Steve G

    2016-06-01

    In 2010, the DICOM Data Warehouse (DDW) was launched as a data warehouse for DICOM meta-data. Its chief design goals were a flexible database schema able to index standard patient and study information as well as modality-specific tags (public and private), and a framework to derive computable information (derived tags) from those items. Furthermore, it was to map the above information to an internally standard lexicon that enables a non-DICOM-savvy programmer to write standard SQL queries and retrieve the equivalent data from a cohort of scanners, regardless of which tag a data element was found in over the changing epochs of DICOM and the ensuing migration of elements from private to public tags. After 5 years, the original design has scaled astonishingly well. Very little has changed in the database schema. The knowledge base is now fluent in over 90 device types. Also, additional stored procedures have been written to compute data that is derivable from standard or mapped tags. Finally, an early concern that the system would not be able to handle the variability of DICOM-SR objects has been addressed. As of this writing the system is indexing 300 MR, 600 CT, and 2000 other (XA, DR, CR, MG) imaging studies per day. The only remaining issue to be solved is the case of tags that were not prospectively indexed; indeed, this final challenge may lead to a NoSQL, big data approach in a subsequent version.
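
    The mapping to an internal lexicon can be pictured with the sketch below, which normalises the same piece of meta-data arriving under different (possibly private) DICOM tags from two vendors onto one queryable name. The tag numbers and mapping tables are illustrative only; the real mappings live in the warehouse's knowledge base.

        # Toy normalisation of vendor-specific DICOM tags onto an internal lexicon.
        VENDOR_MAPS = {
            "vendor_a": {(0x0018, 0x0087): "magnetic_field_strength"},
            "vendor_b": {(0x0019, 0x10B0): "magnetic_field_strength"},  # private-tag example
        }

        def normalise(vendor, raw_tags):
            """Map raw (group, element) -> value pairs onto internal lexicon names."""
            mapping = VENDOR_MAPS.get(vendor, {})
            return {mapping[tag]: value for tag, value in raw_tags.items() if tag in mapping}

        print(normalise("vendor_b", {(0x0019, 0x10B0): "3.0"}))
        # {'magnetic_field_strength': '3.0'}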

  6. Towards data warehousing and mining of protein unfolding simulation data.

    PubMed

    Berrar, Daniel; Stahl, Frederic; Silva, Candida; Rodrigues, J Rui; Brito, Rui M M; Dubitzky, Werner

    2005-10-01

    The prediction of protein structure and the precise understanding of protein folding and unfolding processes remain one of the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.

  7. ISYMOD: a knowledge warehouse for the identification, assembly and analysis of bacterial integrated systems.

    PubMed

    Chabalier, Julie; Capponi, Cécile; Quentin, Yves; Fichant, Gwennaele

    2005-04-01

    Complex biological functions emerge from interactions between proteins in stable supra-molecular assemblies and/or through transitory contacts. Most of the time, the protein partners of these assemblies are composed of one or several domains which exhibit different biochemical functions. Thus the study of cellular processes requires the identification of different functional units and their integration in an interaction network; such complexes are referred to as integrated systems. To exploit the growing release of data with optimum efficiency, automated bioinformatics strategies are needed to identify, reconstruct and model such systems. For that purpose, we have developed a knowledge warehouse dedicated to the representation and acquisition of bacterial integrated systems involved in the exchange of the bacterial cell with its environment. ISYMOD is a knowledge warehouse that consistently integrates in the same environment the data and the methods used for their acquisition. This is achieved through the construction of (1) a domain knowledge base (DKB) devoted to the storage of the knowledge about the systems, their functional specificities, their partners and how they are related and (2) a methodological knowledge base (MKB) which depicts the task layout used to identify and reconstruct functional integrated systems. Instantiation of the DKB is obtained by solving the tasks of the MKB, whereas some tasks need instances of the DKB to be solved. AROM, an object-based knowledge representation system, has been used to design the DKB, and its task manager, AROMTasks, for developing the MKB. In this study, two integrated systems, ABC transporters and two-component systems, both involved in adaptation processes of a bacterial cell to its biotope, have been used to evaluate the feasibility of the approach.

  8. Bioinformatics and the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  9. Navigating the Challenges of the Cloud

    ERIC Educational Resources Information Center

    Ovadia, Steven

    2010-01-01

    Cloud computing is increasingly popular in education. Cloud computing is "the delivery of computer services from vast warehouses of shared machines that enables companies and individuals to cut costs by handing over the running of their email, customer databases or accounting software to someone else, and then accessing it over the internet."…

  10. Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    ERIC Educational Resources Information Center

    Talukdar, Partha Pratim

    2010-01-01

    The variety and complexity of potentially-related data resources available for querying--webpages, databases, data warehouses--has been growing ever more rapidly. There is a growing need to pose integrative queries "across" multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse…

  11. Tracing Boundaries, Effacing Boundaries: Information Literacy as an Academic Discipline

    ERIC Educational Resources Information Center

    Veach, Grace

    2012-01-01

    Both librarianship and composition have been shaken by recent developments in higher education. In libraries ebooks and online databases threaten the traditional "library as warehouse model," while in composition, studies like The Citation Project show that students are not learning how to incorporate sources into their own writing…

  12. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are routinely involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  13. Bioinformatics Goes to School—New Avenues for Teaching Contemporary Biology

    PubMed Central

    Wood, Louisa; Gebhardt, Philipp

    2013-01-01

    Since 2010, the European Molecular Biology Laboratory's (EMBL) Heidelberg laboratory and the European Bioinformatics Institute (EMBL-EBI) have jointly run bioinformatics training courses developed specifically for secondary school science teachers within Europe and EMBL member states. These courses focus on introducing bioinformatics, databases, and data-intensive biology, allowing participants to explore resources and providing classroom-ready materials to support them in sharing this new knowledge with their students. In this article, we chart our progress made in creating and running three bioinformatics training courses, including how the course resources are received by participants and how these, and bioinformatics in general, are subsequently used in the classroom. We assess the strengths and challenges of our approach, and share what we have learned through our interactions with European science teachers. PMID:23785266

  14. The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

    PubMed Central

    Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

    2006-01-01

    We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980

  15. Bioinformatics in Undergraduate Education: Practical Examples

    ERIC Educational Resources Information Center

    Boyle, John A.

    2004-01-01

    Bioinformatics has emerged as an important research tool in recent years. The ability to mine large databases for relevant information has become increasingly central to many different aspects of biochemistry and molecular biology. It is important that undergraduates be introduced to the available information and methodologies. We present a…

  16. Evolutionary Multiobjective Query Workload Optimization of Cloud Data Warehouses

    PubMed Central

    Dokeroglu, Tansel; Sert, Seyyit Alper; Cinar, Muhammet Serkan

    2014-01-01

    With the advent of Cloud databases, query optimizers need to find Pareto-optimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans, making use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. In order to investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments that we have conducted with different workloads and virtual resource configurations, we report notable findings on alternative deployments, as well as the advantages and disadvantages of the multiobjective algorithms we propose. PMID:24892048
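
    At the core of any such multiobjective search is a dominance test over the two objectives. The sketch below filters a handful of made-up deployment options down to the Pareto front of response time versus monetary cost; the candidate numbers are purely illustrative.

        # Toy Pareto-front filter over (response time, monetary cost) candidates.
        def pareto_front(candidates):
            """candidates: list of (label, response_time, monetary_cost)."""
            front = []
            for label, t, c in candidates:
                dominated = any(t2 <= t and c2 <= c and (t2 < t or c2 < c)
                                for _, t2, c2 in candidates)
                if not dominated:
                    front.append((label, t, c))
            return front

        deployments = [
            ("2 small VMs", 120.0, 0.40),
            ("4 small VMs",  70.0, 0.80),
            ("1 large VM",   90.0, 0.60),
            ("8 small VMs",  65.0, 1.60),
            ("bad plan",    130.0, 0.90),
        ]
        print(pareto_front(deployments))   # "bad plan" is dominated and dropped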

  17. Prediction of Acute Mountain Sickness using a Blood-Based Test

    DTIC Science & Technology

    2016-01-01

    2015): In quarter 17 we focused on two major tasks: getting the RNA purified and ready for chip analysis and working on the bioinformatics ... bioinformatics organization of all the data we will examine for this study. To remind the reviewer, we have a primary dataset of ~120 subjects who were studied...companion study, AltitudeOmics, to the database of gene studies to be analyzed for AMS prediction • expansion of a bioinformatics team to include an

  18. Farm animal genomics and informatics: an update

    PubMed Central

    Fadiel, Ahmed; Anidi, Ifeanyi; Eichenbaum, Kenneth D.

    2005-01-01

    Farm animal genomics is of interest to a wide audience of researchers because of the utility derived from understanding how genomics and proteomics function in various organisms. Applications such as xenotransplantation, increased livestock productivity, bioengineering new materials, products and even fabrics are several reasons for thriving farm animal genome activity. Currently mined in rapidly growing data warehouses, completed genomes of chicken, fish and cows are available but are largely stored in decentralized data repositories. In this paper, we provide an informatics primer on farm animal bioinformatics and genome project resources which drive attention to the most recent advances in the field. We hope to provide individuals in biotechnology and in the farming industry with information on resources and updates concerning farm animal genome projects. PMID:16275782

  19. Bioinformatics: A History of Evolution "In Silico"

    ERIC Educational Resources Information Center

    Ondrej, Vladan; Dvorak, Petr

    2012-01-01

    Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…

  20. Incorporation of Bioinformatics Exercises into the Undergraduate Biochemistry Curriculum

    ERIC Educational Resources Information Center

    Feig, Andrew L.; Jabri, Evelyn

    2002-01-01

    The field of bioinformatics is developing faster than most biochemistry textbooks can adapt. Supplementing the undergraduate biochemistry curriculum with data-mining exercises is an ideal way to expose the students to the common databases and tools that take advantage of this vast repository of biochemical information. An integrated collection of…

  1. Exploring DNA Structure with Cn3D

    ERIC Educational Resources Information Center

    Porter, Sandra G.; Day, Joseph; McCarty, Richard E.; Shearn, Allen; Shingles, Richard; Fletcher, Linnea; Murphy, Stephanie; Pearlman, Rebecca

    2007-01-01

    Researchers in the field of bioinformatics have developed a number of analytical programs and databases that are increasingly important for advancing biological research. Because bioinformatics programs are used to analyze, visualize, and/or compare biological data, it is likely that the use of these programs will have a positive impact on biology…

  2. 77 FR 36543 - General Services Administration Acquisition Regulation (GSAR) Part 523; Information Collection...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-06-19

    ... handling of such items throughout GSA's supply chain system. The information is used in GSA warehouses, stored in an NSN database and provided to GSA customers. Non-Collection and/or a less frequently... any of the following methods: Regulations.gov: http://www.regulations.gov. Submit comments via the...

  3. Data, Data Everywhere--Not a Report in Sight!

    ERIC Educational Resources Information Center

    Norman, Wendy

    2003-01-01

    Presents six steps of data warehouse development that result in valuable, long-term reporting solutions, discussing how to choose the right reporting vehicle. The six steps are: defining one's needs; mapping the source for each element; extracting the data; cleaning and verifying the data; moving the data into a relational database; and developing…

  4. Garbage in, Garbage Stays: How ERPs Could Improve Our Data-Quality Issues

    ERIC Educational Resources Information Center

    Riccardi, Richard I.

    2009-01-01

    As universities begin to implement business intelligence tools such as end-user reporting, data warehousing, and dashboard indicators, data quality becomes an even greater and more public issue. With automated tools taking nightly snapshots of the database, the faulty data grow exponentially, propagating as another layer of the data warehouse.…

  5. Data Foundry: Data Warehousing and Integration for Scientific Data Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Musick, R.; Critchlow, T.; Ganesh, M.

    2000-02-29

    Data warehousing is an approach for managing data from multiple sources by representing them with a single, coherent point of view. Commercial data warehousing products have been produced by companies such as Red Brick, IBM, Brio, Andyne, Ardent, NCR, Information Advantage, Informatica, and others. Other companies have chosen to develop their own in-house data warehousing solution using relational databases, such as those sold by Oracle, IBM, Informix and Sybase. The typical approaches include federated systems and mediated data warehouses, each of which, to some extent, makes use of a series of source-specific wrapper and mediator layers to integrate the data into a consistent format which is then presented to users as a single virtual data store. These approaches are successful when applied to traditional business data because the data format used by the individual data sources tends to be rather static. Therefore, once a data source has been integrated into a data warehouse, there is relatively little work required to maintain that connection. However, that is not the case for all data sources. Data sources from scientific domains tend to regularly change their data model, format and interface. This is problematic because each change requires the warehouse administrator to update the wrapper, mediator, and warehouse interfaces to properly read, interpret, and represent the modified data source. Furthermore, the data that scientists require to carry out research is continuously changing as their understanding of a research question develops, or as their research objectives evolve. The difficulty and cost of these updates effectively limits the number of sources that can be integrated into a single data warehouse, or makes an approach based on warehousing too expensive to consider.
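
    The wrapper-and-mediator layering can be pictured with the toy sketch below, in which two source-specific wrappers translate their native record formats into one shared representation that a mediator presents as a single virtual store. The field names, units and records are invented for the example.

        # Toy wrapper/mediator layering over two differently formatted sources.
        def wrap_source_a(record):
            # Source A stores names and masses in daltons with its own field names.
            return {"protein": record["prot_name"], "mass_da": record["mw"]}

        def wrap_source_b(record):
            # Source B uses different field names and kilodalton units.
            return {"protein": record["name"], "mass_da": record["mass_kda"] * 1000.0}

        def mediator(sources):
            """Present all wrapped records through a single, coherent point of view."""
            for wrapper, records in sources:
                for r in records:
                    yield wrapper(r)

        sources = [
            (wrap_source_a, [{"prot_name": "P53_HUMAN", "mw": 43653.0}]),
            (wrap_source_b, [{"name": "INS_HUMAN", "mass_kda": 5.8}]),
        ]
        print(list(mediator(sources)))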

  6. India's Computational Biology Growth and Challenges.

    PubMed

    Chakraborty, Chiranjib; Bandyopadhyay, Sanghamitra; Agoramoorthy, Govindasamy

    2016-09-01

    India's computational science is growing swiftly due to the rapid expansion of internet and information technology services. The bioinformatics sector of India has been transforming rapidly, creating a competitive position in the global bioinformatics market. Bioinformatics is widely used across India to address a wide range of biological issues. Recently, computational researchers and biologists have been collaborating on projects such as database development, sequence analysis, genomic prospects and algorithm generation. In this paper, we present the Indian computational biology scenario, highlighting bioinformatics-related educational activities, manpower development, the internet boom, the service industry, research activities, and conferences and training undertaken by the corporate and government sectors. Nonetheless, this new field of science faces many challenges.

  7. Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.

    PubMed

    Radauer, Christian

    2017-01-01

    The increasing number of available data on allergenic proteins demanded the establishment of structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.

  8. CoryneRegNet 4.0 – A reference database for corynebacterial gene regulatory networks

    PubMed Central

    Baumbach, Jan

    2007-01-01

    Background Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Results Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. As in previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows inter-species comparisons of reconstructed gene regulatory networks and the projection of gene expression levels onto those networks. Therefore, we added stimulon data directly into the database, but also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP-based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known transcriptional regulatory networks to predict putative contradictions or further gene regulatory interactions. Furthermore, it integrates protein clusters by means of heuristically solving the weighted graph cluster editing problem. In addition, it provides Web Service based access to up-to-date gene annotation data from GenDB. Conclusion The release 4.0 of CoryneRegNet is a comprehensive system for the integrated analysis of prokaryotic gene regulatory networks. It is a versatile systems biology platform to support the efficient and large-scale analysis of transcriptional regulation of gene expression in microorganisms. It is publicly available at . PMID:17986320

  9. blastjs: a BLAST+ wrapper for Node.js.

    PubMed

    Page, Martin; MacLean, Dan; Schudoma, Christian

    2016-02-27

    To cope with the ever-increasing amount of sequence data generated in the field of genomics, the demand for efficient and fast database searches that drive functional and structural annotation in both large- and small-scale genome projects is on the rise. The tools of the BLAST+ suite are the most widely employed bioinformatic method for these database searches. Recent trends in bioinformatics application development show an increasing number of JavaScript apps that are based on modern frameworks such as Node.js. Until now, there has been no way to run BLAST+ database searches from a Node.js codebase. We developed blastjs, a Node.js library that wraps the search tools of the BLAST+ suite and thus makes it easy to add significant functionality to any Node.js-based application. blastjs is a library that allows the incorporation of BLAST+ functionality into bioinformatics applications based on JavaScript and Node.js. The library was designed to be as user-friendly as possible and therefore requires only a minimal amount of code in the client application. The library is freely available under the MIT license at https://github.com/teammaclean/blastjs.
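
    The wrapping pattern itself is language-independent; the sketch below shows the general idea in Python rather than Node.js, shelling out to the blastn binary and parsing its tabular output. This is not blastjs's JavaScript API, and it assumes a local BLAST+ installation and an already formatted database.

        # Toy wrapper around a BLAST+ command-line search (general pattern only).
        import subprocess

        def run_blastn(query_fasta, database, evalue=1e-5):
            cmd = ["blastn", "-query", query_fasta, "-db", database,
                   "-evalue", str(evalue), "-outfmt", "6"]      # tabular output
            result = subprocess.run(cmd, capture_output=True, text=True, check=True)
            hits = []
            for line in result.stdout.splitlines():
                fields = line.split("\t")
                hits.append({"query": fields[0], "subject": fields[1],
                             "identity": float(fields[2]), "evalue": float(fields[10])})
            return hits

        # Example (requires BLAST+ installed and a formatted database):
        # print(run_blastn("reads.fasta", "nt_local"))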

  10. Applications and Methods Utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for Bioinformatics Resource Discovery and Disparate Data and Service Integration

    USDA-ARS?s Scientific Manuscript database

    Scientific data integration and computational service discovery are challenges for the bioinformatic community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of scientific data between information resources difficu...

  11. Why Choose This One? Factors in Scientists' Selection of Bioinformatics Tools

    ERIC Educational Resources Information Center

    Bartlett, Joan C.; Ishimura, Yusuke; Kloda, Lorie A.

    2011-01-01

    Purpose: The objective was to identify and understand the factors involved in scientists' selection of preferred bioinformatics tools, such as databases of gene or protein sequence information (e.g., GenBank) or programs that manipulate and analyse biological data (e.g., BLAST). Methods: Eight scientists maintained research diaries for a two-week…

  12. Bioinformatics for spermatogenesis: annotation of male reproduction based on proteomics

    PubMed Central

    Zhou, Tao; Zhou, Zuo-Min; Guo, Xue-Jiang

    2013-01-01

    Proteomics strategies have been widely used in the field of male reproduction, both in basic and clinical research. Bioinformatics methods are indispensable in proteomics-based studies and are used for data presentation, database construction and functional annotation. In the present review, we focus on the functional annotation of gene lists obtained through qualitative or quantitative methods, summarizing the common and male reproduction specialized proteomics databases. We introduce several integrated tools used to find the hidden biological significance from the data obtained. We further describe in detail the information on male reproduction derived from Gene Ontology analyses, pathway analyses and biomedical analyses. We provide an overview of bioinformatics annotations in spermatogenesis, from gene function to biological function and from biological function to clinical application. On the basis of recently published proteomics studies and associated data, we show that bioinformatics methods help us to discover drug targets for sperm motility and to scan for cancer-testis genes. In addition, we summarize the online resources relevant to male reproduction research for the exploration of the regulation of spermatogenesis. PMID:23852026

  13. Broad issues to consider for library involvement in bioinformatics*

    PubMed Central

    Geer, Renata C.

    2006-01-01

    Background: The information landscape in biological and medical research has grown far beyond literature to include a wide variety of databases generated by research fields such as molecular biology and genomics. The traditional role of libraries to collect, organize, and provide access to information can expand naturally to encompass these new data domains. Methods: This paper discusses the current and potential role of libraries in bioinformatics using empirical evidence and experience from eleven years of work in user services at the National Center for Biotechnology Information. Findings: Medical and science libraries over the last decade have begun to establish educational and support programs to address the challenges users face in the effective and efficient use of a plethora of molecular biology databases and retrieval and analysis tools. As more libraries begin to establish a role in this area, the issues they face include assessment of user needs and skills, identification of existing services, development of plans for new services, recruitment and training of specialized staff, and establishment of collaborations with bioinformatics centers at their institutions. Conclusions: Increasing library involvement in bioinformatics can help address information needs of a broad range of students, researchers, and clinicians and ultimately help realize the power of bioinformatics resources in making new biological discoveries. PMID:16888662

  14. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable in the analysis of the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins and intergenic regions, including the entire genomes of some species. The analysis of this information requires computer programs, which have been renewed through new mathematical methods and the introduction of artificial intelligence, in addition to the constant creation of supercomputing units capable of withstanding the heavy workload of sequence analysis. However, innovation is still necessary in platforms that allow genomic analyses to be carried out faster and more effectively, with a technological understanding of all biological processes.

  15. A Web-based assessment of bioinformatics end-user support services at US universities.

    PubMed

    Messersmith, Donna J; Benson, Dennis A; Geer, Renata C

    2006-07-01

    This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group.

  16. The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures.

    PubMed

    Kikuchi, Norihiro; Kameyama, Akihiko; Nakaya, Shuuichi; Ito, Hiromi; Sato, Takashi; Shikanai, Toshihide; Takahashi, Yoriko; Narimatsu, Hisashi

    2005-04-15

    Bioinformatics resources for glycomics are very poor as compared with those for genomics and proteomics. The complexity of carbohydrate sequences makes it difficult to define a common language to represent them, and the development of bioinformatics tools for glycomics has not progressed. In this study, we developed a carbohydrate sequence markup language (CabosML), an XML description of carbohydrate structures. The language definition (XML Schema) and an experimental database of carbohydrate structures using an XML database management system are available at http://www.phoenix.hydra.mki.co.jp/CabosDemo.html kikuchi@hydra.mki.co.jp.
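
    As a rough, hypothetical illustration of what an XML description of a carbohydrate structure can look like (the element and attribute names below are invented for illustration and are not the actual CabosML schema), a short Python sketch using only the standard library might be:

      # A minimal sketch (not the actual CabosML schema): building and serializing
      # a hypothetical XML description of a small carbohydrate structure. Element
      # and attribute names are invented for illustration only.
      import xml.etree.ElementTree as ET

      carbohydrate = ET.Element("carbohydrate", id="demo-structure")
      root_residue = ET.SubElement(carbohydrate, "residue", name="GlcNAc", anomer="beta")
      branch = ET.SubElement(root_residue, "residue", name="Man", anomer="beta")
      ET.SubElement(branch, "linkage", donor="1", acceptor="4")

      print(ET.tostring(carbohydrate, encoding="unicode"))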

  17. On patterns and re-use in bioinformatics databases.

    PubMed

    Bell, Michael J; Lord, Phillip

    2017-09-01

    As the quantity of data being deposited into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely held that data percolates between different databases, which causes particular concerns for data correctness; if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others. In this paper, we test this widely held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design. We show that reuse of annotation is common within many different databases, and that there is also a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors. Analytical software is available on request. phillip.lord@newcastle.ac.uk. © The Author(s) 2017. Published by Oxford University Press.

  18. On patterns and re-use in bioinformatics databases

    PubMed Central

    Bell, Michael J.; Lord, Phillip

    2017-01-01

    Motivation: As the quantity of data being deposited into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely held that data percolates between different databases, which causes particular concerns for data correctness; if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others. In this paper, we test this widely held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design. Results: We show that reuse of annotation is common within many different databases, and that there is also a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors. Availability and implementation: Analytical software is available on request. Contact: phillip.lord@newcastle.ac.uk PMID:28525546

  19. DSSTOX STRUCTURE-SEARCHABLE PUBLIC TOXICITY DATABASE NETWORK: CURRENT PROGRESS AND NEW INITIATIVES TO IMPROVE CHEMO-BIOINFORMATICS CAPABILITIES

    EPA Science Inventory

    The EPA DSSTox website (http://www.epa.gov/nheerl/dsstox) publishes standardized, structure-annotated toxicity databases, covering a broad range of toxicity disciplines. Each DSSTox database features documentation written in collaboration with the source authors and toxicity expe...

  20. Navigating legal constraints in clinical data warehousing: a case study in personalized medicine.

    PubMed

    Jefferys, Benjamin R; Nwankwo, Iheanyi; Neri, Elias; Chang, David C W; Shamardin, Lev; Hänold, Stefanie; Graf, Norbert; Forgó, Nikolaus; Coveney, Peter

    2013-04-06

    Personalized medicine relies in part upon comprehensive data on patient treatment and outcomes, both for analysis leading to improved models that provide the basis for enhanced treatment, and for direct use in clinical decision-making. A data warehouse is an information technology for combining and standardizing multiple databases. Data warehousing of clinical data is constrained by many legal and ethical considerations, owing to the sensitive nature of the data being stored. We describe an unconstrained clinical data warehousing architecture, some of the legal constraints that have led us to reconsider this architecture, and the legal and technical solutions to these constraints developed for the clinical data warehouse in the personalized medicine project p-medicine. We also propose some changes to the legal constraints that will further enable clinical research.

  1. A Realistic Data Warehouse Project: An Integration of Microsoft Access[R] and Microsoft Excel[R] Advanced Features and Skills

    ERIC Educational Resources Information Center

    King, Michael A.

    2009-01-01

    Business intelligence derived from data warehousing and data mining has become one of the most strategic management tools today, providing organizations with long-term competitive advantages. Business school curriculums and popular database textbooks cover data warehousing, but the examples and problem sets typically are small and unrealistic. The…

  2. Propagation from the Start: The Spread of a Concept-Based Instructional Tool

    ERIC Educational Resources Information Center

    Friedrichsen, Debra M.; Smith, Christina; Koretsky, Milo D.

    2017-01-01

    We describe the propagation of a technology-based educational innovation through its first 3 years of public use. The innovation studied is the Concept Warehouse (CW), a database-driven website developed to support the use of concept-based pedagogies. This tool was initially developed for instructors in undergraduate chemical engineering courses,…

  3. Incorporating a New Bioinformatics Component into Genetics at a Historically Black College: Outcomes and Lessons

    ERIC Educational Resources Information Center

    Holtzclaw, J. David; Eisen, Arri; Whitney, Erika M.; Penumetcha, Meera; Hoey, J. Joseph; Kimbro, K. Sean

    2006-01-01

    Many students at minority-serving institutions are underexposed to Internet resources such as the human genome project, PubMed, NCBI databases, and other Web-based technologies because of a lack of financial resources. To change this, we designed and implemented a new bioinformatics component to supplement the undergraduate Genetics course at…

  4. Data warehousing with Oracle

    NASA Astrophysics Data System (ADS)

    Shahzad, Muhammad A.

    1999-02-01

    With the emergence of data warehousing, decision support systems have evolved to their best. At the core of these warehousing systems lies a good database management system. The database server used for data warehousing is responsible for providing robust data management, scalability, high-performance query processing and integration with other servers. Oracle, one of the initiators among warehousing servers, provides a wide range of features for facilitating data warehousing. This paper reviews data warehousing, first conceptualizing the idea of a data warehouse and then describing the features of Oracle servers for implementing one.

  5. Bioinformatic training needs at a health sciences campus.

    PubMed

    Oliver, Jeffrey C

    2017-01-01

    Health sciences research is increasingly focusing on big data applications, such as genomic technologies and precision medicine, to address key issues in human health. These approaches rely on biological data repositories and bioinformatic analyses, both of which are growing rapidly in size and scope. Libraries play a key role in supporting researchers in navigating these and other information resources. With the goal of supporting bioinformatics research in the health sciences, the University of Arizona Health Sciences Library established a Bioinformation program. To shape the support provided by the library, I developed and administered a needs assessment survey to the University of Arizona Health Sciences campus in Tucson, Arizona. The survey was designed to identify the training topics of interest to health sciences researchers and the preferred modes of training. Survey respondents expressed an interest in a broad array of potential training topics, including "traditional" information seeking as well as interest in analytical training. Of particular interest were training in transcriptomic tools and the use of databases linking genotypes and phenotypes. Staff were most interested in bioinformatics training topics, while faculty were the least interested. Hands-on workshops were significantly preferred over any other mode of training. The University of Arizona Health Sciences Library is meeting those needs through internal programming and external partnerships. The results of the survey demonstrate a keen interest in a variety of bioinformatic resources; the challenge to the library is how to address those training needs. The mode of support depends largely on library staff expertise in the numerous subject-specific databases and tools. Librarian-led bioinformatic training sessions provide opportunities for engagement with researchers at multiple points of the research life cycle. When training needs exceed library capacity, partnering with intramural and extramural units will be crucial in library support of health sciences bioinformatic research.

  6. Medical data mining: knowledge discovery in a clinical data warehouse.

    PubMed Central

    Prather, J. C.; Lobach, D. F.; Goodwin, L. K.; Hales, J. W.; Hage, M. L.; Hammond, W. E.

    1997-01-01

    Clinical databases have accumulated large quantities of information about patients and their medical conditions. Relationships and patterns within this data could provide new medical knowledge. Unfortunately, few methodologies have been developed and applied to discover this hidden knowledge. In this study, the techniques of data mining (also known as Knowledge Discovery in Databases) were used to search for relationships in a large clinical database. Specifically, data accumulated on 3,902 obstetrical patients were evaluated for factors potentially contributing to preterm birth using exploratory factor analysis. Three factors were identified by the investigators for further exploration. This paper describes the processes involved in mining a clinical database including data warehousing, data query and cleaning, and data analysis. PMID:9357597
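
    As a loose illustration of the exploratory-factor-analysis step described in this record (not the authors' code; the patient matrix below is synthetic and the variable count is arbitrary), a Python sketch could look like this:

      # A minimal sketch, assuming scikit-learn: exploratory factor analysis on a
      # synthetic stand-in for cleaned clinical-warehouse data.
      import numpy as np
      from sklearn.decomposition import FactorAnalysis

      rng = np.random.default_rng(0)
      # Rows = patients, columns = clinical variables (synthetic placeholders).
      X = rng.normal(size=(500, 12))

      fa = FactorAnalysis(n_components=3, random_state=0)
      scores = fa.fit_transform(X)      # per-patient factor scores
      loadings = fa.components_         # (3, 12) factor loadings to inspect

      print("factor loadings shape:", loadings.shape)
      print("first patient's factor scores:", scores[0])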

  7. Educational websites--Bioinformatics Tools II.

    PubMed

    Lomberk, Gwen

    2009-01-01

    In this issue, the highlighted websites are a continuation of a series of educational websites; this one in particular from a couple of years ago, Bioinformatics Tools [Pancreatology 2005;5:314-315]. These include sites that are valuable resources for many research needs in genomics and proteomics. Bioinformatics has become a laboratory tool to map sequences to databases, develop models of molecular interactions, evaluate structural compatibilities, describe differences between normal and disease-associated DNA, identify conserved motifs within proteins, and chart extensive signaling networks, all in silico. Copyright 2008 S. Karger AG, Basel and IAP.

  8. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework.

    PubMed

    Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org. © The Author(s) 2016. Published by Oxford University Press.

  9. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework

    PubMed Central

    Chen, Yi-An; Tripathi, Lokesh P.; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org PMID:26989145

  10. Electronic medical record: research tool for pancreatic cancer?

    PubMed

    Arous, Edward J; McDade, Theodore P; Smith, Jillian K; Ng, Sing Chau; Sullivan, Mary E; Zottola, Ralph J; Ranauro, Paul J; Shah, Shimul A; Whalen, Giles F; Tseng, Jennifer F

    2014-04-01

    A novel data warehouse based on automated retrieval from an institutional health care information system (HIS) was made available to be compared with a traditional prospectively maintained surgical database. A newly established institutional data warehouse at a single-institution academic medical center, autopopulated by the HIS, was queried for International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) diagnosis codes for pancreatic neoplasm. Patients with ICD-9-CM diagnosis codes for pancreatic neoplasm were captured. A parallel query was performed using a prospective database populated by manual entry. Duplicated patients and those unique to either data set were identified. All patients were manually reviewed to determine the accuracy of diagnosis. A total of 1107 patients were identified from the HIS-linked data set with pancreatic neoplasm from 1999-2009. Of these, 254 (22.9%) patients were also captured by the surgical database, whereas 853 (77.1%) patients were only in the HIS-linked data set. Manual review of the HIS-only group demonstrated that 45.0% of patients were without identifiable pancreatic pathology, suggesting erroneous capture, whereas 36.3% of patients were consistent with pancreatic neoplasm and 18.7% with other pancreatic pathology. Of the 394 patients identified by the surgical database, 254 (64.5%) patients were captured by HIS, whereas 140 (35.5%) patients were not. Manual review of patients only captured by the surgical database demonstrated 85.9% with pancreatic neoplasm and 14.1% with other pancreatic pathology. Finally, review of the 254-patient overlap demonstrated that 80.3% of patients had pancreatic neoplasm and 19.7% had other pancreatic pathology. These results suggest that administrative data relying only on ICD-9-CM diagnosis codes should be interpreted cautiously and correlated clinically through previously validated mechanisms. Published by Elsevier Inc.
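
    For illustration only (the table layout and patient data below are invented, SQLite stands in for the institutional warehouse, and the query logic is a sketch rather than the study's actual method), a diagnosis-code capture step like the one described above might be expressed as:

      # A minimal sketch of a diagnosis-code query against a toy warehouse table.
      # ICD-9-CM codes beginning with 157 denote malignant neoplasm of the pancreas.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("""CREATE TABLE diagnoses (
          patient_id   INTEGER,
          icd9_code    TEXT,
          diagnosis_dt TEXT
      )""")
      conn.executemany(
          "INSERT INTO diagnoses VALUES (?, ?, ?)",
          [(1, "157.0", "2005-03-14"), (2, "250.00", "2006-07-01"), (3, "157.9", "2008-11-30")],
      )

      # Capture every patient carrying a pancreatic-neoplasm diagnosis code.
      rows = conn.execute(
          "SELECT DISTINCT patient_id FROM diagnoses WHERE icd9_code LIKE '157%'"
      ).fetchall()
      print("candidate patients for manual review:", [r[0] for r in rows])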

  11. The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*.

    PubMed

    Katayama, Toshiaki; Arakawa, Kazuharu; Nakao, Mitsuteru; Ono, Keiichiro; Aoki-Kinoshita, Kiyoko F; Yamamoto, Yasunori; Yamaguchi, Atsuko; Kawashima, Shuichi; Chun, Hong-Woo; Aerts, Jan; Aranda, Bruno; Barboza, Lord Hendrix; Bonnal, Raoul Jp; Bruskiewich, Richard; Bryne, Jan C; Fernández, José M; Funahashi, Akira; Gordon, Paul Mk; Goto, Naohisa; Groscurth, Andreas; Gutteridge, Alex; Holland, Richard; Kano, Yoshinobu; Kawas, Edward A; Kerhornou, Arnaud; Kibukawa, Eri; Kinjo, Akira R; Kuhn, Michael; Lapp, Hilmar; Lehvaslaiho, Heikki; Nakamura, Hiroyuki; Nakamura, Yasukazu; Nishizawa, Tatsuya; Nobata, Chikashi; Noguchi, Tamotsu; Oinn, Thomas M; Okamoto, Shinobu; Owen, Stuart; Pafilis, Evangelos; Pocock, Matthew; Prins, Pjotr; Ranzinger, René; Reisinger, Florian; Salwinski, Lukasz; Schreiber, Mark; Senger, Martin; Shigemoto, Yasumasa; Standley, Daron M; Sugawara, Hideaki; Tashiro, Toshiyuki; Trelles, Oswaldo; Vos, Rutger A; Wilkinson, Mark D; York, William; Zmasek, Christian M; Asai, Kiyoshi; Takagi, Toshihisa

    2010-08-21

    Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and the Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security, are discussed. Consequently, we improved the interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.

  12. The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*

    PubMed Central

    2010-01-01

    Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and the Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security, are discussed. Consequently, we improved the interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies. PMID:20727200

  13. Duplicates, redundancies and inconsistencies in the primary nucleotide databases: a descriptive study.

    PubMed

    Chen, Qingyu; Zobel, Justin; Verspoor, Karin

    2017-01-01

    GenBank, the EMBL European Nucleotide Archive and the DNA DataBank of Japan, known collectively as the International Nucleotide Sequence Database Collaboration or INSDC, are the three most significant nucleotide sequence databases. Their records are derived from laboratory work undertaken by different individuals, by different teams, with a range of technologies and assumptions and over a period of decades. As a consequence, they contain a great many duplicates, redundancies and inconsistencies, but neither the prevalence nor the characteristics of various types of duplicates have been rigorously assessed. Existing duplicate detection methods in bioinformatics only address specific duplicate types, with inconsistent assumptions; and the impact of duplicates in bioinformatics databases has not been carefully assessed, making it difficult to judge the value of such methods. Our goal is to assess the scale, kinds and impact of duplicates in bioinformatics databases, through a retrospective analysis of merged groups in INSDC databases. Our outcomes are threefold: (1) We analyse a benchmark dataset consisting of duplicates manually identified in INSDC-a dataset of 67 888 merged groups with 111 823 duplicate pairs across 21 organisms from INSDC databases - in terms of the prevalence, types and impacts of duplicates. (2) We categorize duplicates at both sequence and annotation level, with supporting quantitative statistics, showing that different organisms have different prevalence of distinct kinds of duplicate. (3) We show that the presence of duplicates has practical impact via a simple case study on duplicates, in terms of GC content and melting temperature. We demonstrate that duplicates not only introduce redundancy, but can lead to inconsistent results for certain tasks. Our findings lead to a better understanding of the problem of duplication in biological databases.Database URL: the merged records are available at https://cloudstor.aarnet.edu.au/plus/index.php/s/Xef2fvsebBEAv9w. © The Author(s) 2017. Published by Oxford University Press.
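
    To make the GC-content and melting-temperature case study concrete (the sequences below are invented near-duplicates, not records from the benchmark, and the Wallace rule is only a crude approximation for short oligonucleotides), a Python sketch might be:

      # A minimal sketch: comparing GC content and a rough Wallace-rule melting
      # temperature for a pair of near-duplicate entries, to show how redundant
      # records can yield inconsistent derived values.
      def gc_content(seq: str) -> float:
          seq = seq.upper()
          return (seq.count("G") + seq.count("C")) / len(seq)

      def wallace_tm(seq: str) -> float:
          # Tm ~ 2(A+T) + 4(G+C); valid only as a rough estimate for short oligos.
          seq = seq.upper()
          return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))

      duplicate_pair = ("ATGCGTACGTTAGC", "ATGCGTACGTTAGG")  # hypothetical near-duplicates
      for seq in duplicate_pair:
          print(seq, f"GC={gc_content(seq):.2f}", f"Tm={wallace_tm(seq):.1f}C")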

  14. Duplicates, redundancies and inconsistencies in the primary nucleotide databases: a descriptive study

    PubMed Central

    Chen, Qingyu; Zobel, Justin; Verspoor, Karin

    2017-01-01

    GenBank, the EMBL European Nucleotide Archive and the DNA DataBank of Japan, known collectively as the International Nucleotide Sequence Database Collaboration or INSDC, are the three most significant nucleotide sequence databases. Their records are derived from laboratory work undertaken by different individuals, by different teams, with a range of technologies and assumptions and over a period of decades. As a consequence, they contain a great many duplicates, redundancies and inconsistencies, but neither the prevalence nor the characteristics of various types of duplicates have been rigorously assessed. Existing duplicate detection methods in bioinformatics only address specific duplicate types, with inconsistent assumptions; and the impact of duplicates in bioinformatics databases has not been carefully assessed, making it difficult to judge the value of such methods. Our goal is to assess the scale, kinds and impact of duplicates in bioinformatics databases, through a retrospective analysis of merged groups in INSDC databases. Our outcomes are threefold: (1) We analyse a benchmark dataset consisting of duplicates manually identified in INSDC—a dataset of 67 888 merged groups with 111 823 duplicate pairs across 21 organisms from INSDC databases – in terms of the prevalence, types and impacts of duplicates. (2) We categorize duplicates at both sequence and annotation level, with supporting quantitative statistics, showing that different organisms have different prevalence of distinct kinds of duplicate. (3) We show that the presence of duplicates has practical impact via a simple case study on duplicates, in terms of GC content and melting temperature. We demonstrate that duplicates not only introduce redundancy, but can lead to inconsistent results for certain tasks. Our findings lead to a better understanding of the problem of duplication in biological databases. Database URL: the merged records are available at https://cloudstor.aarnet.edu.au/plus/index.php/s/Xef2fvsebBEAv9w PMID:28077566

  15. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    7 CFR 735.303 (2010-01-01) - Agriculture; Regulations for Warehouses; Regulations for the United States Warehouse Act; Warehouse Receipts - § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  16. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    7 CFR 735.303 (2011-01-01) - Agriculture; Regulations for Warehouses; Regulations for the United States Warehouse Act; Warehouse Receipts - § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  17. Development of a Web-Enabled Informatics Platform for Manipulation of Gene Expression Data

    DTIC Science & Technology

    2004-12-01

    genomic platforms such as metabolomics and proteomics, and to federated databases for knowledge management. A successful SBIR Phase I completed...measurements that require sophisticated bioinformatic platforms for data archival, management, integration, and analysis if researchers are to derive...web-enabled bioinformatic platform consisting of a Laboratory Information Management System (LIMS), an Analysis Information Management System (AIMS

  18. toxoMine: an integrated omics data warehouse for Toxoplasma gondii systems biology research

    PubMed Central

    Rhee, David B.; Croken, Matthew McKnight; Shieh, Kevin R.; Sullivan, Julie; Micklem, Gos; Kim, Kami; Golden, Aaron

    2015-01-01

    Toxoplasma gondii (T. gondii) is an obligate intracellular parasite that must monitor for changes in the host environment and respond accordingly; however, it is still not fully known which genetic or epigenetic factors are involved in regulating virulence traits of T. gondii. There are on-going efforts to elucidate the mechanisms regulating the stage transition process via the application of high-throughput epigenomics, genomics and proteomics techniques. Given the range of experimental conditions and the typical yield from such high-throughput techniques, a new challenge arises: how to effectively collect, organize and disseminate the generated data for subsequent data analysis. Here, we describe toxoMine, which provides a powerful interface to support sophisticated integrative exploration of high-throughput experimental data and metadata, providing researchers with a more tractable means toward understanding how genetic and/or epigenetic factors play a coordinated role in determining pathogenicity of T. gondii. As a data warehouse, toxoMine allows integration of high-throughput data sets with public T. gondii data. toxoMine is also able to execute complex queries involving multiple data sets with straightforward user interaction. Furthermore, toxoMine allows users to define their own parameters during the search process that gives users near-limitless search and query capabilities. The interoperability feature also allows users to query and examine data available in other InterMine systems, which would effectively augment the search scope beyond what is available to toxoMine. toxoMine complements the major community database ToxoDB by providing a data warehouse that enables more extensive integrative studies for T. gondii. Given all these factors, we believe it will become an indispensable resource to the greater infectious disease research community. Database URL: http://toxomine.org PMID:26130662
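
    Since toxoMine interoperates with other InterMine systems, a query against it would typically use the standard InterMine Python client. The sketch below assumes that client (the "intermine" package), an assumed service endpoint path, and an illustrative query; none of this is taken from the paper.

      # A minimal sketch, assuming the InterMine Python client and a typical
      # InterMine web-service path for toxoMine (URL path is an assumption).
      from intermine.webservice import Service

      service = Service("http://toxomine.org/toxomine/service")  # assumed endpoint
      query = service.new_query("Gene")
      query.add_view("primaryIdentifier", "symbol", "organism.name")
      query.add_constraint("organism.name", "=", "Toxoplasma gondii")

      for row in query.rows(size=10):
          print(row["primaryIdentifier"], row["symbol"])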

  19. A substitution method to improve completeness of events documentation in anesthesia records.

    PubMed

    Lamer, Antoine; De Jonckheere, Julien; Marcilly, Romaric; Tavernier, Benoît; Vallet, Benoît; Jeanne, Mathieu; Logier, Régis

    2015-12-01

    Anesthesia information management systems (AIMS) are optimized to find and display data and curves for one specific intervention, but not for retrospective analysis over a huge volume of interventions. Such systems present two main limitations: (1) the transactional database architecture, and (2) the completeness of documentation. In order to solve the architectural problem, data warehouses were developed to provide an architecture suitable for analysis. However, completeness of documentation remains unsolved. In this paper, we describe a method for determining substitution rules in order to detect missing anesthesia events in an anesthesia record. Our method is based on the principle that a missing event can be detected using a substitution event, defined as the nearest documented event. As an example, we focused on the automatic detection of the start and the end of the anesthesia procedure when these events were not documented by the clinicians. We applied our method on a set of records in order to evaluate (1) the event detection accuracy and (2) the improvement in valid records. For the years 2010-2012, we obtained event detection with a precision of 0.00 (-2.22; 2.00) min for the start of anesthesia and 0.10 (0.00; 0.35) min for the end of anesthesia. On the other hand, we increased data completeness by 21.1% (from 80.3 to 97.2% of the total database) for the start and the end of anesthesia events. This method seems to be efficient for replacing missing "start and end of anesthesia" events. It could also be used to replace other missing time events in this particular data warehouse as well as in other kinds of data warehouses.
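
    As a toy illustration of the nearest-documented-event substitution idea (the event names, timestamps and preference order below are invented, not the paper's actual rules), a Python sketch could be:

      # A minimal sketch: when "start of anesthesia" is undocumented, substitute
      # the nearest documented surrogate event according to a preference list.
      from datetime import datetime

      record = {
          "induction drug given": datetime(2012, 5, 3, 8, 2),
          "intubation":           datetime(2012, 5, 3, 8, 6),
          "extubation":           datetime(2012, 5, 3, 10, 41),
      }
      # Hypothetical substitution rule: candidate surrogates ordered by preference.
      surrogates_for_start = ["start of anesthesia", "induction drug given", "intubation"]

      start = next((record[e] for e in surrogates_for_start if e in record), None)
      print("substituted start of anesthesia:", start)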

  20. The eBioKit, a stand-alone educational platform for bioinformatics.

    PubMed

    Hernández-de-Diego, Rafael; de Villiers, Etienne P; Klingström, Tomas; Gourlé, Hadrien; Conesa, Ana; Bongcam-Rudloff, Erik

    2017-09-01

    Bioinformatics skills have become essential for many research areas; however, the availability of qualified researchers is usually lower than the demand and training to increase the number of able bioinformaticians is an important task for the bioinformatics community. When conducting training or hands-on tutorials, the lack of control over the analysis tools and repositories often results in undesirable situations during training, as unavailable online tools or version conflicts may delay, complicate, or even prevent the successful completion of a training event. The eBioKit is a stand-alone educational platform that hosts numerous tools and databases for bioinformatics research and allows training to take place in a controlled environment. A key advantage of the eBioKit over other existing teaching solutions is that all the required software and databases are locally installed on the system, significantly reducing the dependence on the internet. Furthermore, the architecture of the eBioKit has demonstrated itself to be an excellent balance between portability and performance, not only making the eBioKit an exceptional educational tool but also providing small research groups with a platform to incorporate bioinformatics analysis in their research. As a result, the eBioKit has formed an integral part of training and research performed by a wide variety of universities and organizations such as the Pan African Bioinformatics Network (H3ABioNet) as part of the initiative Human Heredity and Health in Africa (H3Africa), the Southern Africa Network for Biosciences (SAnBio) initiative, the Biosciences eastern and central Africa (BecA) hub, and the International Glossina Genome Initiative.

  1. The eBioKit, a stand-alone educational platform for bioinformatics

    PubMed Central

    Conesa, Ana; Bongcam-Rudloff, Erik

    2017-01-01

    Bioinformatics skills have become essential for many research areas; however, the availability of qualified researchers is usually lower than the demand and training to increase the number of able bioinformaticians is an important task for the bioinformatics community. When conducting training or hands-on tutorials, the lack of control over the analysis tools and repositories often results in undesirable situations during training, as unavailable online tools or version conflicts may delay, complicate, or even prevent the successful completion of a training event. The eBioKit is a stand-alone educational platform that hosts numerous tools and databases for bioinformatics research and allows training to take place in a controlled environment. A key advantage of the eBioKit over other existing teaching solutions is that all the required software and databases are locally installed on the system, significantly reducing the dependence on the internet. Furthermore, the architecture of the eBioKit has demonstrated itself to be an excellent balance between portability and performance, not only making the eBioKit an exceptional educational tool but also providing small research groups with a platform to incorporate bioinformatics analysis in their research. As a result, the eBioKit has formed an integral part of training and research performed by a wide variety of universities and organizations such as the Pan African Bioinformatics Network (H3ABioNet) as part of the initiative Human Heredity and Health in Africa (H3Africa), the Southern Africa Network for Biosciences (SAnBio) initiative, the Biosciences eastern and central Africa (BecA) hub, and the International Glossina Genome Initiative. PMID:28910280

  2. A Web-based assessment of bioinformatics end-user support services at US universities

    PubMed Central

    Messersmith, Donna J.; Benson, Dennis A.; Geer, Renata C.

    2006-01-01

    Objectives: This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Methods: Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Results: Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. Conclusions: This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group. PMID:16888663

  3. Moving from Descriptive to Causal Analytics: Case Study of the Health Indicators Warehouse

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schryver, Jack C.; Shankar, Mallikarjun; Xu, Songhua

    The KDD community has described a multitude of methods for knowledge discovery on large datasets. We consider some of these methods and integrate them into an analyst's workflow that proceeds from the data-centric descriptive level to the model-centric causal level. Examples of the workflow are shown for the Health Indicators Warehouse, which is a public database for community health information that is a potent resource for conducting data science on a medium scale. We demonstrate the potential of HIW as a source of serious visual analytics efforts by showing correlation matrix visualizations, multivariate outlier analysis, multiple linear regression of Medicare costs, and scatterplot matrices for a broad set of health indicators. We conclude by sketching the first steps toward a causal dependence hypothesis.
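
    The descriptive-to-model progression summarized above can be sketched in a few lines of Python. The indicators and data below are synthetic stand-ins, not Health Indicators Warehouse values, and the steps (correlation matrix, a crude Mahalanobis outlier score, multiple linear regression) are only a schematic of such a workflow.

      # A minimal sketch of a descriptive-to-model analysis on synthetic indicators.
      import numpy as np
      import pandas as pd
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(1)
      df = pd.DataFrame({
          "obesity_rate":   rng.normal(30, 5, 200),
          "smoking_rate":   rng.normal(20, 4, 200),
          "uninsured_rate": rng.normal(12, 3, 200),
      })
      df["medicare_cost"] = 100 + 3 * df["obesity_rate"] + 2 * df["smoking_rate"] + rng.normal(0, 10, 200)

      print(df.corr().round(2))                     # descriptive: correlation matrix

      # Crude multivariate outlier score: Mahalanobis distance from the mean.
      X = df.to_numpy()
      cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
      d = np.sqrt(np.einsum("ij,jk,ik->i", X - X.mean(0), cov_inv, X - X.mean(0)))
      print("most extreme county index:", int(d.argmax()))

      # Model-centric step: regress cost on the candidate indicators.
      model = LinearRegression().fit(df[["obesity_rate", "smoking_rate", "uninsured_rate"]], df["medicare_cost"])
      print("regression coefficients:", model.coef_.round(2))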

  4. Hierarchical content-based image retrieval by dynamic indexing and guided search

    NASA Astrophysics Data System (ADS)

    You, Jane; Cheung, King H.; Liu, James; Guo, Linong

    2003-12-01

    This paper presents a new approach to content-based image retrieval by using dynamic indexing and guided search in a hierarchical structure, and extending data mining and data warehousing techniques. The proposed algorithms include: a wavelet-based scheme for multiple image feature extraction, the extension of a conventional data warehouse and an image database to an image data warehouse for dynamic image indexing, an image data schema for hierarchical image representation and dynamic image indexing, a statistically based feature selection scheme to achieve flexible similarity measures, and a feature component code to facilitate query processing and guide the search for the best matching. A series of case studies are reported, which include a wavelet-based image color hierarchy, classification of satellite images, tropical cyclone pattern recognition, and personal identification using multi-level palmprint and face features.

  5. Navigating legal constraints in clinical data warehousing: a case study in personalized medicine

    PubMed Central

    Jefferys, Benjamin R.; Nwankwo, Iheanyi; Neri, Elias; Chang, David C. W.; Shamardin, Lev; Hänold, Stefanie; Graf, Norbert; Forgó, Nikolaus; Coveney, Peter

    2013-01-01

    Personalized medicine relies in part upon comprehensive data on patient treatment and outcomes, both for analysis leading to improved models that provide the basis for enhanced treatment, and for direct use in clinical decision-making. A data warehouse is an information technology for combining and standardizing multiple databases. Data warehousing of clinical data is constrained by many legal and ethical considerations, owing to the sensitive nature of the data being stored. We describe an unconstrained clinical data warehousing architecture, some of the legal constraints that have led us to reconsider this architecture, and the legal and technical solutions to these constraints developed for the clinical data warehouse in the personalized medicine project p-medicine. We also propose some changes to the legal constraints that will further enable clinical research. PMID:24427531

  6. A database for coconut crop improvement.

    PubMed

    Rajagopal, Velamoor; Manimekalai, Ramaswamy; Devakumar, Krishnamurthy; Rajesh; Karun, Anitha; Niral, Vittal; Gopal, Murali; Aziz, Shamina; Gunasekaran, Marimuthu; Kumar, Mundappurathe Ramesh; Chandrasekar, Arumugam

    2005-12-08

    Coconut crop improvement requires a number of biotechnology and bioinformatics tools. A database containing information on CG (coconut germplasm), CCI (coconut cultivar identification), CD (coconut disease), MIFSPC (microbial information systems in plantation crops) and VO (vegetable oils) is described. The database was developed using MySQL and PostgreSQL running in Linux operating system. The database interface is developed in PHP, HTML and JAVA. http://www.bioinfcpcri.org.

  7. Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces.

    PubMed

    Ezra Tsur, Elishai

    2017-01-01

    Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistency and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Centre for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistency agent and Apache Derby as the database manager. Syntactic analysis was based on J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at http://nbel-lab.com.

  8. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology.

    PubMed

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-12-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.
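
    The paper's specific growth equations are not reproduced in this record; as a hedged illustration of fitting a nonlinear growth model to yearly publication counts, the sketch below fits a logistic curve to invented placeholder counts and extrapolates to 2025.

      # A minimal sketch, not the authors' model: logistic fit to synthetic
      # publication counts with SciPy.
      import numpy as np
      from scipy.optimize import curve_fit

      def logistic(t, K, r, t0):
          return K / (1.0 + np.exp(-r * (t - t0)))

      years = np.arange(1990, 2016)
      counts = logistic(years, 9000, 0.25, 2008) + np.random.default_rng(2).normal(0, 100, years.size)

      (K, r, t0), _ = curve_fit(logistic, years, counts, p0=(10000, 0.2, 2005))
      print(f"projected publications in 2025: {logistic(2025, K, r, t0):.0f}")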

  9. MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses.

    PubMed

    Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio

    2017-06-01

    The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.
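
    The "functional" assembly step (screening reads against HMM profiles before assembly) can be approximated with the HMMER command line. The sketch below is not the MEGGASENSE code; the file names are placeholders and the E-value cut-off is arbitrary.

      # A minimal sketch: screen translated reads against an HMM profile library
      # with hmmsearch, then collect the IDs of reads with significant hits.
      import subprocess

      profiles = "carbohydrate_enzymes.hmm"   # assumed pre-built HMM profile library
      reads = "translated_reads.fasta"        # assumed six-frame-translated reads

      subprocess.run(
          ["hmmsearch", "--tblout", "hits.tbl", "-E", "1e-5", profiles, reads],
          check=True,
      )

      # Reads with at least one hit could then be passed on to the assembler.
      with open("hits.tbl") as fh:
          hit_ids = {line.split()[0] for line in fh if not line.startswith("#")}
      print(f"{len(hit_ids)} reads passed the functional screen")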

  10. Ontology based heterogeneous materials database integration and semantic query

    NASA Astrophysics Data System (ADS)

    Zhao, Shuai; Qian, Quan

    2017-10-01

    Materials digital data, high-throughput experiments and high-throughput computations are regarded as the three key pillars of materials genome initiatives. With the fast growth of materials data, the integration and sharing of data have become urgent and have gradually become a hot topic of materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply at the semantic level when adopting conventional heterogeneous database integration approaches such as federated databases or data warehouses. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be posed using SPARQL. In the experiments, two well-known first-principles computational databases, OQMD and Materials Project, are used as the integration targets, which shows the availability and effectiveness of our method.
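
    To give a flavour of posing a SPARQL query over an integrated ontology in Python (using rdflib; the ontology file, namespace, class and property names below are invented for illustration and are not from the paper):

      # A minimal sketch: load a merged ontology and run a semantic query.
      from rdflib import Graph

      g = Graph()
      g.parse("integrated_materials.ttl", format="turtle")  # assumed merged ontology

      results = g.query("""
          PREFIX mat: <http://example.org/materials#>
          SELECT ?compound ?bandgap
          WHERE {
              ?compound a mat:Compound ;
                        mat:bandGap ?bandgap .
              FILTER(?bandgap > 2.0)
          }
      """)
      for compound, bandgap in results:
          print(compound, bandgap)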

  11. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters.

    PubMed

    Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.
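
    For readers new to Docker, building an Image and running a Container can also be driven from Python with the Docker SDK. The sketch below is only an illustration of that general pattern (the image tag, Dockerfile and command are placeholders), not the BGDMdocker workflow itself.

      # A minimal sketch, assuming the Docker SDK for Python ("docker" package)
      # and a hypothetical Dockerfile in the current directory.
      import docker

      client = docker.from_env()

      # Build an Image from the local Dockerfile and tag it.
      image, build_logs = client.images.build(path=".", tag="pangenome-demo:latest")

      # Run a Container from the Image; the command is a placeholder analysis step.
      output = client.containers.run("pangenome-demo:latest", command="echo pan-genome analysis", remove=True)
      print(output.decode())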

  12. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

    PubMed Central

    Cheng, Gong; Zhang, Guocai; Xu, Liang

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317

  13. GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics

    PubMed Central

    Samson, Delphine; Legeai, Fabrice; Karsenty, Emmanuelle; Reboux, Sébastien; Veyrieras, Jean-Baptiste; Just, Jeremy; Barillot, Emmanuel

    2003-01-01

    Génoplante is a partnership program between public French institutes (INRA, CIRAD, IRD and CNRS) and private companies (Biogemma, Bayer CropScience and Bioplante) that aims at developing genome analysis programs for crop species (corn, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis and rice). The outputs of these programs form a wealth of information (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, and mutation data) and tools (databases, interfaces, analysis software), that are being integrated and made public at the public bioinformatics resource centre of Génoplante: GénoPlante-Info (GPI). This continuous flood of data and tools is regularly updated and will grow continuously during the coming two years. Access to the GPI databases and tools is available at http://genoplante-info.infobiogen.fr/. PMID:12519976

  14. The Visit-Data Warehouse: Enabling Novel Secondary Use of Health Information Exchange Data

    PubMed Central

    Fleischman, William; Lowry, Tina; Shapiro, Jason

    2014-01-01

    Introduction/Objectives: Health Information Exchange (HIE) efforts face challenges with data quality and performance, and this becomes especially problematic when data is leveraged for uses beyond primary clinical use. We describe a secondary data infrastructure focusing on patient-encounter, nonclinical data that was built on top of a functioning HIE platform to support novel secondary data uses and prevent potentially negative impacts these uses might have otherwise had on HIE system performance. Background: HIE efforts have generally formed for the primary clinical use of individual clinical providers searching for data on individual patients under their care, but many secondary uses have been proposed and are being piloted to support care management, quality improvement, and public health. Description of the HIE and Base Infrastructure: This infrastructure review describes a module built into the Healthix HIE. Healthix, based in the New York metropolitan region, comprises 107 participating organizations with 29,946 acute-care beds in 383 facilities, and includes more than 9.2 million unique patients. The primary infrastructure is based on the InterSystems proprietary Caché data model distributed across servers in multiple locations, and uses a master patient index to link individual patients’ records across multiple sites. We built a parallel platform, the “visit data warehouse,” of patient encounter data (demographics, date, time, and type of visit) using a relational database model to allow accessibility using standard database tools and flexibility for developing secondary data use cases. These four secondary use cases include the following: (1) tracking encounter-based metrics in a newly established geriatric emergency department (ED), (2) creating a dashboard to provide a visual display as well as a tabular output of near-real-time de-identified encounter data from the data warehouse, (3) tracking frequent ED users as part of a regional-approach to case management intervention, and (4) improving an existing quality improvement program that analyzes patients with return visits to EDs within 72 hours of discharge. Results/Lessons Learned: Setting up a separate, near-real-time, encounters-based relational database to complement an HIE built on a hierarchical database is feasible, and may be necessary to support many secondary uses of HIE data. As of November 2014, the visit-data warehouse (VDW) built by Healthix is undergoing technical validation testing and updates on an hourly basis. We had to address data integrity issues with both nonstandard and missing HL7 messages because of varied HL7 implementation across the HIE. Also, given our HIEs federated structure, some sites expressed concerns regarding data centralization for the VDW. An established and stable HIE governance structure was critical in overcoming this initial reluctance. Conclusions: As secondary use of HIE data becomes more prevalent, it may be increasingly necessary to build separate infrastructure to support secondary use without compromising performance. More research is needed to determine optimal ways of building such infrastructure and validating its use for secondary purposes. PMID:25848595

  15. A weight based genetic algorithm for selecting views

    NASA Astrophysics Data System (ADS)

    Talebian, Seyed H.; Kareem, Sameem A.

    2013-03-01

    Data warehousing is a technology designed to support decision making. A data warehouse is built by extracting large amounts of data from different operational systems, transforming it into a consistent form and loading it into a central repository. The type of queries in a data warehouse environment differs from those in operational systems. In contrast to operational systems, the analytical queries issued in data warehouses involve summarization of large volumes of data and therefore normally take a long time to answer. On the other hand, the results of these queries must be returned quickly to enable managers to make decisions in as short a time as possible. An essential need in this environment is therefore to improve query performance. One of the most popular methods for this task is to utilize pre-computed query results: whenever a new query is submitted by the user, instead of computing the query on the fly over a large underlying database, the pre-computed results, or materialized views, are used to answer it. Although the ideal option would be to pre-compute and save all possible views, in practice disk space constraints and the overhead of view updates make this infeasible. Therefore, we need to select a subset of the possible views to store on disk. Selecting the right subset of views is considered an important challenge in data warehousing. In this paper we suggest a Weight-Based Genetic Algorithm (WBGA) for solving the view selection problem with two objectives.
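
    A minimal sketch of a weighted, genetic-algorithm-style view selection is given below; it is illustrative rather than the authors' WBGA, with made-up benefit and size estimates, a bit-vector chromosome, and a fitness that combines total query benefit with a penalty for exceeding the storage limit.

    ```python
    import random

    views = [  # (estimated query-cost benefit, storage size) -- made-up numbers
        (120, 40), (90, 25), (60, 10), (200, 80), (30, 5), (75, 30),
    ]
    SPACE_LIMIT = 100
    W_BENEFIT, W_SPACE = 1.0, 2.0   # weights combining the two objectives

    def fitness(chrom):
        benefit = sum(b for (b, s), bit in zip(views, chrom) if bit)
        size = sum(s for (b, s), bit in zip(views, chrom) if bit)
        over = max(0, size - SPACE_LIMIT)   # penalize space overruns
        return W_BENEFIT * benefit - W_SPACE * over

    def mutate(chrom, rate=0.1):
        return [1 - g if random.random() < rate else g for g in chrom]

    def crossover(a, b):
        cut = random.randint(1, len(a) - 1)
        return a[:cut] + b[cut:]

    population = [[random.randint(0, 1) for _ in views] for _ in range(20)]
    for _ in range(100):                        # generations
        population.sort(key=fitness, reverse=True)
        parents = population[:10]               # simple truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(10)]
        population = parents + children

    best = max(population, key=fitness)
    print("selected views:", [i for i, bit in enumerate(best) if bit],
          "fitness:", fitness(best))
    ```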

  16. Genomics for Everyone

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chain, Patrick

    Genomics — the genetic mapping and DNA sequencing of sets of genes or the complete genomes of organisms, along with related genome analysis and database work — is emerging as one of the transformative sciences of the 21st century. But current bioinformatics tools are not accessible to most biological researchers. Now, a new computational and web-based tool called EDGE Bioinformatics is working to fulfill the promise of democratizing genomics.

  17. Data warehouse model design technology analysis and research

    NASA Astrophysics Data System (ADS)

    Jiang, Wenhua; Li, Qingshui

    2012-01-01

    Existing data storage formats cannot meet the needs of information analysis, which has brought the data warehouse onto the stage: a data warehouse is a data collection created specifically to support business decision making. With a data warehouse, a company stores all of the information it collects in one repository, organized according to subject areas, making the information easy to access and valuable. This paper focuses on the establishment and analysis of the data warehouse, designs two data warehouse models, and compares them.

  18. A database for coconut crop improvement

    PubMed Central

    Rajagopal, Velamoor; Manimekalai, Ramaswamy; Devakumar, Krishnamurthy; Rajesh; Karun, Anitha; Niral, Vittal; Gopal, Murali; Aziz, Shamina; Gunasekaran, Marimuthu; Kumar, Mundappurathe Ramesh; Chandrasekar, Arumugam

    2005-01-01

    Coconut crop improvement requires a number of biotechnology and bioinformatics tools. A database containing information on CG (coconut germplasm), CCI (coconut cultivar identification), CD (coconut disease), MIFSPC (microbial information systems in plantation crops) and VO (vegetable oils) is described. The database was developed using MySQL and PostgreSQL running in Linux operating system. The database interface is developed in PHP, HTML and JAVA. Availability http://www.bioinfcpcri.org PMID:17597858

  19. Use of electronic medical record data for quality improvement in schizophrenia treatment.

    PubMed

    Owen, Richard R; Thrush, Carol R; Cannon, Dale; Sloan, Kevin L; Curran, Geoff; Hudson, Teresa; Austen, Mark; Ritchie, Mona

    2004-01-01

    An understanding of the strengths and limitations of automated data is valuable when using administrative or clinical databases to monitor and improve the quality of health care. This study discusses the feasibility and validity of using data electronically extracted from the Veterans Health Administration (VHA) computer database (VistA) to monitor guideline performance for inpatient and outpatient treatment of schizophrenia. The authors also discuss preliminary results and their experience in applying these methods to monitor antipsychotic prescribing using the South Central VA Healthcare Network (SCVAHCN) Data Warehouse as a tool for quality improvement.

  20. Protein Simulation Data in the Relational Model.

    PubMed

    Simms, Andrew M; Daggett, Valerie

    2012-10-01

    High performance computing is leading to unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these features do not come without a cost: significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large, multi-dimensional, and must integrate with both simulation software and external data sites. Here we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server.

  1. Protein Simulation Data in the Relational Model

    PubMed Central

    Simms, Andrew M.; Daggett, Valerie

    2011-01-01

    High performance computing is leading to unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these features do not come without a cost—significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large, multi-dimensional, and must integrate with both simulation software and external data sites. Here we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server. PMID:23204646
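
    The sketch below gives a highly simplified, hypothetical dimensional layout of the kind the abstract describes (dimension tables for simulations and residues, a fact table of per-frame measurements); it is not the authors' SQL Server schema, and the property columns are assumptions.

    ```python
    import sqlite3

    # Toy dimensional ("star"-style) layout for molecular dynamics data.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_simulation (
        sim_id        INTEGER PRIMARY KEY,
        protein       TEXT,
        temperature_k REAL,
        length_ns     REAL
    );
    CREATE TABLE dim_residue (
        residue_id INTEGER PRIMARY KEY,
        sim_id     INTEGER REFERENCES dim_simulation(sim_id),
        position   INTEGER,
        amino_acid TEXT
    );
    CREATE TABLE fact_frame_residue (
        sim_id     INTEGER REFERENCES dim_simulation(sim_id),
        frame      INTEGER,
        residue_id INTEGER REFERENCES dim_residue(residue_id),
        rmsd       REAL,   -- illustrative per-residue properties
        sasa       REAL
    );
    """)

    # Typical analytical query: average per-residue RMSD for one simulation.
    rows = conn.execute("""
    SELECT r.position, AVG(f.rmsd)
    FROM fact_frame_residue f JOIN dim_residue r USING (residue_id)
    WHERE f.sim_id = ?
    GROUP BY r.position
    """, (1,)).fetchall()
    print(rows)
    ```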

  2. E-MSD: an integrated data resource for bioinformatics.

    PubMed

    Velankar, S; McNeil, P; Mittard-Runte, V; Suarez, A; Barrell, D; Apweiler, R; Henrick, K

    2005-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.

  3. ExPASy: SIB bioinformatics resource portal.

    PubMed

    Artimo, Panu; Jonnalagedda, Manohar; Arnold, Konstantin; Baratin, Delphine; Csardi, Gabor; de Castro, Edouard; Duvaud, Séverine; Flegel, Volker; Fortier, Arnaud; Gasteiger, Elisabeth; Grosdidier, Aurélien; Hernandez, Céline; Ioannidis, Vassilios; Kuznetsov, Dmitry; Liechti, Robin; Moretti, Sébastien; Mostaguir, Khaled; Redaschi, Nicole; Rossier, Grégoire; Xenarios, Ioannis; Stockinger, Heinz

    2012-07-01

    ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can now seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.

  4. AllerML: markup language for allergens.

    PubMed

    Ivanciuc, Ovidiu; Gendel, Steven M; Power, Trevor D; Schein, Catherine H; Braun, Werner

    2011-06-01

    Many concerns have been raised about the potential allergenicity of novel, recombinant proteins introduced into food crops. Guidelines, proposed by WHO/FAO and EFSA, include the use of bioinformatics screening to assess the risk of potential allergenicity or cross-reactivities of all proteins introduced, for example, to improve nutritional value or promote crop resistance. However, there are no universally accepted standards that can be used to encode data on the biology of allergens to facilitate using data from multiple databases in this screening. Therefore, we developed AllerML, a markup language for allergens, to assist in the automated exchange of information between databases and in the integration of the bioinformatics tools that are used to investigate allergenicity and cross-reactivity. As proof of concept, AllerML was implemented using the Structural Database of Allergenic Proteins (SDAP; http://fermi.utmb.edu/SDAP/) database. General implementation of AllerML will promote automatic flow of validated data that will aid in allergy research and regulatory analysis. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. AllerML: Markup Language for Allergens

    PubMed Central

    Ivanciuc, Ovidiu; Gendel, Steven M.; Power, Trevor D.; Schein, Catherine H.; Braun, Werner

    2011-01-01

    Many concerns have been raised about the potential allergenicity of novel, recombinant proteins introduced into food crops. Guidelines, proposed by WHO/FAO and EFSA, include the use of bioinformatics screening to assess the risk of potential allergenicity or cross-reactivities of all proteins introduced, for example, to improve nutritional value or promote crop resistance. However, there are no universally accepted standards that can be used to encode data on the biology of allergens to facilitate using data from multiple databases in this screening. Therefore, we developed AllerML, a markup language for allergens, to assist in the automated exchange of information between databases and in the integration of the bioinformatics tools that are used to investigate allergenicity and cross-reactivity. As proof of concept, AllerML was implemented using the Structural Database of Allergenic Proteins (SDAP; http://fermi.utmb.edu/SDAP/) database. General implementation of AllerML will promote automatic flow of validated data that will aid in allergy research and regulatory analysis. PMID:21420460

  6. Designing a framework of intelligent information processing for dentistry administration data.

    PubMed

    Amiri, N; Matthews, D C; Gao, Q

    2005-07-01

    This study was designed to test a cumulative view of current data in the clinical database at the Faculty of Dentistry, Dalhousie University. We planned to examine associations among demographic factors and treatments. Three tables were selected from the database of the faculty: patient, treatment and procedures. All fields and record numbers in each table were documented. Data was explored using SQL Server and Visual Basic and then cleaned by removing incongruent fields. After transformation, a data warehouse was created. This was imported to SQL Server Analysis Services to create an OLAP (Online Analytical Processing) cube. The multidimensional model used for access to data was created using a star schema. Treatment count was the measurement variable. Five dimensions--date, postal code, gender, age group and treatment categories--were used to detect associations. Another data warehouse of 8 tables (international tooth code # 1-8) was created and imported to SAS Enterprise Miner to complete data mining. Association nodes were used for each table to find sequential associations and minimum criteria were set to 2% of cases. Findings of this study confirmed most assumptions of treatment planning procedures. There were some small unexpected patterns of clinical interest. Further developments are recommended to create predictive models. Recent improvements in information technology offer numerous advantages for conversion of raw data from faculty databases to information and subsequently to knowledge. This knowledge can be used by decision makers, managers, and researchers to answer clinical questions, affect policy change and determine future research needs.
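
    The sketch below shows the flavor of star-schema roll-up such an OLAP cube performs, with treatment count as the measure grouped by several of the dimensions named above; the table and column names are illustrative assumptions rather than the faculty's actual schema.

    ```python
    import sqlite3

    # A denormalized fact table standing in for the star schema; each row is
    # one treatment, and the dimensions are columns (illustrative only).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE fact_treatment (
        treatment_id INTEGER PRIMARY KEY,
        date_key     TEXT,    -- e.g. '2004-03'
        postal_code  TEXT,
        gender       TEXT,
        age_group    TEXT,
        category     TEXT     -- treatment category
    );
    """)

    # Roll-up: treatment count by date, gender, age group and category.
    cube_slice = conn.execute("""
    SELECT date_key, gender, age_group, category, COUNT(*) AS treatment_count
    FROM fact_treatment
    GROUP BY date_key, gender, age_group, category
    ORDER BY treatment_count DESC
    """).fetchall()
    print(cube_slice)
    ```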

  7. High Performance Analytics with the R3-Cache

    NASA Astrophysics Data System (ADS)

    Eavis, Todd; Sayeed, Ruhan

    Contemporary data warehouses now represent some of the world’s largest databases. As these systems grow in size and complexity, however, it becomes increasingly difficult for brute force query processing approaches to meet the performance demands of end users. Certainly, improved indexing and more selective view materialization are helpful in this regard. Nevertheless, with warehouses moving into the multi-terabyte range, it is clear that the minimization of external memory accesses must be a primary performance objective. In this paper, we describe the R3-cache, a natively multi-dimensional caching framework designed specifically to support sophisticated warehouse/OLAP environments. The R3-cache is based upon an in-memory version of the R-tree that has been extended to support buffer pages rather than disk blocks. A key strength of the R3-cache is that it is able to utilize multi-dimensional fragments of previous query results so as to significantly minimize the frequency and scale of disk accesses. Moreover, the new caching model directly accommodates the standard relational storage model and provides mechanisms for pro-active updates that exploit the existence of query “hot spots”. The current prototype has been evaluated as a component of the Sidera DBMS, a “shared nothing” parallel OLAP server designed for multi-terabyte analytics. Experimental results demonstrate significant performance improvements relative to simpler alternatives.
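
    The toy sketch below illustrates only the underlying idea of serving a range query from a cached multi-dimensional fragment of an earlier result; the real R3-cache indexes fragments with an in-memory R-tree and works over relational buffer pages, which are omitted here.

    ```python
    # Toy cache of multi-dimensional result fragments. A fragment is a
    # hyper-rectangle ("box") plus the rows that fall inside it; a new query
    # is a hit if some cached box fully contains the query box.
    class FragmentCache:
        def __init__(self):
            self.fragments = []   # list of (box, rows); box = ((lo, hi), ...) per dimension

        @staticmethod
        def contains(outer, inner):
            return all(olo <= ilo and ihi <= ohi
                       for (olo, ohi), (ilo, ihi) in zip(outer, inner))

        def lookup(self, box):
            """Return cached rows clipped to `box`, or None on a miss."""
            for frag_box, rows in self.fragments:
                if self.contains(frag_box, box):
                    return [r for r in rows
                            if all(lo <= v <= hi for v, (lo, hi) in zip(r, box))]
            return None

        def insert(self, box, rows):
            self.fragments.append((box, rows))

    cache = FragmentCache()
    cache.insert(((2000, 2010), (0, 50)), [(2005, 10), (2008, 42)])  # (year, store) rows
    print(cache.lookup(((2004, 2009), (0, 20))))   # hit: served from the cached fragment
    print(cache.lookup(((1990, 1995), (0, 20))))   # miss: would go to the warehouse
    ```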

  8. Scale out databases for CERN use cases

    NASA Astrophysics Data System (ADS)

    Baranowski, Zbigniew; Grzybek, Maciej; Canali, Luca; Lanza Garcia, Daniel; Surdy, Kacper

    2015-12-01

    Data generation rates are expected to grow very fast for some database workloads going into LHC run 2 and beyond. In particular this is expected for data coming from controls, logging and monitoring systems. Storing, administering and accessing big data sets in a relational database system can quickly become a very hard technical challenge, as the size of the active data set and the number of concurrent users increase. Scale-out database technologies are a rapidly developing set of solutions for deploying and managing very large data warehouses on commodity hardware and with open source software. In this paper we will describe the architecture and tests on database systems based on Hadoop and the Cloudera Impala engine. We will discuss the results of our tests, including tests of data loading and integration with existing data sources and in particular with relational databases. We will report on query performance tests done with various data sets of interest at CERN, notably data from the accelerator log database.
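
    A hedged sketch of a simple query-timing harness of the sort used in such tests is shown below; it is written against the generic Python DB-API, and get_connection() is a placeholder for whichever Impala/Hadoop driver is actually used, as is the example table name.

    ```python
    import time

    def time_query(get_connection, sql, runs=3):
        """Run `sql` a few times over fresh connections and report wall-clock timings."""
        timings = []
        for _ in range(runs):
            conn = get_connection()            # placeholder: any DB-API connection factory
            cur = conn.cursor()
            start = time.perf_counter()
            cur.execute(sql)
            cur.fetchall()                     # force the full result to be produced
            timings.append(time.perf_counter() - start)
            conn.close()
        return min(timings), sum(timings) / len(timings)

    # Example (illustrative table and column names only):
    # best, mean = time_query(get_connection,
    #     "SELECT variable_id, AVG(value) FROM accelerator_log GROUP BY variable_id")
    ```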

  9. Genomics for Everyone

    ScienceCinema

    Chain, Patrick

    2018-05-31

    Genomics — the genetic mapping and DNA sequencing of sets of genes or the complete genomes of organisms, along with related genome analysis and database work — is emerging as one of the transformative sciences of the 21st century. But current bioinformatics tools are not accessible to most biological researchers. Now, a new computational and web-based tool called EDGE Bioinformatics is working to fulfill the promise of democratizing genomics.

  10. Evolving from bioinformatics in-the-small to bioinformatics in-the-large.

    PubMed

    Parker, D Stott; Gorlick, Michael M; Lee, Christopher J

    2003-01-01

    We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small: constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples, and extract lessons from them showing how it has helped. These examples both suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.

  11. Application of bioinformatics tools and databases in microbial dehalogenation research (a review).

    PubMed

    Satpathy, R; Konkimalla, V B; Ratha, J

    2015-01-01

    Microbial dehalogenation is a biochemical process in which halogenated substances are converted enzymatically into their non-halogenated form. Microorganisms have a wide range of organohalogen degradation abilities, both specific and non-specific in nature. Most of these halogenated organic compounds, being pollutants, need to be remediated; therefore, current approaches aim to explore the potential of microbes at a molecular level for effective biodegradation of these substances. Several microorganisms with dehalogenation activity have been identified and characterized. In this respect, bioinformatics plays a key role in gaining deeper knowledge in the field of dehalogenation. To facilitate data mining, many tools have been developed to annotate these data from databases. Therefore, with the discovery of a microorganism one can predict a gene/protein, perform sequence analysis, structural modelling, metabolic pathway analysis, biodegradation studies and so on. This review highlights various bioinformatics approaches and describes the application of various databases and specific tools in the microbial dehalogenation field, with special focus on dehalogenase enzymes. Attempts have also been made to decipher some recent applications of in silico modelling methods that comprise gene finding, protein modelling, Quantitative Structure Biodegradability Relationship (QSBR) studies and reconstruction of metabolic pathways employed in the dehalogenation research area.

  12. Information integration for a sky survey by data warehousing

    NASA Astrophysics Data System (ADS)

    Luo, A.; Zhang, Y.; Zhao, Y.

    The virtualization service of the data system for the sky survey LAMOST is very important for astronomers. The service needs to integrate information from data collections, catalogs and references, and to support simple federation of a set of distributed files and associated metadata. Data warehousing has been in existence for several years and has demonstrated superiority over traditional relational database management systems by providing novel indexing schemes that support efficient on-line analytical processing (OLAP) of large databases. Now relational database systems such as Oracle support the warehouse capability, including extensions to the SQL language to support OLAP operations, and a number of metadata management tools have been created. The information integration of LAMOST by applying data warehousing aims to effectively provide data and knowledge on-line.

  13. Bioinformatics for Exploration

    NASA Technical Reports Server (NTRS)

    Johnson, Kathy A.

    2006-01-01

    For the purpose of this paper, bioinformatics is defined as the application of computer technology to the management of biological information. It can be thought of as the science of developing computer databases and algorithms to facilitate and expedite biological research. This is a crosscutting capability that supports nearly all human health areas ranging from computational modeling, to pharmacodynamics research projects, to decision support systems within autonomous medical care. Bioinformatics serves to increase the efficiency and effectiveness of the life sciences research program. It provides data, information, and knowledge capture which further supports management of the bioastronautics research roadmap - identifying gaps that still remain and enabling the determination of which risks have been addressed.

  14. ECG-ViEW II, a freely accessible electrocardiogram database

    PubMed Central

    Park, Man Young; Lee, Sukhoon; Jeon, Min Seok; Yoon, Dukyong; Park, Rae Woong

    2017-01-01

    The Electrocardiogram Vigilance with Electronic data Warehouse II (ECG-ViEW II) is a large, single-center database comprising numeric parameter data of the surface electrocardiograms of all patients who underwent testing from 1 June 1994 to 31 July 2013. The electrocardiographic data include the test date, clinical department, RR interval, PR interval, QRS duration, QT interval, QTc interval, P axis, QRS axis, and T axis. These data are connected with patient age, sex, ethnicity, comorbidities, age-adjusted Charlson comorbidity index, prescribed drugs, and electrolyte levels. This longitudinal observational database contains 979,273 electrocardiograms from 461,178 patients over a 19-year study period. This database can provide an opportunity to study electrocardiographic changes caused by medications, disease, or other demographic variables. ECG-ViEW II is freely available at http://www.ecgview.org. PMID:28437484

  15. Flexible data integration and curation using a graph-based approach.

    PubMed

    Croset, Samuel; Rupp, Joachim; Romacker, Martin

    2016-03-15

    The increasing diversity of data available to the biomedical scientist holds promise for better understanding of diseases and discovery of new treatments for patients. In order to provide a complete picture of a biomedical question, data from many different origins needs to be combined into a unified representation. During this data integration process, inevitable errors and ambiguities present in the initial sources compromise the quality of the resulting data warehouse, and greatly diminish the scientific value of the content. Expensive and time-consuming manual curation is then required to improve the quality of the information. However, it becomes increasingly difficult to dedicate and optimize the resources for data integration projects as available repositories are growing both in size and in number every day. We present a new generic methodology to identify problematic records, causing what we describe as 'data hairball' structures. The approach is graph-based and relies on two metrics traditionally used in social sciences: the graph density and the betweenness centrality. We evaluate and discuss these measures and show their relevance for flexible, optimized and automated data curation and linkage. The methodology focuses on information coherence and correctness to improve the scientific meaningfulness of data integration endeavors, such as knowledge bases and large data warehouses. samuel.croset@roche.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
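
    The sketch below shows how the two named metrics can flag candidate 'data hairball' records using networkx; the graph and the threshold are illustrative, not the authors' pipeline, and nodes stand in for integrated records with edges for their cross-references.

    ```python
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("geneA", "proteinA"), ("geneB", "proteinB"),
        # an over-connected, suspicious record:
        ("hub", "geneA"), ("hub", "geneB"), ("hub", "proteinA"), ("hub", "proteinB"),
    ])

    print("density:", nx.density(G))                 # overall graph density
    betweenness = nx.betweenness_centrality(G)
    suspects = [n for n, b in betweenness.items() if b > 0.5]  # illustrative threshold
    print("records to review:", suspects)
    ```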

  16. Requests for post-registration studies (PRS), patients follow-up in actual practice: Changes in the role of databases.

    PubMed

    Berdaï, Driss; Thomas-Delecourt, Florence; Szwarcensztein, Karine; d'Andon, Anne; Collignon, Cécile; Comet, Denis; Déal, Cécile; Dervaux, Benoît; Gaudin, Anne-Françoise; Lamarque-Garnier, Véronique; Lechat, Philippe; Marque, Sébastien; Maugendre, Philippe; Méchin, Hubert; Moore, Nicholas; Nachbaur, Gaëlle; Robain, Mathieu; Roussel, Christophe; Tanti, André; Thiessard, Frantz

    2018-02-01

    Early market access of health products is associated with a larger number of requests for information by the health authorities. Compared with these expectations, the growing expansion of health databases represents an opportunity for responding to questions raised by the authorities. The computerised nature of the health system provides numerous sources of data, and first and foremost medical/administrative databases such as the French National Inter-Scheme Health Insurance Information System (SNIIRAM) database. These databases, although developed for other purposes, have already been used for many years with regard to post-registration studies (PRS). The use thereof will continue to increase with the recent creation of the French National Health Data System (SNDS [2016 health system reform law]). At the same time, other databases are available in France, offering an illustration of "product use under actual practice conditions" by patients and health professionals (cohorts, specific registries, data warehouses, etc.). Based on a preliminary analysis of requests for PRS, approximately two-thirds appeared to have found at least a partial response in existing databases. Using these databases has a number of disadvantages, but also numerous advantages, which are listed. In order to facilitate access and optimise their use, it seemed important to draw up recommendations aiming to facilitate these developments and guarantee the conditions for their technical validity. The recommendations drawn up notably include the need for measures aiming to promote the visibility of research conducted on databases in the field of PRS. Moreover, it seemed worthwhile to promote the interoperability of health data warehouses, to make it possible to match information originating from field studies with information originating from databases, and to develop and share algorithms aiming to identify criteria of interest (proxies). Methodological documents, such as the French National Authority for Health (HAS) recommendations on "Les études post-inscription sur les technologies de santé (médicaments, dispositifs médicaux et actes). Principes et méthodes" [Post-registration studies on health technologies (medicinal products, medical devices and procedures). Principles and methods] should be updated to incorporate these developments. Copyright © 2018 Société française de pharmacologie et de thérapeutique. Published by Elsevier Masson SAS. All rights reserved.

  17. Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance

    PubMed Central

    Squires, R. Burke; Noronha, Jyothi; Hunt, Victoria; García‐Sastre, Adolfo; Macken, Catherine; Baumgarth, Nicole; Suarez, David; Pickett, Brett E.; Zhang, Yun; Larsen, Christopher N.; Ramsey, Alvin; Zhou, Liwei; Zaremba, Sam; Kumar, Sanjeev; Deitrich, Jon; Klem, Edward; Scheuermann, Richard H.

    2012-01-01

    Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416. Background  The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design  The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly‐accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user‐friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in‐protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results  To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross‐protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. Conclusions  The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics. PMID:22260278

  18. GENEASE: Real time bioinformatics tool for multi-omics and disease ontology exploration, analysis and visualization.

    PubMed

    Ghandikota, Sudhir; Hershey, Gurjit K Khurana; Mersha, Tesfaye B

    2018-03-24

    Advances in high-throughput sequencing technologies have made it possible to generate multiple omics data at an unprecedented rate and scale. The accumulation of these omics data far outpaces the rate at which biologists can mine them and generate new hypotheses to test experimentally. There is an urgent need to develop a myriad of powerful tools to efficiently and effectively search and filter these resources to address specific post-GWAS functional genomics questions. However, to date, these resources are scattered across several databases and often lack a unified portal for data annotation and analytics. In addition, existing tools to analyze and visualize these databases are highly fragmented, forcing researchers to access multiple applications and perform manual interventions for each gene or variant in an ad hoc fashion until all the questions are answered. In this study, we present GENEASE, a web-based one-stop bioinformatics tool designed not only to query and explore multi-omics and phenotype databases (e.g., GTEx, ClinVar, dbGaP, GWAS Catalog, ENCODE, Roadmap Epigenomics, KEGG, Reactome, Gene and Phenotype Ontology) in a single web interface but also to perform seamless post genome-wide association downstream functional and overlap analysis for non-coding regulatory variants. GENEASE accesses over 50 different databases in the public domain, including model organism-specific databases, to facilitate gene/variant and disease exploration, enrichment and overlap analysis in real time. It is a user-friendly tool with a point-and-click interface containing links to support information, including a user manual and examples. GENEASE can be accessed freely at http://research.cchmc.org/mershalab/genease_new/login.html. Tesfaye.Mersha@cchmc.org, Sudhir.Ghandikota@cchmc.org. Supplementary data are available at Bioinformatics online.

  19. Benefits of a clinical data warehouse with data mining tools to collect data for a radiotherapy trial

    PubMed Central

    Roelofs, Erik; Persoon, Lucas; Nijsten, Sebastiaan; Wiessler, Wolfgang; Dekker, André; Lambin, Philippe

    2016-01-01

    Introduction Collecting trial data in a medical environment is at present mostly performed manually and therefore time-consuming, prone to errors and often incomplete with the complex data considered. Faster and more accurate methods are needed to improve the data quality and to shorten data collection times where information is often scattered over multiple data sources. The purpose of this study is to investigate the possible benefit of modern data warehouse technology in the radiation oncology field. Material and methods In this study, a Computer Aided Theragnostics (CAT) data warehouse combined with automated tools for feature extraction was benchmarked against the regular manual data-collection processes. Two sets of clinical parameters were compiled for non-small cell lung cancer (NSCLC) and rectal cancer, using 27 patients per disease. Data collection times and inconsistencies were compared between the manual and the automated extraction method. Results The average time per case to collect the NSCLC data manually was 10.4 ± 2.1 min and 4.3 ± 1.1 min when using the automated method (p < 0.001). For rectal cancer, these times were 13.5 ± 4.1 and 6.8 ± 2.4 min, respectively (p < 0.001). In 3.2% of the data collected for NSCLC and 5.3% for rectal cancer, there was a discrepancy between the manual and automated method. Conclusions Aggregating multiple data sources in a data warehouse combined with tools for extraction of relevant parameters is beneficial for data collection times and offers the ability to improve data quality. The initial investments in digitizing the data are expected to be compensated due to the flexibility of the data analysis. Furthermore, successive investigations can easily select trial candidates and extract new parameters from the existing databases. PMID:23394741

  20. Benefits of a clinical data warehouse with data mining tools to collect data for a radiotherapy trial.

    PubMed

    Roelofs, Erik; Persoon, Lucas; Nijsten, Sebastiaan; Wiessler, Wolfgang; Dekker, André; Lambin, Philippe

    2013-07-01

    Collecting trial data in a medical environment is at present mostly performed manually and therefore time-consuming, prone to errors and often incomplete with the complex data considered. Faster and more accurate methods are needed to improve the data quality and to shorten data collection times where information is often scattered over multiple data sources. The purpose of this study is to investigate the possible benefit of modern data warehouse technology in the radiation oncology field. In this study, a Computer Aided Theragnostics (CAT) data warehouse combined with automated tools for feature extraction was benchmarked against the regular manual data-collection processes. Two sets of clinical parameters were compiled for non-small cell lung cancer (NSCLC) and rectal cancer, using 27 patients per disease. Data collection times and inconsistencies were compared between the manual and the automated extraction method. The average time per case to collect the NSCLC data manually was 10.4 ± 2.1 min and 4.3 ± 1.1 min when using the automated method (p<0.001). For rectal cancer, these times were 13.5 ± 4.1 and 6.8 ± 2.4 min, respectively (p<0.001). In 3.2% of the data collected for NSCLC and 5.3% for rectal cancer, there was a discrepancy between the manual and automated method. Aggregating multiple data sources in a data warehouse combined with tools for extraction of relevant parameters is beneficial for data collection times and offers the ability to improve data quality. The initial investments in digitizing the data are expected to be compensated due to the flexibility of the data analysis. Furthermore, successive investigations can easily select trial candidates and extract new parameters from the existing databases. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  1. E-MSD: an integrated data resource for bioinformatics

    PubMed Central

    Velankar, S.; McNeil, P.; Mittard-Runte, V.; Suarez, A.; Barrell, D.; Apweiler, R.; Henrick, K.

    2005-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the ‘Structure Integration with Function, Taxonomy and Sequences (SIFTS)’ initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group. PMID:15608192

  2. Synthesizing and databasing fossil calibrations: divergence dating and beyond

    PubMed Central

    Ksepka, Daniel T.; Benton, Michael J.; Carrano, Matthew T.; Gandolfo, Maria A.; Head, Jason J.; Hermsen, Elizabeth J.; Joyce, Walter G.; Lamm, Kristin S.; Patané, José S. L.; Phillips, Matthew J.; Polly, P. David; Van Tuinen, Marcel; Ware, Jessica L.; Warnock, Rachel C. M.; Parham, James F.

    2011-01-01

    Divergence dating studies, which combine temporal data from the fossil record with branch length data from molecular phylogenetic trees, represent a rapidly expanding approach to understanding the history of life. National Evolutionary Synthesis Center hosted the first Fossil Calibrations Working Group (3–6 March, 2011, Durham, NC, USA), bringing together palaeontologists, molecular evolutionists and bioinformatics experts to present perspectives from disciplines that generate, model and use fossil calibration data. Presentations and discussions focused on channels for interdisciplinary collaboration, best practices for justifying, reporting and using fossil calibrations and roadblocks to synthesis of palaeontological and molecular data. Bioinformatics solutions were proposed, with the primary objective being a new database for vetted fossil calibrations with linkages to existing resources, targeted for a 2012 launch. PMID:21525049

  3. The construction and assessment of a statistical model for the prediction of protein assay data.

    PubMed

    Pittman, J; Sacks, J; Young, S Stanley

    2002-01-01

    The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.
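
    A rough stand-in for this kind of pipeline (a fast truncated SVD feeding a tree-based model, assessed by cross-validation) is sketched below on synthetic data; it is not the authors' exact method, and the component count, tree depth and data are assumptions.

    ```python
    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for assay descriptors and responses.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=200)

    # Truncated SVD for dimensionality reduction, then a tree-based model.
    model = make_pipeline(TruncatedSVD(n_components=10, random_state=0),
                          DecisionTreeRegressor(max_depth=4, random_state=0))
    scores = cross_val_score(model, X, y, cv=5)   # R^2 per fold
    print(scores.mean())
    ```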

  4. The European Bioinformatics Institute's data resources 2014.

    PubMed

    Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

    2014-01-01

    Molecular Biology has been at the heart of the 'big data' revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff's 'Atlas of Protein Sequence and Structure' through the Human Genome Project in the late 1990s and early 2000s to today's population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI's database collection to complement the reviews of individual databases provided elsewhere in this issue.

  5. Decision method for optimal selection of warehouse material handling strategies by production companies

    NASA Astrophysics Data System (ADS)

    Dobos, P.; Tamás, P.; Illés, B.

    2016-11-01

    Adequate establishment and operation of warehouse logistics significantly determines a company's competitiveness, because it greatly affects the quality and the selling price of the goods that production companies produce. In order to implement and manage an adequate warehouse system, an adequate warehouse position, stock management model, warehouse technology, a motivated workforce committed to process improvement, and a material handling strategy are necessary. In practice, companies have paid little attention to selecting the warehouse strategy properly, although it has a major influence on production in the case of a materials warehouse and on smooth customer service in the case of a finished-goods warehouse, since a poor choice can lead to large material handling losses. Due to dynamically changing production structures, frequent reorganization of warehouse activities is needed, to which the majority of companies essentially do not react. This work presents a simulation test system framework for selecting a suitable warehouse material handling strategy, together with the decision method for the selection.

  6. R2 Water Quality Portal Monitoring Stations

    EPA Pesticide Factsheets

    The Water Quality Data Portal (WQP) provides an easy way to access data stored in various large water quality databases. The WQP provides various input parameters on the form, including location, site, sampling, and date parameters, to filter and customize the returned results. The Water Quality Portal (WQP) is a cooperative service sponsored by the United States Geological Survey (USGS), the Environmental Protection Agency (EPA) and the National Water Quality Monitoring Council (NWQMC) that integrates publicly available water quality data from the USGS National Water Information System (NWIS), the EPA STOrage and RETrieval (STORET) Data Warehouse, and the USDA ARS Sustaining The Earth's Watersheds - Agricultural Research Database System (STEWARDS).

  7. BIRCH: a user-oriented, locally-customizable, bioinformatics system.

    PubMed

    Fristensky, Brian

    2007-02-09

    Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere.

  8. BIRCH: A user-oriented, locally-customizable, bioinformatics system

    PubMed Central

    Fristensky, Brian

    2007-01-01

    Background Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. Results BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. Conclusion BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere. PMID:17291351

  9. Warehouse Sanitation Workshop Handbook.

    ERIC Educational Resources Information Center

    Food and Drug Administration (DHHS/PHS), Washington, DC.

    This workshop handbook contains information and reference materials on proper food warehouse sanitation. The materials have been used at Food and Drug Administration (FDA) food warehouse sanitation workshops, and are selected by the FDA for use by food warehouse operators and for training warehouse sanitation employees. The handbook is divided…

  10. GALT protein database, a bioinformatics resource for the management and analysis of structural features of a galactosemia-related protein and its mutants.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2009-06-01

    We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and the analysis to each mutant was made with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.

  11. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 10 2014-01-01 2014-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  12. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 10 2013-01-01 2013-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  13. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 10 2011-01-01 2011-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  14. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 10 2012-01-01 2012-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  15. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  16. 19 CFR 19.1 - Classes of customs warehouses.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... customs warehouses. (a) Classifications. Customs warehouses shall be designated according to the following classifications: (1) Class 1. Premises that may be owned or leased by the Government, when the exigencies of the... class 11 warehouse, following an application under § 19.2. So far as such warehouses are used for the...

  17. 19 CFR 19.1 - Classes of customs warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... customs warehouses. (a) Classifications. Customs warehouses shall be designated according to the following classifications: (1) Class 1. Premises that may be owned or leased by the Government, when the exigencies of the... class 11 warehouse, following an application under § 19.2. So far as such warehouses are used for the...

  18. [Application of bioinformatics in researches of industrial biocatalysis].

    PubMed

    Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao

    2004-05-01

    Industrial biocatalysis is currently attracting much attention as a way to rebuild or substitute traditional production processes for chemicals and drugs. One of the key focuses in industrial biocatalysis is the biocatalyst, which is usually a microbial enzyme. In recent years, new bioinformatics technologies have played, and will continue to play, more and more significant roles in industrial biocatalysis research in response to the genomic revolution. One of the key applications of bioinformatics in biocatalysis is the discovery and identification of new biocatalysts through advanced DNA and protein sequence search, comparison and analysis in Internet databases using different algorithms and software. The unknown genes of microbial enzymes can also be simply harvested by primer design on the basis of bioinformatics analyses. The other key applications of bioinformatics in biocatalysis are the modification and improvement of existing industrial biocatalysts. In this respect, bioinformatics is of great importance in both rational design and directed evolution of microbial enzymes. Based on the successful prediction of the tertiary structures of enzymes using bioinformatics tools, the subsequent experiments, i.e. site-directed mutagenesis, fusion protein construction, DNA family shuffling and saturation mutagenesis, etc., are usually of very high efficiency. In summary, bioinformatics will be an essential tool for both biologists and biological engineers in future research on industrial biocatalysis, due to its significant role in guiding and accelerating the discovery and/or improvement of novel biocatalysts.

  19. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-05

    Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined, glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or less monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database. Copyright © 2016 Elsevier Ltd. All rights reserved.
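
    A toy sketch of the general idea, enumerating theoretical structures by repeatedly applying enzyme-like extension rules up to a residue limit, is given below; the rules and the linear representation are illustrative simplifications, since real N-glycans are branched trees and real glycosyltransferase specificities are far richer than a suffix match.

    ```python
    from collections import deque

    # Hypothetical extension rules: (required terminal residue, residue appended).
    RULES = [
        ("GlcNAc", "Man"),
        ("Man", "Man"),
        ("Man", "GlcNAc"),
        ("GlcNAc", "Gal"),
    ]
    SEED = ("GlcNAc", "GlcNAc")   # stand-in for the chitobiose core
    MAX_RESIDUES = 6              # size limit on enumerated structures

    structures = set()
    queue = deque([SEED])
    while queue:
        s = queue.popleft()
        if s in structures or len(s) > MAX_RESIDUES:
            continue
        structures.add(s)
        for terminal, residue in RULES:
            if s[-1] == terminal:
                queue.append(s + (residue,))

    print(len(structures), "theoretical structures up to", MAX_RESIDUES, "residues")
    ```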

  20. DEPOT: A Database of Environmental Parameters, Organizations and Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    CARSON,SUSAN D.; HUNTER,REGINA LEE; MALCZYNSKI,LEONARD A.

    2000-12-19

    The Database of Environmental Parameters, Organizations, and Tools (DEPOT) has been developed by the Department of Energy (DOE) as a central warehouse for access to data essential for environmental risk assessment analyses. Initial efforts have concentrated on groundwater and vadose zone transport data and bioaccumulation factors. DEPOT seeks to provide a source of referenced data that, wherever possible, includes the level of uncertainty associated with these parameters. Based on the amount of data available for a particular parameter, uncertainty is expressed as a standard deviation or a distribution function. DEPOT also provides DOE site-specific performance assessment data, pathway-specific transport data, and links to environmental regulations, disposal site waste acceptance criteria, other environmental parameter databases, and environmental risk assessment models.

  1. Customer and household matching: resolving entity identity in data warehouses

    NASA Astrophysics Data System (ADS)

    Berndt, Donald J.; Satterfield, Ronald K.

    2000-04-01

    The data preparation and cleansing tasks necessary to ensure high quality data are among the most difficult challenges faced in data warehousing and data mining projects. The extraction of source data, transformation into new forms, and loading into a data warehouse environment are all time consuming tasks that can be supported by methodologies and tools. This paper focuses on the problem of record linkage or entity matching, tasks that can be very important in providing high quality data. Merging two or more large databases into a single integrated system is a difficult problem in many industries, especially in the wake of acquisitions. For example, managing customer lists can be challenging when duplicate entries, data entry problems, and changing information conspire to make data quality an elusive target. Common tasks with regard to customer lists include customer matching to reduce duplicate entries and household matching to group customers. These often O(n²) problems can consume significant resources, both in computing infrastructure and human oversight, and the goal of high accuracy in the final integrated database can be difficult to assure. This paper distinguishes between attribute corruption and entity corruption, discussing the various impacts on quality. A metajoin operator is proposed and used to organize past and current entity matching techniques. Finally, a logistic regression approach to implementing the metajoin operator is discussed and illustrated with an example. The metajoin can be used to determine whether two records match, don't match, or require further evaluation by human experts. Properly implemented, the metajoin operator could allow the integration of individual databases with greater accuracy and lower cost.
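
    A minimal sketch in the spirit of the logistic-regression metajoin described above: candidate record pairs are scored from simple similarity features and routed to match, non-match, or human review via two probability thresholds. The feature definitions, training pairs, and thresholds are illustrative assumptions, not the paper's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(a: dict, b: dict) -> list:
    """Crude similarity features for a candidate customer-record pair."""
    return [
        float(a["last"].lower() == b["last"].lower()),
        float(a["zip"] == b["zip"]),
        1.0 - min(1.0, abs(len(a["first"]) - len(b["first"])) / 10.0),
    ]

# Tiny labelled training sample (1 = same customer, 0 = different).
pairs = [
    ({"first": "Jon",  "last": "Smith", "zip": "33101"},
     {"first": "Jonathan", "last": "Smith", "zip": "33101"}, 1),
    ({"first": "Ann",  "last": "Lee",   "zip": "60601"},
     {"first": "Anne", "last": "Lee",   "zip": "60601"}, 1),
    ({"first": "Bob",  "last": "Jones", "zip": "10001"},
     {"first": "Rita", "last": "Klein", "zip": "94105"}, 0),
    ({"first": "Carl", "last": "Diaz",  "zip": "77002"},
     {"first": "Carla","last": "Diem",  "zip": "02110"}, 0),
]
X = np.array([features(a, b) for a, b, _ in pairs])
y = np.array([label for _, _, label in pairs])
model = LogisticRegression().fit(X, y)

def metajoin_decision(a, b, lo=0.2, hi=0.8):
    p = model.predict_proba([features(a, b)])[0, 1]
    if p >= hi:
        return "match", p
    if p <= lo:
        return "non-match", p
    return "review", p            # defer to a human expert
```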

  2. Integration of upper air data in the MeteoSwiss Data Warehouse

    NASA Astrophysics Data System (ADS)

    Musa, M.; Haeberli, Ch.; Ruffieux, D.

    2010-09-01

    Over the last 10 years MeteoSwiss has established a Data Warehouse to provide a single, integrated data platform for all kinds of meteorological and climatological data. In the MeteoSwiss Data Warehouse, data and metadata are held in a metadata-driven relational database. As a first step towards this goal, we integrated the current and historical data from our surface stations, including routines for aggregation and calculation and the implementation of enhanced quality control tools. In 2008 we began integrating current and historical upper air data into the Data Warehouse: soundings (PTU, wind and ozone), profilers of all kinds such as wind profilers and radiometers, profiles calculated from numerical weather models, and AMDAR data. The dataset also includes high-resolution sounding data from the Payerne station and TEMP data from 20 European stations dating back to 1942. A critical point was working out a general architecture that could handle all the different data types. While the data themselves were being integrated, all metadata of the Payerne aerological station were transferred and imported into the central metadata repository of the Data Warehouse. The real-time and daily QC tools, as well as the routines for aggregation and calculation, were implemented analogously to those for the surface data. The quality control tools include plausibility tests such as limit tests, consistency tests at the same level, and vertical consistency tests. From the beginning the aim was to support the MeteoSwiss integration strategy, which addresses all aspects of integration: various observing technologies and platforms, observing systems outside MeteoSwiss, and the data and metadata themselves. This kind of integration comprises all aspects of "Enterprise Data Integration". Following the integration, the historical and current upper air data are now available to climatologists and meteorologists with standardized access for data retrieval and visualization. We are convinced that making these data accessible to scientists contributes to a better understanding of high-resolution climatology.
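
    A minimal sketch of two of the quality-control checks named above: a limit (plausibility) test on single values and a vertical-consistency test on a sounding profile. The thresholds and the example sounding are illustrative, not MeteoSwiss operational settings.

```python
def limit_test(value, lower, upper):
    """Flag a value outside its climatological plausibility range."""
    return lower <= value <= upper

def vertical_consistency(profile, max_lapse_per_km=12.0):
    """Flag unphysical temperature jumps between adjacent sounding levels.

    profile: list of (height_m, temperature_C) sorted by height.
    Returns indices of levels whose absolute lapse rate relative to the
    previous level exceeds max_lapse_per_km (degrees C per kilometre).
    """
    suspects = []
    for i in range(1, len(profile)):
        dz_km = (profile[i][0] - profile[i - 1][0]) / 1000.0
        if dz_km <= 0:
            continue
        lapse = abs(profile[i][1] - profile[i - 1][1]) / dz_km
        if lapse > max_lapse_per_km:
            suspects.append(i)
    return suspects

sounding = [(490, 12.3), (1000, 8.1), (1500, 25.0), (2000, 1.4)]
print(limit_test(12.3, -40.0, 45.0))        # True
print(vertical_consistency(sounding))       # [2, 3] -> suspicious jumps
```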

  3. Revealing biological information using data structuring and automated learning.

    PubMed

    Mohorianu, Irina; Moulton, Vincent

    2010-11-01

    The intermediary steps between a biological hypothesis, concretized in the input data, and meaningful results, validated using biological experiments, commonly employ bioinformatics tools. Starting with storage of the data and ending with a statistical analysis of the significance of the results, every step in a bioinformatics analysis has been intensively studied and the resulting methods and models patented. This review summarizes the bioinformatics patents that have been developed mainly for the study of genes, and points out the universal applicability of bioinformatics methods to other related studies such as RNA interference. More specifically, we overview the steps undertaken in the majority of bioinformatics analyses, highlighting, for each, various approaches that have been developed to reveal details from different perspectives. First we consider data warehousing, the first task that has to be performed efficiently, optimizing the structure of the database, in order to facilitate both the subsequent steps and the retrieval of information. Next, we review data mining, which occupies the central part of most bioinformatics analyses, presenting patents concerning differential expression, unsupervised and supervised learning. Last, we discuss how networks of interactions of genes or other players in the cell may be created, which help draw biological conclusions and have been described in several patents.

  4. Coupling detrended fluctuation analysis for multiple warehouse-out behavioral sequences

    NASA Astrophysics Data System (ADS)

    Yao, Can-Zhong; Lin, Ji-Nan; Zheng, Xu-Zhou

    2017-01-01

    Interaction patterns among different warehouses can make warehouse-out behavioral sequences less predictable. We first apply coupling detrended fluctuation analysis to the warehouse-out quantities and find that the multivariate sequences exhibit significant coupling multifractal characteristics regardless of the type of steel product. Secondly, we trace the sources of the multifractality in the warehouse-out sequences by shuffling and surrogating the original series, and find that the fat-tailed distribution contributes more to the multifractal features than long-term memory does, again regardless of product type. From the perspective of individual warehouses, some warehouses consistently contribute more to the multifractality than others. Finally, based on multiscale multifractal analysis, we propose a Hurst surface structure to investigate the coupling multifractality, and show that the multiple behavioral sequences exhibit significant coupling multifractal features that emerge mainly within, and are usually restricted to, relatively large time scale intervals.
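
    A minimal sketch of ordinary (single-series) detrended fluctuation analysis with numpy, to illustrate the core detrending-and-scaling idea; the coupling and multifractal variants used in the paper generalise this to multiple sequences and q-order moments. Scales and the test series are illustrative.

```python
import numpy as np

def dfa_exponent(series, scales=(8, 16, 32, 64)):
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())            # integrated (profile) series
    flucts = []
    for s in scales:
        n_seg = len(profile) // s
        rms = []
        for k in range(n_seg):
            seg = profile[k * s:(k + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, 1), t)   # local linear trend
            rms.append(np.sqrt(np.mean((seg - trend) ** 2)))
        flucts.append(np.mean(rms))
    # Hurst-like exponent = slope of log F(s) versus log s
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return slope

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(round(dfa_exponent(rng.normal(size=1024)), 2))   # ~0.5 for white noise
```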

  5. Anchor Modeling

    NASA Astrophysics Data System (ADS)

    Regardt, Olle; Rönnbäck, Lars; Bergholtz, Maria; Johannesson, Paul; Wohed, Petia

    Maintaining and evolving data warehouses is a complex, error prone, and time consuming activity. The main reason for this state of affairs is that the environment of a data warehouse is in constant change, while the warehouse itself needs to provide a stable and consistent interface to information spanning extended periods of time. In this paper, we propose a modeling technique for data warehousing, called anchor modeling, that offers non-destructive extensibility mechanisms, thereby enabling robust and flexible management of changes in source systems. A key benefit of anchor modeling is that changes in a data warehouse environment only require extensions, not modifications, to the data warehouse. This ensures that existing data warehouse applications will remain unaffected by the evolution of the data warehouse, i.e. existing views and functions will not have to be modified as a result of changes in the warehouse model.
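
    A small sketch of the non-destructive extensibility idea described above: in anchor modeling, anchors and attributes each live in their own table, so accommodating a new source field means creating a new attribute table rather than altering existing ones. The table and column names below are illustrative, not the paper's notation.

```python
def anchor_table(anchor: str) -> str:
    return (f"CREATE TABLE {anchor}_anchor (\n"
            f"  {anchor}_id INTEGER PRIMARY KEY\n"
            f");")

def attribute_table(anchor: str, attribute: str, sql_type: str) -> str:
    # Historized attribute: one row per value and validity start time.
    return (f"CREATE TABLE {anchor}_{attribute} (\n"
            f"  {anchor}_id INTEGER NOT NULL REFERENCES {anchor}_anchor,\n"
            f"  {attribute} {sql_type} NOT NULL,\n"
            f"  valid_from TIMESTAMP NOT NULL,\n"
            f"  PRIMARY KEY ({anchor}_id, valid_from)\n"
            f");")

print(anchor_table("customer"))
print(attribute_table("customer", "name", "TEXT"))
# Later, a new source field arrives: extend the model, do not modify it.
print(attribute_table("customer", "loyalty_tier", "TEXT"))
```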

  6. The European Bioinformatics Institute’s data resources 2014

    PubMed Central

    Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

    2014-01-01

    Molecular Biology has been at the heart of the ‘big data’ revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff’s ‘Atlas of Protein Sequence and Structure’ through the Human Genome Project in the late 1990s and early 2000s to today’s population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI’s database collection to complement the reviews of individual databases provided elsewhere in this issue. PMID:24271396

  7. Access to DNA and protein databases on the Internet.

    PubMed

    Harper, R

    1994-02-01

    During the past year, the number of biological databases that can be queried via Internet has dramatically increased. This increase has resulted from the introduction of networking tools, such as Gopher and WAIS, that make it easy for research workers to index databases and make them available for on-line browsing. Biocomputing in the nineties will see the advent of more client/server options for the solution of problems in bioinformatics.

  8. PHASE I MATERIALS PROPERTY DATABASE DEVELOPMENT FOR ASME CODES AND STANDARDS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, Weiju; Lin, Lianshan

    2013-01-01

    To support the ASME Boiler and Pressure Vessel Code (BPVC) in the modern information era, development of a web-based materials property database has been initiated under the supervision of the ASME Committee on Materials. To achieve efficiency, the project heavily draws upon experience from development of the Gen IV Materials Handbook and the Nuclear System Materials Handbook. The effort is divided into two phases. Phase I is planned to deliver a materials data file warehouse that offers a depository for various files containing raw data and background information, and Phase II will provide a relational digital database with advanced features facilitating digital data processing and management. Population of the database will start with materials property data for nuclear applications and expand to data covering the entire ASME Codes and Standards, including the piping codes, as the database structure is continuously optimized. The ultimate goal of the effort is to establish a sound cyber infrastructure that supports ASME Codes and Standards development and maintenance.

  9. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  10. ENFIN a network to enhance integrative systems biology.

    PubMed

    Kahlem, Pascal; Birney, Ewan

    2007-12-01

    Integration of biological data of various types and development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged both in developing an adapted infrastructure that connects databases and platforms to enable the generation of new bioinformatics tools, and in the experimental validation of computational predictions. We will give an overview of the projects tackled within ENFIN and discuss the challenges associated with integration for systems biology.

  11. Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

    PubMed

    Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

    2011-06-01

    Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will "become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 Codex Alimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops, are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth. Copyright © 2011 Elsevier Inc. All rights reserved.
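
    A toy sketch of the sliding-window screen commonly associated with the Codex bioinformatic step, in which any 80-amino-acid window of a novel protein sharing more than 35% identity with a known allergen is flagged. Real assessments use local alignment tools (e.g. FASTA) against curated allergen databases; the gapless comparison below only illustrates the windowing idea, and the sequences and thresholds are assumptions for illustration.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length windows."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def flag_windows(novel: str, allergen: str, window=80, threshold=0.35):
    """Return (novel_offset, allergen_offset, identity) for flagged window pairs."""
    hits = []
    for i in range(len(novel) - window + 1):
        for j in range(len(allergen) - window + 1):
            ident = window_identity(novel[i:i + window],
                                    allergen[j:j + window])
            if ident > threshold:
                hits.append((i, j, round(ident, 2)))
    return hits
```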

  12. Managing dual warehouses with an incentive policy for deteriorating items

    NASA Astrophysics Data System (ADS)

    Yu, Jonas C. P.; Wang, Kung-Jeng; Lin, Yu-Siang

    2016-02-01

    Distributors in a supply chain usually limit their own warehouse to a finite capacity for cost reduction, and excess stock is held in a rent warehouse. In this study, we examine inventory control for deteriorating items in a two-warehouse setting. Assuming that there is an incentive offered by a rent warehouse that allows the rental fee to decrease over time, the objective of this study is to maximise the joint profit of the manufacturer and the distributor. An optimisation procedure is developed to derive the optimal joint economic lot size policy. Several criteria are identified to select the most appropriate warehouse configuration and inventory policy on the basis of storage duration of materials in a rent warehouse. Sensitivity analysis is performed to examine the robustness of the model. The proposed model enables a manufacturer with a channel distributor to coordinate the use of alternative warehouses, and to maximise the joint profit of the manufacturer and the distributor.
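
    A toy numerical sketch of the kind of lot-size search involved: per-cycle profit is evaluated over candidate order quantities when stock above the distributor's own-warehouse capacity is held in a rent warehouse whose fee declines with storage time. All parameters and the cost structure are illustrative assumptions, not the paper's model.

```python
def cycle_profit(Q, demand=100.0, margin=5.0, setup=200.0,
                 own_cap=60.0, own_hold=0.5, rent_hold0=1.0, rent_decay=0.1):
    """Average profit per unit time for order quantity Q (linear demand drain)."""
    T = Q / demand                                  # cycle length
    own_cost = own_hold * min(Q, own_cap) * T / 2   # holding in own warehouse
    excess = max(0.0, Q - own_cap)                  # stock pushed to rent warehouse
    avg_rent = rent_hold0 / (1 + rent_decay * T)    # declining fee, cycle average
    rent_cost = avg_rent * excess * T / 2
    return (margin * Q - setup - own_cost - rent_cost) / T

best_Q = max(range(10, 301), key=cycle_profit)
print(best_Q, round(cycle_profit(best_Q), 2))
```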

  13. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection

    PubMed Central

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike

    2018-01-01

    Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection. PMID:29564396
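
    A small sketch of keyword-based (semantic) selection of candidate viral records from GenBank-style definition lines, followed by exclusion of phage entries. The include/exclude terms and example records are illustrative, not the actual SEM-I/SEM-R criteria used to build RVDB.

```python
INCLUDE = ("virus", "viral", "retrotransposon", "endogenous retrovirus", "provirus")
EXCLUDE = ("phage", "bacteriophage")

def is_candidate(definition: str) -> bool:
    """Keep records whose definition mentions an include term and no exclude term."""
    text = definition.lower()
    if any(term in text for term in EXCLUDE):
        return False
    return any(term in text for term in INCLUDE)

records = [
    "Human endogenous retrovirus K113 complete genome",
    "Escherichia phage T4, complete genome",
    "Homo sapiens actin beta (ACTB) mRNA",
]
print([r for r in records if is_candidate(r)])   # keeps only the first record
```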

  14. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection.

    PubMed

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S

    2018-01-01

    Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection.

  15. NMSBA - RS21.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kinnan, Mark K.; Valerio, Richard Arthur; Flanagan, Tatiana Paz

    2016-12-01

    This report gives introductory guidance on the level of effort required to create a data warehouse for mining data. Numerous tutorials have been provided to demonstrate the process of downloading raw data, processing the raw data, and importing the data into a PostgreSQL database. Additional information and a tutorial have been provided on setting up a Hadoop cluster for storing vast amounts of data. This report has been generated as a deliverable for a New Mexico Small Business Assistance (NMSBA) project.
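
    A minimal sketch of the raw-data import step described above: loading a CSV extract into a PostgreSQL staging table with psycopg2's COPY support. The connection string, table definition and file name are placeholders, not values from the report.

```python
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS staging_raw (
    record_id   TEXT,
    recorded_at TIMESTAMP,
    value       NUMERIC
);
"""

def load_csv(path: str, dsn: str = "dbname=warehouse user=etl") -> None:
    """Create the staging table if needed and bulk-load the CSV via COPY."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(DDL)
        with open(path) as fh:
            cur.copy_expert(
                "COPY staging_raw FROM STDIN WITH (FORMAT csv, HEADER true)",
                fh,
            )
    # the connection context manager commits the transaction on success

if __name__ == "__main__":
    load_csv("raw_extract.csv")
```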

  16. FlyMine: an integrated database for Drosophila and Anopheles genomics

    PubMed Central

    Lyne, Rachel; Smith, Richard; Rutherford, Kim; Wakeling, Matthew; Varley, Andrew; Guillier, Francois; Janssens, Hilde; Ji, Wenyan; Mclaren, Peter; North, Philip; Rana, Debashis; Riley, Tom; Sullivan, Julie; Watkins, Xavier; Woodbridge, Mark; Lilley, Kathryn; Russell, Steve; Ashburner, Michael; Mizuguchi, Kenji; Micklem, Gos

    2007-01-01

    FlyMine is a data warehouse that addresses one of the important challenges of modern biology: how to integrate and make use of the diversity and volume of current biological data. Its main focus is genomic and proteomics data for Drosophila and other insects. It provides web access to integrated data at a number of different levels, from simple browsing to construction of complex queries, which can be executed on either single items or lists. PMID:17615057

  17. Student Research Projects

    NASA Technical Reports Server (NTRS)

    Yeske, Lanny A.

    1998-01-01

    Numerous FY1998 student research projects were sponsored by the Mississippi State University Center for Air Sea Technology. This technical note describes these projects which include research on: (1) Graphical User Interfaces, (2) Master Environmental Library, (3) Database Management Systems, (4) Naval Interactive Data Analysis System, (5) Relocatable Modeling Environment, (6) Tidal Models, (7) Book Inventories, (8) System Analysis, (9) World Wide Web Development, (10) Virtual Data Warehouse, (11) Enterprise Information Explorer, (12) Equipment Inventories, (13) COADS, and (14) JavaScript Technology.

  18. 7 CFR 735.3 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT General Provisions § 735.3..., change, and transfer warehouse receipts or other applicable document information retained in a central... provider, as a disinterested third party, authorized by DACO where information relating to warehouse...

  19. 7 CFR 735.3 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT General Provisions § 735.3..., change, and transfer warehouse receipts or other applicable document information retained in a central... provider, as a disinterested third party, authorized by DACO where information relating to warehouse...

  20. yStreX: yeast stress expression database

    PubMed Central

    Wanichthanarak, Kwanjeera; Nookaew, Intawat; Petranovic, Dina

    2014-01-01

    Over the past decade genome-wide expression analyses have been often used to study how expression of genes changes in response to various environmental stresses. Many of these studies (such as effects of oxygen concentration, temperature stress, low pH stress, osmotic stress, depletion or limitation of nutrients, addition of different chemical compounds, etc.) have been conducted in the unicellular Eukaryal model, yeast Saccharomyces cerevisiae. However, the lack of a unifying, integrated bioinformatics platform that would permit efficient and rapid use of all these existing data remains an important issue. To facilitate research by exploiting existing transcription data in the field of yeast physiology, we have developed the yStreX database. It is an online repository of analyzed gene expression data from curated data sets from different studies that capture genome-wide transcriptional changes in response to diverse environmental transitions. The first aim of this online database is to facilitate comparison of cross-platform and cross-laboratory gene expression data. Additionally, we performed different expression analyses, meta-analyses and gene set enrichment analyses; and the results are also deposited in this database. Lastly, we constructed a user-friendly Web interface with interactive visualization to provide intuitive access and to display the queried data for users with no background in bioinformatics. Database URL: http://www.ystrexdb.com PMID:25024351
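
    A minimal sketch of the gene set enrichment style of analysis mentioned above: a hypergeometric test asking whether a list of stress-responsive genes is enriched for members of an annotated gene set. The counts are illustrative, not from yStreX.

```python
from scipy.stats import hypergeom

def enrichment_pvalue(n_genome, n_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when drawing n_selected genes at random
    from a genome of n_genome genes containing n_set set members."""
    return hypergeom.sf(n_overlap - 1, n_genome, n_set, n_selected)

# ~6000 yeast genes, 150 in an 'oxidative stress response' set,
# 300 differentially expressed genes, 25 of them in the set.
print(f"p = {enrichment_pvalue(6000, 150, 300, 25):.2e}")
```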

  1. YPED: An Integrated Bioinformatics Suite and Database for Mass Spectrometry-based Proteomics Research

    PubMed Central

    Colangelo, Christopher M.; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L.; Carriero, Nicholas J.; Gulcicek, Erol E.; Lam, TuKiet T.; Wu, Terence; Bjornson, Robert D.; Bruce, Can; Nairn, Angus C.; Rinehart, Jesse; Miller, Perry L.; Williams, Kenneth R.

    2015-01-01

    We report a significantly enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of high-throughput mass spectrometry-based proteomics research, ranging from a single laboratory, to groups of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry (LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED’s database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. PMID:25712262

  2. YPED: an integrated bioinformatics suite and database for mass spectrometry-based proteomics research.

    PubMed

    Colangelo, Christopher M; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L; Carriero, Nicholas J; Gulcicek, Erol E; Lam, TuKiet T; Wu, Terence; Bjornson, Robert D; Bruce, Can; Nairn, Angus C; Rinehart, Jesse; Miller, Perry L; Williams, Kenneth R

    2015-02-01

    We report a significantly enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of high-throughput mass spectrometry-based proteomics research, ranging from a single laboratory, to groups of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography-tandem mass spectrometry (LC-MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  3. Ambiguity and variability of database and software names in bioinformatics.

    PubMed

    Duck, Geraint; Kovacevic, Aleksandar; Robertson, David L; Stevens, Robert; Nenadic, Goran

    2015-01-01

    There are numerous options available to achieve various tasks in bioinformatics, but until recently, there were no tools that could systematically identify mentions of databases and tools within the literature. In this paper we explore the variability and ambiguity of database and software name mentions and compare dictionary and machine learning approaches to their identification. Through the development and analysis of a corpus of 60 full-text documents manually annotated at the mention level, we report high variability and ambiguity in database and software mentions. On a test set of 25 full-text documents, a baseline dictionary look-up achieved an F-score of 46 %, highlighting not only variability and ambiguity but also the extensive number of new resources introduced. A machine learning approach achieved an F-score of 63 % (with precision of 74 %) and 70 % (with precision of 83 %) for strict and lenient matching respectively. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature. Our analyses show that identification of mentions of databases and tools is a challenging task that cannot be achieved by relying on current manually-curated resource repositories. Although machine learning shows improvement and promise (primarily in precision), more contextual information needs to be taken into account to achieve a good degree of accuracy.
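
    A minimal sketch of the strict-matching evaluation reported above: precision, recall and F-score of predicted resource-name mentions against gold annotations, with each mention represented as a (document, start, end) span. The spans are illustrative.

```python
def prf(gold: set, predicted: set):
    """Precision, recall and F-score for exact span matches."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

gold = {("doc1", 10, 15), ("doc1", 40, 47), ("doc2", 5, 12)}
pred = {("doc1", 10, 15), ("doc2", 5, 12), ("doc2", 60, 66)}
print(prf(gold, pred))   # approximately (0.67, 0.67, 0.67)
```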

  4. YAdumper: extracting and translating large information volumes from relational databases to structured flat files.

    PubMed

    Fernández, José M; Valencia, Alfonso

    2004-10-12

    Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodic dumping of information requires considerable CPU time, disk space and memory. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.
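
    A minimal sketch of the same kind of task: streaming rows from a relational database into XML without holding the whole result set in memory. It uses sqlite3 for self-containment and an illustrative schema; it is not YAdumper's template-driven Java implementation.

```python
import sqlite3
from xml.sax.saxutils import escape

def dump_table_to_xml(db_path: str, table: str, out_path: str) -> None:
    """Write every row of `table` as an XML <row> element, streaming row by row."""
    conn = sqlite3.connect(db_path)
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(f"<{table}_dump>\n")
        for row in cur:                      # iterate; do not fetchall()
            out.write("  <row>\n")
            for name, value in zip(columns, row):
                out.write(f"    <{name}>{escape(str(value))}</{name}>\n")
            out.write("  </row>\n")
        out.write(f"</{table}_dump>\n")
    conn.close()
```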

  5. Data warehouse governance programs in healthcare settings: a literature review and a call to action.

    PubMed

    Elliott, Thomas E; Holmes, John H; Davidson, Arthur J; La Chance, Pierre-Andre; Nelson, Andrew F; Steiner, John F

    2013-01-01

    Given the extensive data stored in healthcare data warehouses, data warehouse governance policies are needed to ensure data integrity and privacy. This review examines the current state of the data warehouse governance literature as it applies to healthcare data warehouses, identifies knowledge gaps, provides recommendations, and suggests approaches for further research. A comprehensive literature search using five databases, journal article title-search, and citation searches was conducted between 1997 and 2012. Data warehouse governance documents from two healthcare systems in the USA were also reviewed. A modified version of nine components from the Data Governance Institute Framework for data warehouse governance guided the qualitative analysis. Fifteen articles were retrieved. Only three were related to healthcare settings, each of which addressed only one of the nine framework components. Of the remaining 12 articles, 10 addressed between one and seven framework components and the remainder addressed none. Each of the two data warehouse governance plans obtained from healthcare systems in the USA addressed a subset of the framework components, and between them they covered all nine. While published data warehouse governance policies are rare, the 15 articles and two healthcare organizational documents reviewed in this study may provide guidance for creating such policies. Additional research is needed in this area to ensure that data warehouse governance policies are feasible and effective. The gap between the development of data warehouses in healthcare settings and formal governance policies is substantial, as evidenced by the sparse literature in this domain.

  6. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    PubMed Central

    2002-01-01

    Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134

  7. Influenza Research Database: An integrated bioinformatics resource for influenza virus research.

    PubMed

    Zhang, Yun; Aevermann, Brian D; Anderson, Tavis K; Burke, David F; Dauphin, Gwenaelle; Gu, Zhiping; He, Sherry; Kumar, Sanjeev; Larsen, Christopher N; Lee, Alexandra J; Li, Xiaomei; Macken, Catherine; Mahaffey, Colin; Pickett, Brett E; Reardon, Brian; Smith, Thomas; Stewart, Lucy; Suloway, Christian; Sun, Guangyu; Tong, Lei; Vincent, Amy L; Walters, Bryan; Zaremba, Sam; Zhao, Hongtao; Zhou, Liwei; Zmasek, Christian; Klem, Edward B; Scheuermann, Richard H

    2017-01-04

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics and therapeutics against influenza virus by providing a comprehensive collection of influenza-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, personal workbench spaces for data storage and sharing, and active user community support. Here, we describe the recent improvements in IRD including the use of cloud and high performance computing resources, analysis and visualization of user-provided sequence data with associated metadata, predictions of novel variant proteins, annotations of phenotype-associated sequence markers and their predicted phenotypic effects, hemagglutinin (HA) clade classifications, an automated tool for HA subtype numbering conversion, linkouts to disease event data and the addition of host factor and antiviral drug components. All data and tools are freely available without restriction from the IRD website at https://www.fludb.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. CellLineNavigator: a workbench for cancer cell line analysis

    PubMed Central

    Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas

    2013-01-01

    The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and search ability, a simple data and an intuitive querying interface were implemented. It allows the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487
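
    A small sketch of the "differentially expressed genes" style of query supported by such interfaces: filtering an expression table by fold change and adjusted p-value. Column names, cut-offs and the example rows are illustrative assumptions.

```python
def differentially_expressed(rows, min_abs_log2fc=1.0, max_padj=0.05):
    """rows: iterable of dicts with 'gene', 'log2fc' and 'padj' keys."""
    return [r["gene"] for r in rows
            if abs(r["log2fc"]) >= min_abs_log2fc and r["padj"] <= max_padj]

table = [
    {"gene": "TP53",  "log2fc": 2.1,  "padj": 0.001},
    {"gene": "GAPDH", "log2fc": 0.1,  "padj": 0.900},
    {"gene": "MYC",   "log2fc": -1.6, "padj": 0.020},
]
print(differentially_expressed(table))   # ['TP53', 'MYC']
```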

  9. High-throughput protein analysis integrating bioinformatics and experimental assays

    PubMed Central

    del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan

    2004-01-01

    The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE, SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins. PMID:14762202
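
    A minimal sketch of two of the physicochemical characteristics such pipelines typically compute for each ORF product: approximate molecular weight from average residue masses and the Kyte-Doolittle GRAVY (grand average of hydropathy) score. The example sequence is a placeholder; this is a generic illustration, not the pipeline's implementation.

```python
RESIDUE_MASS = {  # average residue masses (Da), water excluded
    "A": 71.08, "R": 156.19, "N": 114.10, "D": 115.09, "C": 103.14,
    "E": 129.12, "Q": 128.13, "G": 57.05, "H": 137.14, "I": 113.16,
    "L": 113.16, "K": 128.17, "M": 131.19, "F": 147.18, "P": 97.12,
    "S": 87.08, "T": 101.10, "W": 186.21, "Y": 163.18, "V": 99.13,
}
KYTE_DOOLITTLE = {  # hydropathy values per residue
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "E": -3.5,
    "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def molecular_weight(seq: str) -> float:
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[a] for a in seq.upper()) + 18.02

def gravy(seq: str) -> float:
    """Grand average of hydropathy over the sequence."""
    return sum(KYTE_DOOLITTLE[a] for a in seq.upper()) / len(seq)

orf_protein = "MKTAYIAKQR"   # hypothetical ORF product
print(round(molecular_weight(orf_protein), 1), round(gravy(orf_protein), 2))
```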

  10. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface.

    PubMed

    Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz

    2009-08-25

    Bioinformatics often leverages recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms.
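
    A minimal sketch of the gene-clustering step offered by such interfaces: hierarchical clustering of gene expression profiles across time points with scipy. The profiles below are synthetic placeholders, not dictyExpress data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

genes = ["gene_A", "gene_B", "gene_C", "gene_D"]
profiles = np.array([
    [0.1, 0.8, 1.9, 2.5],    # rising over time
    [0.2, 0.9, 2.0, 2.4],    # rising (similar to gene_A)
    [2.6, 1.8, 0.7, 0.1],    # falling
    [2.4, 1.9, 0.9, 0.2],    # falling (similar to gene_C)
])

# Average-linkage clustering on correlation distance, cut into two clusters.
Z = linkage(profiles, method="average", metric="correlation")
labels = fcluster(Z, t=2, criterion="maxclust")
for g, c in zip(genes, labels):
    print(g, "-> cluster", c)
```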

  11. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface

    PubMed Central

    Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz

    2009-01-01

    Background Bioinformatics often leverages recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. Results We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. Conclusion dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms. PMID:19706156

  12. 27 CFR 24.108 - Bonded wine warehouse application.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Bonded wine warehouse... BUREAU, DEPARTMENT OF THE TREASURY LIQUORS WINE Establishment and Operations Application § 24.108 Bonded wine warehouse application. A warehouse company or other person desiring to establish a bonded wine...

  13. 78 FR 77662 - Notice of Availability (NOA) for General Purpose Warehouse and Information Technology Center...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-24

    ... (NOA) for General Purpose Warehouse and Information Technology Center Construction (GPW/IT)--Tracy Site.... ACTION: Notice of Availability (NOA) for General Purpose Warehouse and Information Technology Center... FR 65300) announcing the publication of the General Purpose Warehouse and Information Technology...

  14. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 7 2013-01-01 2013-01-01 false Electronic warehouse receipts. 735.303 Section 735.303 Agriculture Regulations of the Department of Agriculture (Continued) FARM SERVICE AGENCY, DEPARTMENT OF... § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  15. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 7 2014-01-01 2014-01-01 false Electronic warehouse receipts. 735.303 Section 735.303 Agriculture Regulations of the Department of Agriculture (Continued) FARM SERVICE AGENCY, DEPARTMENT OF... § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  16. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 7 2012-01-01 2012-01-01 false Electronic warehouse receipts. 735.303 Section 735.303 Agriculture Regulations of the Department of Agriculture (Continued) FARM SERVICE AGENCY, DEPARTMENT OF... § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  17. 7 CFR 735.6 - Suspension, revocation and liquidation.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... control and begin an orderly liquidation of such warehouse inventory or provider system data as provided..., DEPARTMENT OF AGRICULTURE REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT General... licensing or provider agreement. (4) Failure to maintain control of the warehouse or provider system. (5...

  18. 7 CFR 735.6 - Suspension, revocation and liquidation.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... control and begin an orderly liquidation of such warehouse inventory or provider system data as provided..., DEPARTMENT OF AGRICULTURE REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT General... licensing or provider agreement. (4) Failure to maintain control of the warehouse or provider system. (5...

  19. 27 CFR 41.11 - Meaning of terms.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... Reserve Bank of New York. Export warehouse. A bonded internal revenue warehouse for the storage of tobacco... revenue laws of the United States. Export warehouse proprietor. Any person who operates an export warehouse. Factory. The premises of a manufacturer of tobacco products, cigarette papers or tubes, or...

  20. 7 CFR 1427.11 - Warehouse receipts.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... (Signature or initials), Date. (3) Alterations in other inserted data on a machine card type warehouse... 7 Agriculture 10 2010-01-01 2010-01-01 false Warehouse receipts. 1427.11 Section 1427.11... Deficiency Payments § 1427.11 Warehouse receipts. (a) Producers may obtain loans on eligible cotton...

  1. 7 CFR 1427.11 - Warehouse receipts.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... (Signature or initials), Date. (3) Alterations in other inserted data on a machine card type warehouse... 7 Agriculture 10 2011-01-01 2011-01-01 false Warehouse receipts. 1427.11 Section 1427.11... Deficiency Payments § 1427.11 Warehouse receipts. (a) Producers may obtain loans on eligible cotton...

  2. Warehouse multipoint temperature and humidity monitoring system design based on Kingview

    NASA Astrophysics Data System (ADS)

    Ou, Yanghui; Wang, Xifu; Liu, Jingyun

    2017-04-01

    Storage is a key link in modern logistics, and warehouse environment monitoring is an important part of storage safety management. Meeting the storage requirements of different materials, and thereby preserving their quality to the greatest extent, is of great significance. In warehouse environment monitoring, the most important parameters are air temperature and relative humidity. This paper presents a design for a warehouse multipoint temperature and humidity monitoring system based on Kingview, which uses temperature and humidity sensors to achieve real-time acquisition, monitoring and storage of multipoint temperature and humidity data in the warehouse. Taking a bulk grain warehouse as an example, the system combines the data collected by real-time monitoring with a corresponding algorithm to give expert advice, providing theoretical guidance for controlling temperature and humidity in the grain warehouse.
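
    A minimal sketch of the acquisition-and-alert loop such a system runs: poll several temperature/humidity sensing points, store the readings, and raise an advisory when grain-storage thresholds are exceeded. The sensor read function, thresholds and database layout are placeholders for the real hardware and Kingview/SCADA integration.

```python
import random, sqlite3, time

TEMP_MAX_C, HUMIDITY_MAX = 25.0, 65.0        # illustrative grain-storage thresholds

def read_sensor(point_id: int):
    """Placeholder for the real sensor / OPC read at one monitoring point."""
    return round(random.uniform(15, 30), 1), round(random.uniform(40, 80), 1)

def monitor(points=range(1, 5), cycles=3, db="warehouse_monitor.db"):
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE IF NOT EXISTS readings"
                 "(ts REAL, point INTEGER, temp REAL, humidity REAL)")
    for _ in range(cycles):
        for p in points:
            temp, hum = read_sensor(p)
            conn.execute("INSERT INTO readings VALUES (?,?,?,?)",
                         (time.time(), p, temp, hum))
            if temp > TEMP_MAX_C or hum > HUMIDITY_MAX:
                print(f"point {p}: temp={temp}C humidity={hum}% -> advise ventilation")
        conn.commit()
        time.sleep(1)
    conn.close()

if __name__ == "__main__":
    monitor()
```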

  3. A geodata warehouse: Using denormalisation techniques as a tool for delivering spatially enabled integrated geological information to geologists

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2016-11-01

    New requirements to understand geological properties in three dimensions have led to the development of PropBase, a data structure and set of delivery tools to meet this need. At the BGS, relational database management systems (RDBMS) have facilitated effective data management using normalised, subject-based database designs with business rules in a centralised, vocabulary-controlled architecture. These have delivered effective data storage in a secure environment. However, isolated subject-oriented designs prevented efficient cross-domain querying of datasets. Additionally, the tools provided often did not enable effective data discovery, as they struggled to resolve the complex underlying normalised structures, resulting in poor data access speeds. Users developed bespoke access tools against structures they did not fully understand, sometimes obtaining incorrect results. Therefore, BGS has developed PropBase, a generic denormalised data structure within an RDBMS for storing property data, to facilitate rapid and standardised data discovery and access; it incorporates 2D and 3D physical and chemical property data with associated metadata. This includes scripts to populate and synchronise the layer with its data sources through structured input and transcription standards. A core component of the architecture is an optimised query object that delivers geoscience information from a structure equivalent to a data warehouse. This enables optimised query performance, delivering data in multiple standardised formats through a web discovery tool. Semantic interoperability is enforced through vocabularies combined from all data sources, facilitating searches over related terms. PropBase holds 28.1 million spatially enabled property data points from 10 source databases, incorporating over 50 property data types, with a vocabulary set that includes 557 property terms. By enabling property data searches across multiple databases, PropBase has facilitated new scientific research previously considered impractical. PropBase is easily extended to incorporate 4D (time series) data and is providing a baseline for new "big data" monitoring projects.
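
    A small sketch of the denormalised "property warehouse" idea: measurements from different source databases are flattened into one generic table keyed by location, property term and source, so a single query can span all sources. It uses sqlite3 for self-containment; the column names, vocabulary terms and values are illustrative, not PropBase's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE propbase (
    source   TEXT,   -- originating subject database
    x REAL, y REAL, z REAL,
    property TEXT,   -- controlled-vocabulary property term
    value    REAL,
    units    TEXT
)""")
conn.executemany(
    "INSERT INTO propbase VALUES (?,?,?,?,?,?)",
    [
        ("boreholes",    451200, 291300, -12.5, "porosity", 0.18, "fraction"),
        ("geophysics",   451210, 291280, -10.0, "porosity", 0.21, "fraction"),
        ("geochemistry", 451190, 291310, -11.0, "pH",       7.8,  "pH"),
    ],
)

# One cross-source query instead of one bespoke query per subject database.
rows = conn.execute(
    "SELECT source, value, units FROM propbase "
    "WHERE property = 'porosity' AND z BETWEEN -15 AND -5"
).fetchall()
print(rows)
```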

  4. Data Warehouse Governance Programs in Healthcare Settings: A Literature Review and a Call to Action

    PubMed Central

    Elliott, Thomas E.; Holmes, John H.; Davidson, Arthur J.; La Chance, Pierre-Andre; Nelson, Andrew F.; Steiner, John F.

    2013-01-01

    Purpose: Given the extensive data stored in healthcare data warehouses, data warehouse governance policies are needed to ensure data integrity and privacy. This review examines the current state of the data warehouse governance literature as it applies to healthcare data warehouses, identifies knowledge gaps, provides recommendations, and suggests approaches for further research. Methods: A comprehensive literature search using five databases, journal article title-search, and citation searches was conducted between 1997 and 2012. Data warehouse governance documents from two healthcare systems in the USA were also reviewed. A modified version of nine components from the Data Governance Institute Framework for data warehouse governance guided the qualitative analysis. Results: Fifteen articles were retrieved. Only three were related to healthcare settings, each of which addressed only one of the nine framework components. Of the remaining 12 articles, 10 addressed between one and seven framework components and the remainder addressed none. Each of the two data warehouse governance plans obtained from healthcare systems in the USA addressed a subset of the framework components, and between them they covered all nine. Conclusions: While published data warehouse governance policies are rare, the 15 articles and two healthcare organizational documents reviewed in this study may provide guidance for creating such policies. Additional research is needed in this area to ensure that data warehouse governance policies are feasible and effective. The gap between the development of data warehouses in healthcare settings and formal governance policies is substantial, as evidenced by the sparse literature in this domain. PMID:25848561

  5. 27 CFR 24.141 - Bonded wine warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Bonded wine warehouse. 24..., DEPARTMENT OF THE TREASURY LIQUORS WINE Establishment and Operations Permanent Discontinuance of Operations § 24.141 Bonded wine warehouse. Where all operations at a bonded wine warehouse are to be permanently...

  6. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 7 2013-01-01 2013-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  7. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 7 2014-01-01 2014-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  8. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 7 2010-01-01 2010-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  9. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 7 2012-01-01 2012-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  10. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 7 2011-01-01 2011-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  11. 7 CFR 735.302 - Paper warehouse receipts.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 7 2010-01-01 2010-01-01 false Paper warehouse receipts. 735.302 Section 735.302... § 735.302 Paper warehouse receipts. Paper warehouse receipts must be issued as follows: (a) On distinctive paper specified by DACO; (b) Printed by a printer authorized by DACO; and (c) Issued, identified...

  12. 7 CFR 735.302 - Paper warehouse receipts.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 7 2011-01-01 2011-01-01 false Paper warehouse receipts. 735.302 Section 735.302... § 735.302 Paper warehouse receipts. Paper warehouse receipts must be issued as follows: (a) On distinctive paper specified by DACO; (b) Printed by a printer authorized by DACO; and (c) Issued, identified...

  13. 19 CFR 144.27 - Withdrawal from warehouse by transferee.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ...; DEPARTMENT OF THE TREASURY (CONTINUED) WAREHOUSE AND REWAREHOUSE ENTRIES AND WITHDRAWALS Transfer of Right To Withdraw Merchandise from Warehouse § 144.27 Withdrawal from warehouse by transferee. At any time within... withdraw all or part of the merchandise covered by the transfer by filing any authorized kind of withdrawal...

  14. 19 CFR 144.27 - Withdrawal from warehouse by transferee.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ...; DEPARTMENT OF THE TREASURY (CONTINUED) WAREHOUSE AND REWAREHOUSE ENTRIES AND WITHDRAWALS Transfer of Right To Withdraw Merchandise from Warehouse § 144.27 Withdrawal from warehouse by transferee. At any time within... withdraw all or part of the merchandise covered by the transfer by filing any authorized kind of withdrawal...

  15. What Academia Can Gain from Building a Data Warehouse.

    ERIC Educational Resources Information Center

    Wierschem, David; McMillen, Jeremy; McBroom, Randy

    2003-01-01

    Describes how, when used effectively, data warehouses can be a significant component of strategic decision making on campus. Discusses what a data warehouse is and what its informational contents may include, environmental drivers and obstacles, and strategies to justify developing a data warehouse for an academic institution. (EV)

  16. Lipidomics informatics for life-science.

    PubMed

    Schwudke, D; Shevchenko, A; Hoffmann, N; Ahrends, R

    2017-11-10

    Lipidomics encompasses analytical approaches that aim to identify and quantify the complete set of lipids (the lipidome) in a given cell, tissue or organism, as well as their interactions with other molecules. The majority of lipidomics workflows are based on mass spectrometry, which has proven to be a powerful tool in systems biology in concert with other omics disciplines. Unfortunately, bioinformatics infrastructures for this relatively young discipline remain accessible only to a small number of specialists. Search engines, quantification algorithms, visualization tools and databases developed by the 'Lipidomics Informatics for Life-Science' (LIFS) partners will be restructured and standardized to provide broad access to these specialized bioinformatics pipelines. Research into the many medical challenges related to lipid metabolic alterations will be fostered by the capacity building proposed by LIFS. LIFS, as a member of the 'German Network for Bioinformatics' (de.NBI) node for 'Bioinformatics for Proteomics' (BioInfra.Prot), will provide access to the described software as well as to tutorials and consulting services via a unified web portal. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Design of an autonomous exterior security robot

    NASA Technical Reports Server (NTRS)

    Myers, Scott D.

    1994-01-01

    This paper discusses the requirements and preliminary design of a robotic vehicle intended to perform autonomous exterior perimeter security patrols around warehouse areas, ammunition supply depots, and industrial parks for the U.S. Department of Defense. The preliminary design allows for the operation of up to eight vehicles in a six kilometer by six kilometer zone with autonomous navigation and obstacle avoidance. In addition to detecting crawling intruders at 100 meters, the system must perform real-time inventory checking and database comparisons using a microwave tag system.

  18. The Analytic Information Warehouse (AIW): a Platform for Analytics using Electronic Health Record Data

    PubMed Central

    Post, Andrew R.; Kurc, Tahsin; Cholleti, Sharath; Gao, Jingjing; Lin, Xia; Bornstein, William; Cantrell, Dedra; Levine, David; Hohmann, Sam; Saltz, Joel H.

    2013-01-01

    Objective To create an analytics platform for specifying and detecting clinical phenotypes and other derived variables in electronic health record (EHR) data for quality improvement investigations. Materials and Methods We have developed an architecture for an Analytic Information Warehouse (AIW). It supports transforming data represented in different physical schemas into a common data model, specifying derived variables in terms of the common model to enable their reuse, computing derived variables while enforcing invariants and ensuring correctness and consistency of data transformations, long-term curation of derived data, and export of derived data into standard analysis tools. It includes software that implements these features and a computing environment that enables secure high-performance access to and processing of large datasets extracted from EHRs. Results We have implemented and deployed the architecture in production locally. The software is available as open source. We have used it as part of hospital operations in a project to reduce rates of hospital readmission within 30 days. The project examined the association of over 100 derived variables representing disease and co-morbidity phenotypes with readmissions in five years of data from our institution’s clinical data warehouse and the UHC Clinical Database (CDB). The CDB contains administrative data from over 200 hospitals that are in academic medical centers or affiliated with such centers. Discussion and Conclusion A widely available platform for managing and detecting phenotypes in EHR data could accelerate the use of such data in quality improvement and comparative effectiveness studies. PMID:23402960
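
    A minimal sketch, under assumed field names, of the two ideas the abstract combines: mapping records from different physical schemas into one common data model, and then computing a reusable derived variable (here, a 30-day readmission flag) over that model. This is illustrative only, not the AIW implementation.

        # Illustrative sketch (not the AIW software): map records from two source
        # schemas into one common model, then compute a derived variable over it.
        from dataclasses import dataclass
        from datetime import date

        @dataclass
        class Encounter:            # common data model record (field names are assumptions)
            patient_id: str
            admit: date
            discharge: date

        def from_hospital_a(row):   # source schema A stores ISO date strings
            return Encounter(row["mrn"], date.fromisoformat(row["adm"]), date.fromisoformat(row["dis"]))

        def from_hospital_b(row):   # source schema B stores (year, month, day) tuples
            return Encounter(row["patient"], date(*row["admit_ymd"]), date(*row["discharge_ymd"]))

        def readmitted_within_30_days(encounters):
            """Derived variable: True if any admission follows a prior discharge by <= 30 days."""
            encounters = sorted(encounters, key=lambda e: e.admit)
            return any((b.admit - a.discharge).days <= 30 for a, b in zip(encounters, encounters[1:]))

        rows_a = [{"mrn": "p1", "adm": "2012-01-02", "dis": "2012-01-05"}]
        rows_b = [{"patient": "p1", "admit_ymd": (2012, 1, 20), "discharge_ymd": (2012, 1, 25)}]
        common = [from_hospital_a(r) for r in rows_a] + [from_hospital_b(r) for r in rows_b]
        print(readmitted_within_30_days(common))   # True: second admission 15 days after discharge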

  19. An online analytical processing multi-dimensional data warehouse for malaria data

    PubMed Central

    Madey, Gregory R; Vyushkov, Alexander; Raybaud, Benoit; Burkot, Thomas R; Collins, Frank H

    2017-01-01

    Malaria is a vector-borne disease that contributes substantially to the global burden of morbidity and mortality. The management of malaria-related data from heterogeneous, autonomous, and distributed data sources poses unique challenges and requirements. Although online data storage systems exist that address specific malaria-related issues, a globally integrated online resource to address different aspects of the disease does not exist. In this article, we describe the design, implementation, and applications of a multi-dimensional, online analytical processing data warehouse, named the VecNet Data Warehouse (VecNet-DW). It is the first online, globally integrated platform that provides efficient search, retrieval and visualization of historical, predictive, and static malaria-related data, organized in data marts. Historical and static data are modelled using star schemas, while predictive data are modelled using a snowflake schema. The major goals, characteristics, and components of the DW are described, along with its data taxonomy and ontology, the external data storage systems, and the logical modelling and physical design phases. Results are presented as screenshots of a Dimensional Data browser, a Lookup Tables browser, and a Results Viewer interface. The power of the DW emerges from integrated querying of the different data marts and structuring those queries along the desired dimensions, enabling users to search, view, analyse, and store large volumes of aggregated data while responding better to the increasing demands of users. Database URL https://dw.vecnet.org/datawarehouse/ PMID:29220463
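
    The star-schema modelling the abstract mentions can be illustrated with a toy fact table joined to two dimension tables. The table and column names below are invented, and SQLite merely stands in for the warehouse's actual engine.

        # Minimal star-schema sketch, illustrating the modelling approach described
        # above; table and column names are assumptions, not the VecNet-DW schema.
        import sqlite3

        db = sqlite3.connect(":memory:")
        db.executescript("""
            CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, country TEXT, site TEXT);
            CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
            CREATE TABLE fact_cases (                       -- fact table at the centre of the star
                location_id INTEGER REFERENCES dim_location,
                time_id     INTEGER REFERENCES dim_time,
                reported_cases INTEGER
            );
            INSERT INTO dim_location VALUES (1, 'Kenya', 'Kisumu'), (2, 'PNG', 'Madang');
            INSERT INTO dim_time VALUES (1, 2014, 6), (2, 2014, 7);
            INSERT INTO fact_cases VALUES (1, 1, 120), (1, 2, 95), (2, 1, 40);
        """)

        # A typical OLAP-style query: aggregate the fact table along chosen dimensions.
        for row in db.execute("""
            SELECT l.country, t.year, SUM(f.reported_cases)
            FROM fact_cases f
            JOIN dim_location l ON l.location_id = f.location_id
            JOIN dim_time     t ON t.time_id     = f.time_id
            GROUP BY l.country, t.year
        """):
            print(row)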

  20. Hubble Space Telescope: the new telemetry archiving system

    NASA Astrophysics Data System (ADS)

    Miebach, Manfred P.

    2000-07-01

    The Hubble Space Telescope (HST), the first of NASA's Great Observatories, was launched on April 24, 1990. The HST was designed for a minimum fifteen-year mission with on-orbit servicing by the Space Shuttle System planned at approximately three-year intervals. Major changes to the HST ground system have been implemented for the third servicing mission in December 1999. The primary objectives of the ground system re-engineering effort, a project called 'Vision 2000 Control Center System (CCS),' are to reduce both development and operating costs significantly for the remaining years of HST's lifetime. Development costs are reduced by providing a more modern hardware and software architecture and utilizing commercial off-the-shelf (COTS) products wherever possible. Part of CCS is a Space Telescope Engineering Data Store, the design of which is based on current Data Warehouse technology. The Data Warehouse (Red Brick), as implemented in the CCS Ground System that operates and monitors the Hubble Space Telescope, represents the first use of a commercial Data Warehouse to manage engineering data. The purpose of this data store is to provide a common data source of telemetry data for all HST subsystems. This data store will become the engineering data archive and will provide a queryable database for the user to analyze HST telemetry. Access to the engineering data in the Data Warehouse is platform-independent from an office environment using commercial standards (Unix, Windows 98/NT). The latest Internet technology is used to reach the HST engineering community: a Web-based user interface allows easy access to the data archives. This paper will provide a CCS system overview and will illustrate some of the CCS telemetry capabilities, in particular the use of the new Telemetry Archiving System. Vision 2000 is an ambitious project, but one that is well under way. It will allow the HST program to realize reduced operations costs for the Third Servicing Mission and beyond.

  1. Probabilistic techniques for obtaining accurate patient counts in Clinical Data Warehouses

    PubMed Central

    Myers, Risa B.; Herskovic, Jorge R.

    2011-01-01

    Proposal and execution of clinical trials, computation of quality measures and discovery of correlations between medical phenomena are all applications where an accurate count of patients is needed. However, existing sources of this type of patient information, including Clinical Data Warehouses (CDW), may be incomplete or inaccurate. This research explores applying probabilistic techniques, supported by the MayBMS probabilistic database, to obtain accurate patient counts from a clinical data warehouse containing synthetic patient data. We present a synthetic clinical data warehouse (CDW) and populate it with simulated data using a custom patient data generation engine. We then implement, evaluate and compare different techniques for obtaining patient counts. We model billing as a test for the presence of a condition. We compute billing's sensitivity and specificity both by conducting a "Simulated Expert Review", in which a representative sample of records is reviewed and labeled by experts, and by obtaining the ground truth for every record. We compute the posterior probability of a patient having a condition through a "Bayesian Chain", using Bayes' Theorem to update the probability of a patient having a condition after each visit. The second method is a "one-shot" approach that computes the probability of a patient having a condition based on whether the patient is ever billed for the condition. Our results demonstrate the utility of probabilistic approaches, which improve on the accuracy of raw counts. In particular, the simulated review paired with a single application of Bayes' Theorem produces the best results, with an average error rate of 2.1% compared to 43.7% for the straightforward billing counts. Overall, this research demonstrates that Bayesian probabilistic approaches improve patient counts on simulated patient populations. We believe that total patient counts based on billing data are one of the many possible applications of our Bayesian framework. Use of these probabilistic techniques will enable more accurate patient counts and better results for applications requiring this metric. PMID:21986292
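
    The "one-shot" Bayesian update described above can be shown with a small worked example: billing is treated as a test with an assumed sensitivity, specificity and prevalence, and per-patient posteriors are summed into an expected count. All numbers are illustrative, not values from the study.

        # Worked example of a single Bayes'-theorem update with billing as a test
        # for the condition. All numbers are invented for illustration.
        def posterior_given_billed(prevalence, sensitivity, specificity):
            """P(condition | billed) via Bayes' theorem."""
            p_billed = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
            return sensitivity * prevalence / p_billed

        def posterior_given_not_billed(prevalence, sensitivity, specificity):
            """P(condition | never billed)."""
            p_not_billed = (1 - sensitivity) * prevalence + specificity * (1 - prevalence)
            return (1 - sensitivity) * prevalence / p_not_billed

        prev, sens, spec = 0.10, 0.80, 0.95
        print(round(posterior_given_billed(prev, sens, spec), 3))      # 0.64
        print(round(posterior_given_not_billed(prev, sens, spec), 3))  # 0.023

        # Expected patient count: sum per-patient posteriors instead of taking a raw billing count.
        billed_flags = [True, True, False, True, False]
        expected_count = sum(
            posterior_given_billed(prev, sens, spec) if b else posterior_given_not_billed(prev, sens, spec)
            for b in billed_flags
        )
        print(round(expected_count, 2))   # 1.97, versus a raw count of 3 billed patients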

  2. Using EMBL-EBI services via Web interface and programmatically via Web Services

    PubMed Central

    Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish

    2015-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. PMID:25501941
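
    A minimal sketch of programmatic REST access of the kind the unit covers, using only the Python standard library. The Dbfetch endpoint and parameters shown here are assumptions to be checked against the current EMBL-EBI documentation.

        # Hedged sketch of REST-style retrieval from an EMBL-EBI service; the URL
        # and parameters are assumptions, not a definitive API reference.
        from urllib.parse import urlencode
        from urllib.request import urlopen

        def fetch_fasta(accession, db="uniprotkb"):
            """Retrieve a sequence record in FASTA format over REST (requires network access)."""
            params = urlencode({"db": db, "id": accession, "format": "fasta", "style": "raw"})
            url = f"https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?{params}"
            with urlopen(url, timeout=30) as response:
                return response.read().decode("utf-8")

        if __name__ == "__main__":
            print(fetch_fasta("P12345"))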

  3. Biopython: freely available Python tools for computational molecular biology and bioinformatics.

    PubMed

    Cock, Peter J A; Antao, Tiago; Chang, Jeffrey T; Chapman, Brad A; Cox, Cymon J; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J L

    2009-06-01

    The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Biopython is freely available, with documentation and source code, at www.biopython.org under the Biopython license.
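
    A short, self-contained example of the kind of task Biopython's SeqIO module handles: parsing FASTA records and reporting basic properties of each sequence (requires the biopython package).

        # Parse FASTA records held in memory and print id, length and reverse complement.
        from io import StringIO
        from Bio import SeqIO   # pip install biopython

        fasta_text = """>seq1 example entry
        ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
        >seq2 another entry
        ATGAAACGCATTAGCACCACCATTACCACCACCATCACC
        """

        for record in SeqIO.parse(StringIO(fasta_text), "fasta"):
            print(record.id, len(record.seq), record.seq.reverse_complement())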

  4. KMC 3: counting and manipulating k-mer statistics.

    PubMed

    Kokot, Marek; Dlugosz, Maciej; Deorowicz, Sebastian

    2017-09-01

    Counting all k-mers in a given dataset is a standard procedure in many bioinformatics applications. We introduce KMC3, a significant improvement of the former KMC2 algorithm together with KMC tools for manipulating k-mer databases. Usefulness of the tools is shown on a few real problems. Program is freely available at http://sun.aei.polsl.pl/REFRESH/kmc . sebastian.deorowicz@polsl.pl. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
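
    For readers unfamiliar with the operation KMC performs, the following naive Python sketch counts overlapping k-mers in a single sequence. It illustrates the concept only; it is not the KMC implementation, which is an optimized compiled tool built for disk-scale datasets.

        # Naive k-mer counting for illustration (conceptually what KMC does at much
        # larger scale and with far better memory behaviour).
        from collections import Counter

        def count_kmers(sequence, k):
            """Return a Counter of all overlapping k-mers in the sequence."""
            return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

        counts = count_kmers("ATGGCGATGGCGATGA", k=4)
        for kmer, n in counts.most_common(3):
            print(kmer, n)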

  5. PatGen--a consolidated resource for searching genetic patent sequences.

    PubMed

    Rouse, Richard J D; Castagnetto, Jesus; Niedner, Roland H

    2005-04-15

    Compared to the wealth of online resources covering genomic, proteomic and derived data, the bioinformatics community is rather underserved when it comes to patent information related to biological sequences. The current online resources are either incomplete or rather expensive. This paper describes PatGen, an integrated database containing data from bioinformatic and patent resources. This effort addresses the inconsistent coverage of publicly available genetic patent data by providing access to a consolidated dataset. PatGen can be searched at http://www.patgendb.com rjdrouse@patentinformatics.com.

  6. 27 CFR 28.28 - Withdrawal of wine and distilled spirits from customs bonded warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Withdrawal of wine and... Miscellaneous Provisions Customs Bonded Warehouses § 28.28 Withdrawal of wine and distilled spirits from customs bonded warehouses. Wine and bottled distilled spirits entered into customs bonded warehouses as provided...

  7. ADM. Warehouse (TAN604) Floor plan. General warehouse and chemical storage. ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    ADM. Warehouse (TAN-604) Floor plan. General warehouse and chemical storage. Ralph M. Parsons 902-2-ANP-604-A 55. Date: December 1952. Approved by INEEL Classification Office for public release. INEEL index code no. 035-0604-00-693-106727 - Idaho National Engineering Laboratory, Test Area North, Scoville, Butte County, ID

  8. 7 CFR 735.106 - Excess storage and transferring of agricultural products.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 7 2011-01-01 2011-01-01 false Excess storage and transferring of agricultural... WAREHOUSE ACT Warehouse Licensing § 735.106 Excess storage and transferring of agricultural products. (a) If at any time a warehouse operator stores an agricultural product in a warehouse subject to a license...

  9. 7 CFR 735.106 - Excess storage and transferring of agricultural products.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 7 2010-01-01 2010-01-01 false Excess storage and transferring of agricultural... WAREHOUSE ACT Warehouse Licensing § 735.106 Excess storage and transferring of agricultural products. (a) If at any time a warehouse operator stores an agricultural product in a warehouse subject to a license...

  10. The GMOD Drupal bioinformatic server framework.

    PubMed

    Papanicolaou, Alexie; Heckel, David G

    2010-12-15

    Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com.

  11. Creating databases for biological information: an introduction.

    PubMed

    Stein, Lincoln

    2002-08-01

    The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, and relational databases, as well as ACeDB. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system.
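
    A toy contrast between a flat-file approach and a relational approach to the same lookup, in the spirit of the unit; the strain records and file layout are invented.

        # Flat file versus relational storage for the same small strain catalog.
        import sqlite3

        flat_records = "strainA\tkanR\t2001\nstrainB\tampR\t2003\nstrainC\tkanR\t2004\n"

        # Flat file: every query is a linear scan over the text.
        kan_strains_flat = [line.split("\t")[0] for line in flat_records.splitlines()
                            if line.split("\t")[1] == "kanR"]

        # Relational: the same data, queried declaratively (and indexable as it grows).
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE strain (name TEXT, marker TEXT, year INTEGER)")
        db.executemany("INSERT INTO strain VALUES (?, ?, ?)",
                       [line.split("\t") for line in flat_records.splitlines()])
        db.execute("CREATE INDEX idx_marker ON strain(marker)")
        kan_strains_sql = [name for (name,) in
                           db.execute("SELECT name FROM strain WHERE marker = 'kanR'")]

        print(kan_strains_flat, kan_strains_sql)   # both approaches find the same strains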

  12. Data warehouse implementation with clinical pharmacokinetic/pharmacodynamic data.

    PubMed

    Koprowski, S P; Barrett, J S

    2002-03-01

    We have created a data warehouse for human pharmacokinetic (PK) and pharmacodynamic (PD) data generated primarily within the Clinical PK Group of the Drug Metabolism and Pharmacokinetics (DM&PK) Department of DuPont Pharmaceuticals. Data that enter an Oracle-based LIMS directly from chromatography systems or through files from contract research organizations are accessed via SAS/PH.Kinetics, GLP-compliant data analysis software residing on individual users' workstations. Upon completion of the final PK or PD analysis, data are pushed to a predefined location. Data analyzed or created with other software (i.e., WinNonlin, NONMEM, Adapt, etc.) are added to this file repository as well. The warehouse creates views onto these data and accumulates metadata on all data sources defined in the warehouse. The warehouse is managed via the SAS/Warehouse Administrator product, which defines the environment, creates summarized data structures, and schedules data refreshes. The clinical PK/PD warehouse encompasses laboratory, biometric, PK and PD data streams. Detailed logical tables for each compound are created and updated as the clinical PK/PD data warehouse is populated. The data model defined for the warehouse is based on a star schema. Summarized data structures such as multidimensional databases (MDDB), infomarts, and datamarts are created from detail tables. Data mining and querying of highly summarized data, as well as drill-down to detail data, are possible via exploitation tools that front-end the warehouse data. Based on periodic refreshing of the warehouse data, these applications are able to access the most current data available and do not require a manual interface to update or populate the data store. Prototype applications have been web-enabled to facilitate their use by varied data customers across platforms and locations. The warehouse also contains automated mechanisms for the construction of study data listings and SAS transport files for eventual incorporation into an electronic submission. This environment permits the management of online analytical processing via a single administrator once the data model and warehouse configuration have been designed. The expansion of the current environment will eventually connect data from all phases of research and development, ensuring the return on investment and, hopefully, efficiencies in data processing unforeseen with earlier legacy systems.
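
    The summarised structures (datamart- or MDDB-style tables derived from detail tables) can be illustrated with a small pandas sketch; the compounds, columns and values are invented, and pandas merely stands in for the SAS tooling described.

        # Derive a summarised, multi-dimensional view from detail-level PK records.
        import pandas as pd

        detail = pd.DataFrame({
            "compound": ["DMP-1", "DMP-1", "DMP-1", "DMP-2", "DMP-2"],
            "study":    ["S001",  "S001",  "S002",  "S003",  "S003"],
            "subject":  ["01",    "02",    "01",    "01",    "02"],
            "cmax":     [1.8,     2.1,     1.6,     0.9,     1.1],   # ng/mL
            "auc":      [12.4,    14.0,    11.1,    6.3,     7.0],   # ng*h/mL
        })

        # Summarised structure: mean Cmax and AUC by compound and study.
        summary = detail.pivot_table(index="compound", columns="study",
                                     values=["cmax", "auc"], aggfunc="mean")
        print(summary)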

  13. Opportunities at the Intersection of Bioinformatics and Health Informatics

    PubMed Central

    Miller, Perry L.

    2000-01-01

    This paper provides a "viewpoint discussion" based on a presentation made to the 2000 Symposium of the American College of Medical Informatics. It discusses potential opportunities for researchers in health informatics to become involved in the rapidly growing field of bioinformatics, using the activities of the Yale Center for Medical Informatics as a case study. One set of opportunities occurs where bioinformatics research itself intersects with the clinical world. Examples include the correlation of individual genetic variation with clinical risk factors, disease presentation, and differential response to treatment; and the implications of including genetic test results in the patient record, which raises clinical decision support issues as well as legal and ethical issues. A second set of opportunities occurs where bioinformatics research can benefit from the technologic expertise and approaches that informaticians have used extensively in the clinical arena. Examples include database organization and knowledge representation, data mining, and modeling and simulation. Microarray technology is discussed as a specific potential area for collaboration. Related questions concern how best to establish collaborations with bioscientists so that the interests and needs of both sets of researchers can be met in a synergistic fashion, and the most appropriate home for bioinformatics in an academic medical center. PMID:10984461

  14. Nanoinformatics: an emerging area of information technology at the intersection of bioinformatics, computational chemistry and nanobiotechnology.

    PubMed

    González-Nilo, Fernando; Pérez-Acle, Tomás; Guínez-Molinos, Sergio; Geraldo, Daniela A; Sandoval, Claudia; Yévenes, Alejandro; Santos, Leonardo S; Laurie, V Felipe; Mendoza, Hegaly; Cachau, Raúl E

    2011-01-01

    After the progress made during the genomics era, bioinformatics was tasked with supporting the flow of information generated by nanobiotechnology efforts. This challenge requires adapting classical bioinformatic and computational chemistry tools to store, standardize, analyze, and visualize nanobiotechnological information. Thus, old and new bioinformatic and computational chemistry tools have been merged into a new sub-discipline: nanoinformatics. This review takes a second look at the development of this new and exciting area as seen from the perspective of the evolution of nanobiotechnology applied to the life sciences. The knowledge obtained at the nano-scale level implies answers to new questions and the development of new concepts in different fields. The rapid convergence of technologies around nanobiotechnologies has spun off collaborative networks and web platforms created for sharing and discussing the knowledge generated in nanobiotechnology. The implementation of new database schemes suitable for storage, processing and integrating physical, chemical, and biological properties of nanoparticles will be a key element in achieving the promises in this convergent field. In this work, we will review some applications of nanobiotechnology to life sciences in generating new requirements for diverse scientific fields, such as bioinformatics and computational chemistry.

  15. Determining conserved metabolic biomarkers from a million database queries.

    PubMed

    Kurczy, Michael E; Ivanisevic, Julijana; Johnson, Caroline H; Uritboonthai, Winnie; Hoang, Linh; Fang, Mingliang; Hicks, Matthew; Aldebot, Anthony; Rinehart, Duane; Mellander, Lisa J; Tautenhahn, Ralf; Patti, Gary J; Spilker, Mary E; Benton, H Paul; Siuzdak, Gary

    2015-12-01

    Metabolite databases provide a unique window into metabolome research, allowing the most commonly searched biomarkers to be catalogued. Omic-scale metabolite profiling, or metabolomics, is finding increased utility in biomarker discovery, largely driven by improvements in analytical technologies and concurrent developments in bioinformatics. However, the successful translation of biomarkers into clinically or biologically relevant indicators is limited. With the aim of improving the discovery of translatable metabolite biomarkers, we present search analytics for over one million METLIN metabolite database queries. The most commonly searched metabolites in METLIN were cross-correlated against XCMS Online, the widely used cloud-based data processing and pathway analysis platform. Analysis of the metabolites common to METLIN and XCMS has two primary implications: these metabolites might indicate a conserved metabolic response to stressors, and these data may be used to gauge the relative uniqueness of potential biomarkers. METLIN can be accessed by logging on to: https://metlin.scripps.edu siuzdak@scripps.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. The European Bioinformatics Institute's data resources: towards systems biology.

    PubMed

    Brooksbank, Catherine; Cameron, Graham; Thornton, Janet

    2005-01-01

    Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein-protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk.

  17. The European Bioinformatics Institute's data resources: towards systems biology

    PubMed Central

    Brooksbank, Catherine; Cameron, Graham; Thornton, Janet

    2005-01-01

    Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein–protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk. PMID:15608238

  18. Mi-DISCOVERER: A bioinformatics tool for the detection of mi-RNA in human genome.

    PubMed

    Arshad, Saadia; Mumtaz, Asia; Ahmad, Freed; Liaquat, Sadia; Nadeem, Shahid; Mehboob, Shahid; Afzal, Muhammad

    2010-11-27

    MicroRNAs (miRNAs) are 22-nucleotide non-coding RNAs that play pivotal regulatory roles in diverse organisms, including humans, and are difficult to identify owing to the lack of either distinctive sequence features or robust algorithms for their efficient identification. We therefore developed Mi-Discoverer, a tool for the detection of miRNAs in the human genome. The tools used for the development of the software were Microsoft Office Access 2003, the JDK version 1.6.0, BioJava version 1.0, and the NetBeans IDE version 6.0. Existing miRNA software tools were all web based; the advantage of our project is that it gives the user a desktop facility for sequence alignment searches against already identified human miRNAs held in the database. The user can also insert and update newly discovered human miRNAs in the database. Mi-Discoverer, a bioinformatics tool, successfully identifies human miRNAs based on multiple sequence alignment searches. It is a non-redundant database containing a large collection of publicly available human miRNAs.

  19. Mi-DISCOVERER: A bioinformatics tool for the detection of mi-RNA in human genome

    PubMed Central

    Arshad, Saadia; Mumtaz, Asia; Ahmad, Freed; Liaquat, Sadia; Nadeem, Shahid; Mehboob, Shahid; Afzal, Muhammad

    2010-01-01

    MicroRNAs (miRNAs) are 22-nucleotide non-coding RNAs that play pivotal regulatory roles in diverse organisms, including humans, and are difficult to identify owing to the lack of either distinctive sequence features or robust algorithms for their efficient identification. We therefore developed Mi-Discoverer, a tool for the detection of miRNAs in the human genome. The tools used for the development of the software were Microsoft Office Access 2003, the JDK version 1.6.0, BioJava version 1.0, and the NetBeans IDE version 6.0. Existing miRNA software tools were all web based; the advantage of our project is that it gives the user a desktop facility for sequence alignment searches against already identified human miRNAs held in the database. The user can also insert and update newly discovered human miRNAs in the database. Mi-Discoverer, a bioinformatics tool, successfully identifies human miRNAs based on multiple sequence alignment searches. It is a non-redundant database containing a large collection of publicly available human miRNAs. PMID:21364831

  20. A biomedical information system for retrieval and manipulation of NHANES data.

    PubMed

    Mukherjee, Sukrit; Martins, David; Norris, Keith C; Jenders, Robert A

    2013-01-01

    The retrieval and manipulation of data from large public databases such as the U.S. National Health and Nutrition Examination Survey (NHANES) may require sophisticated statistical software and significant expertise that may be unavailable in the university setting. In response, we have developed the Data Retrieval And Manipulation System (DReAMS), an automated information system that handles all processes of data extraction and cleaning and then joins different subsets to produce analysis-ready output. The system is a browser-based data warehouse application in which input data from flat files or operational systems are aggregated in a structured way so that the desired data can be read, recoded, queried and extracted efficiently. The current pilot implementation of the system provides access to a limited portion of the NHANES database. We plan to increase the amount of data available through the system in the near future and to extend the techniques to other large databases from the CDU archive, which currently holds about 53 databases.
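
    A hedged sketch of the extract-and-join step such a system automates, using pandas to read NHANES-style SAS transport files and join them on the respondent identifier. The file and variable names are assumptions, and the files must be downloaded separately.

        # Join two NHANES-style components on the shared respondent ID (SEQN);
        # file and variable names are assumptions for illustration.
        import pandas as pd

        demo = pd.read_sas("DEMO_J.XPT", format="xport")   # demographics component (assumed file name)
        bpx = pd.read_sas("BPX_J.XPT", format="xport")     # blood-pressure examination component

        # Keep a few selected columns from the merged, analysis-ready table.
        merged = demo.merge(bpx, on="SEQN", how="inner")[["SEQN", "RIDAGEYR", "BPXSY1"]]
        print(merged.head())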

  1. Creating databases for biological information: an introduction.

    PubMed

    Stein, Lincoln

    2013-06-01

    The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, relational databases, and NoSQL databases. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system. Copyright 2013 by John Wiley & Sons, Inc.

  2. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine

    PubMed Central

    Elsik, Christine G.; Tayal, Aditi; Diesh, Colin M.; Unni, Deepak R.; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.

    2016-01-01

    We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564
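
    A hedged sketch of querying an InterMine-based warehouse such as HymenopteraMine with the InterMine Python client, following the generic InterMine query pattern; the service URL, class name and field names are assumptions to verify against the live mine.

        # Generic InterMine-style query (pip install intermine); the URL, class and
        # paths below are assumptions, not a verified HymenopteraMine configuration.
        from intermine.webservice import Service

        service = Service("http://hymenopteragenome.org/hymenopteramine/service")  # assumed URL
        query = service.new_query("Gene")
        query.add_view("primaryIdentifier", "symbol", "organism.name")
        query.add_constraint("organism.name", "=", "Apis mellifera")

        for row in query.rows(size=10):
            print(row["primaryIdentifier"], row["symbol"], row["organism.name"])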

  3. 21 CFR 203.3 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... acutely ill or injured persons; provision of minimal emergency supplies of drugs to nearby nursing homes...' and distributors' warehouses, chain drug warehouses, and wholesale drug warehouses; independent...

  4. 21 CFR 203.3 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... acutely ill or injured persons; provision of minimal emergency supplies of drugs to nearby nursing homes...' and distributors' warehouses, chain drug warehouses, and wholesale drug warehouses; independent...

  5. 21 CFR 203.3 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... acutely ill or injured persons; provision of minimal emergency supplies of drugs to nearby nursing homes...' and distributors' warehouses, chain drug warehouses, and wholesale drug warehouses; independent...

  6. 8. Freight Warehouse, looking east into the east section. Projecting ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    8. Freight Warehouse, looking east into the east section. Projecting into the warehouse space are: (a) A loading dock with slatted walls (b) An office space above it and (c) A toilet and utility area. (b) and (c) are accessed through the Ticket Office. - Curtis Wharf, Freight Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  7. The Data Warehouse: Keeping It Simple. MIT Shares Valuable Lessons Learned from a Successful Data Warehouse Implementation.

    ERIC Educational Resources Information Center

    Thorne, Scott

    2000-01-01

    Explains why the data warehouse is important to the Massachusetts Institute of Technology community, describing its basic functions and technical design points; sharing some non-technical aspects of the school's data warehouse implementation that have proved to be important; examining the importance of proper training in a successful warehouse…

  8. 27 CFR 19.134 - Bonded warehouses not on premises qualified for production of spirits.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... depositors of spirits; (iii) Approximate number of persons to be served from the warehouse; and (iv) Data or... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Bonded warehouses not on... Location and Use § 19.134 Bonded warehouses not on premises qualified for production of spirits. (a...

  9. sRNAdb: A small non-coding RNA database for gram-positive bacteria

    PubMed Central

    2012-01-01

    Background The class of small non-coding RNA molecules (sRNA) regulates gene expression by different mechanisms and enables bacteria to mount a physiological response when adapting to the environment or during infection. Over the last decades the number of known sRNAs has increased rapidly. Several databases such as Rfam or fRNAdb were extended to include sRNAs as a class of their own, and new specialized databases such as sRNAMap (gram-negative bacteria only) and sRNATarBase (target prediction) were established. To the best of the authors' knowledge, no database focusing on sRNAs from gram-positive bacteria has been publicly available so far. Description In order to understand the functional and phylogenetic relationships of sRNAs, we have developed sRNAdb and provide tools for data analysis and visualization. The data compiled in our database are assembled from experiments as well as from bioinformatics analyses. The software enables comparison and visualization of the gene loci surrounding the sRNAs of interest. To accomplish this, we use a client–server based approach. Offline versions of the database, including analysis and visualization tools, can easily be installed locally on the user's computer. This feature facilitates customized local addition of unpublished sRNA candidates and related information such as promoters or terminators using tab-delimited files. Conclusion sRNAdb allows a user-friendly and comprehensive comparative analysis of sRNAs from available sequenced gram-positive prokaryotic replicons. Offline versions including analysis and visualization tools facilitate complex user-specific bioinformatics analyses. PMID:22883983
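
    As an illustration of the tab-delimited local additions the abstract mentions, the following sketch parses a small candidate file; the column layout is invented and is not sRNAdb's actual import format.

        # Parse a tab-delimited file of local sRNA candidates (invented columns).
        import csv
        import io

        tab_data = io.StringIO(
            "name\treplicon\tstart\tend\tstrand\n"
            "srn_001\tNC_003210.1\t12040\t12155\t+\n"
            "srn_002\tNC_003210.1\t98770\t98901\t-\n"
        )

        candidates = []
        for row in csv.DictReader(tab_data, delimiter="\t"):
            row["start"], row["end"] = int(row["start"]), int(row["end"])
            candidates.append(row)

        for c in candidates:
            print(f'{c["name"]}: {c["replicon"]}:{c["start"]}-{c["end"]} ({c["strand"]})')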

  10. Progress towards a Spacecraft-Associated Microbial Meta-database (SAMM)

    NASA Astrophysics Data System (ADS)

    Mogul, Rakesh; Keagy, Laura; Nava, Argelia; Zerehi, Farah

    The microbial inventories within the assembly facilities for spacecraft represent the primary pool of forward contaminants that may compromise life-detection missions. Accordingly, we are constructing a meta-database of these microorganisms for the purpose of building a bioinformatic resource for planetary protection and astrobiology-related endeavors. Through student-led efforts, the meta-database is being constructed from literature reports and includes both isolated microorganisms and those detected solely through DNA-based techniques. The Spacecraft-Associated Microbial Meta-database (SAMM) currently includes over 800 entries that are organized using 32 meta-tags covering taxonomy, location of isolation (facility and component), category of characterization (culture and/or genetic), types of characterization (e.g., culture, 16S rDNA, PhyloChip, FAME, and DNA hybridization), growth conditions, Gram stain, and general physiological traits (e.g., sporulation, extremotolerance, and respiration properties). Queries against the database show that the cleanrooms at Kennedy Space Center (KSC) harbor roughly 2-fold greater diversity of bacterial genera than those at the Jet Propulsion Laboratory (JPL), and that bacteria related to water, plant, and human environments are more often associated with the KSC-specific genera. These results parallel those reported in the literature and hence serve as benchmarks demonstrating the bioinformatic potential of this meta-database. The ultimate plans for SAMM include public availability, expansion through crowdsourcing efforts, and potential use as a companion resource to the culture collections assembled by DSMZ and JPL.

  11. Bioinformatics and molecular modeling in glycobiology

    PubMed Central

    Schloissnig, Siegfried

    2010-01-01

    The field of glycobiology is concerned with the study of the structure, properties, and biological functions of the family of biomolecules called carbohydrates. Bioinformatics for glycobiology is a particularly challenging field, because carbohydrates exhibit a high structural diversity and their chains are often branched. Significant improvements in experimental analytical methods over recent years have led to a tremendous increase in the amount of carbohydrate structure data generated. Consequently, the availability of databases and tools to store, retrieve and analyze these data in an efficient way is of fundamental importance to progress in glycobiology. In this review, the various graphical representations and sequence formats of carbohydrates are introduced, and an overview of newly developed databases, the latest developments in sequence alignment and data mining, and tools to support experimental glycan analysis are presented. Finally, the field of structural glycoinformatics and molecular modeling of carbohydrates, glycoproteins, and protein–carbohydrate interaction are reviewed. PMID:20364395

  12. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  13. A generally applicable lightweight method for calculating a value structure for tools and services in bioinformatics infrastructure projects.

    PubMed

    Mayer, Gerhard; Quast, Christian; Felden, Janine; Lange, Matthias; Prinz, Manuel; Pühler, Alfred; Lawerenz, Chris; Scholz, Uwe; Glöckner, Frank Oliver; Müller, Wolfgang; Marcus, Katrin; Eisenacher, Martin

    2017-10-30

    Sustainable noncommercial bioinformatics infrastructures are a prerequisite for using and exploiting the potential of big-data analysis in research and the economy. Consequently, funders, universities and institutes, as well as users, ask for a transparent value model for the tools and services offered. In this article, a generally applicable lightweight method is described by which bioinformatics infrastructure projects can estimate the value of the tools and services they offer without determining the exact total cost of ownership. Five representative scenarios for value estimation, ranging from a rough estimate to a detailed breakdown of costs, are presented. To account for the diversity of bioinformatics applications and services, the notion of service-specific 'service provision units' is introduced, together with the factors influencing them and the main assumptions underlying these 'value influencing factors'. Special attention is given to how personnel costs and indirect costs such as electricity are handled. Four examples are presented for the calculation of the value of tools and services provided by the German Network for Bioinformatics Infrastructure (de.NBI): one for tool usage, one for (Web-based) database analyses, one for consulting services and one for bioinformatics training events. Finally, from the discussed values, the costs of direct funding and the costs of payment for services by funded projects are calculated and compared. © The Author 2017. Published by Oxford University Press.
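
    A purely hypothetical worked calculation in the spirit of a 'service provision unit' value estimate; the cost figures and the simple split used here are assumptions, not the method's prescribed formula.

        # Hypothetical per-unit value estimate: annual costs divided by the number of
        # service provision units delivered. All figures are invented.
        personnel_cost_per_year = 75_000.0        # EUR, fraction of a scientist's time
        indirect_cost_per_year = 12_000.0         # EUR, electricity, hosting, maintenance
        service_provision_units_per_year = 4_000  # e.g. completed database analyses

        cost_per_unit = (personnel_cost_per_year + indirect_cost_per_year) / service_provision_units_per_year
        print(f"estimated value per service provision unit: {cost_per_unit:.2f} EUR")  # 21.75 EUR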

  14. [Construction and realization of a real-world integrated data warehouse from HIS for the re-evaluation of post-marketing traditional Chinese medicine].

    PubMed

    Zhuang, Yan; Xie, Bangtie; Weng, Shengxin; Xie, Yanming

    2011-10-01

    To construct a real-world integrated data warehouse from HIS data for the re-evaluation of post-marketing traditional Chinese medicine, supporting research on key techniques of clinical re-evaluation, which mainly cover indications of the traditional Chinese medicine, dosage and usage, course of treatment, unit medication, combined diseases and adverse reactions. The warehouse provides data for retrospective research on safety, effectiveness and economy, and provides a foundation for prospective research. The integrated data warehouse extracts and integrates data from HIS through an information collection system and data warehouse techniques, forming standardized structures and data on which further research proceeds. A data warehouse and several sub data warehouses were built, focusing on patients' main records, doctor orders, disease diagnoses, laboratory results and economic indicators in hospital. These data warehouses can provide research data for the re-evaluation of post-marketing traditional Chinese medicine and have clinical value. They also point out directions for further research.

  15. Design and applications of a multimodality image data warehouse framework.

    PubMed

    Wong, Stephen T C; Hoo, Kent Soo; Knowlton, Robert C; Laxer, Kenneth D; Cao, Xinhau; Hawkins, Randall A; Dillon, William P; Arenson, Ronald L

    2002-01-01

    A comprehensive data warehouse framework is needed, which encompasses imaging and non-imaging information in supporting disease management and research. The authors propose such a framework, describe general design principles and system architecture, and illustrate a multimodality neuroimaging data warehouse system implemented for clinical epilepsy research. The data warehouse system is built on top of a picture archiving and communication system (PACS) environment and applies an iterative object-oriented analysis and design (OOAD) approach and recognized data interface and design standards. The implementation is based on a Java CORBA (Common Object Request Broker Architecture) and Web-based architecture that separates the graphical user interface presentation, data warehouse business services, data staging area, and backend source systems into distinct software layers. To illustrate the practicality of the data warehouse system, the authors describe two distinct biomedical applications--namely, clinical diagnostic workup of multimodality neuroimaging cases and research data analysis and decision threshold on seizure foci lateralization. The image data warehouse framework can be modified and generalized for new application domains.

  16. Design and Applications of a Multimodality Image Data Warehouse Framework

    PubMed Central

    Wong, Stephen T.C.; Hoo, Kent Soo; Knowlton, Robert C.; Laxer, Kenneth D.; Cao, Xinhau; Hawkins, Randall A.; Dillon, William P.; Arenson, Ronald L.

    2002-01-01

    A comprehensive data warehouse framework is needed, which encompasses imaging and non-imaging information in supporting disease management and research. The authors propose such a framework, describe general design principles and system architecture, and illustrate a multimodality neuroimaging data warehouse system implemented for clinical epilepsy research. The data warehouse system is built on top of a picture archiving and communication system (PACS) environment and applies an iterative object-oriented analysis and design (OOAD) approach and recognized data interface and design standards. The implementation is based on a Java CORBA (Common Object Request Broker Architecture) and Web-based architecture that separates the graphical user interface presentation, data warehouse business services, data staging area, and backend source systems into distinct software layers. To illustrate the practicality of the data warehouse system, the authors describe two distinct biomedical applications—namely, clinical diagnostic workup of multimodality neuroimaging cases and research data analysis and decision threshold on seizure foci lateralization. The image data warehouse framework can be modified and generalized for new application domains. PMID:11971885

  17. 15. FIRST FLOOR WAREHOUSE SPACE, SHOWING COLUMN / BEAM CONNECTION. ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    15. FIRST FLOOR WAREHOUSE SPACE, SHOWING COLUMN / BEAM CONNECTION. VIEW TO SOUTHWEST. - Commercial & Industrial Buildings, Dubuque Seed Company Warehouse, 169-171 Iowa Street, Dubuque, Dubuque County, IA

  18. Bioinformatical and in vitro approaches to essential oil-induced matrix metalloproteinase inhibition.

    PubMed

    Zeidán-Chuliá, Fares; Rybarczyk-Filho, José L; Gursoy, Mervi; Könönen, Eija; Uitto, Veli-Jukka; Gursoy, Orhan V; Cakmakci, Lutfu; Moreira, José C F; Gursoy, Ulvi K

    2012-06-01

    Essential oils carry diverse antimicrobial and anti-enzymatic properties. The matrix metalloproteinase (MMP) inhibition characteristics of Salvia fruticosa Miller (Labiatae), Myrtus communis Linnaeus (Myrtaceae), Juniperus communis Linnaeus (Cupressaceae), and Lavandula stoechas Linnaeus (Labiatae) essential oils were evaluated. The chemical compositions of the essential oils were analyzed by gas chromatography-mass spectrometry (GC-MS). Bioinformatic database analysis was performed with the STRING 9.0 and STITCH 2.0 databases and ViaComplex software. Antibacterial activity of the essential oils against periodontopathogens was tested by the disc diffusion assay and the agar dilution method. Cellular proliferation and cytotoxicity were determined by commercial kits. MMP-2 and MMP-9 activities were measured by zymography. Bioinformatic database analyses, at a score of 0.4 (medium) and a prior correction of 0.0, gave rise to a model of the protein (MMPs and tissue inhibitors of metalloproteinases) versus chemical (essential oil components) interaction network, in which MMPs and essential oil components were interconnected through interactions with hydroxyl radicals, molecular oxygen, and hydrogen peroxide. Components from L. stoechas displayed a potentially higher degree of interaction with MMP-2 and -9. Although the antibacterial and growth-inhibitory effects of the essential oils on the tested periodontopathogens were limited, all of them inhibited MMP-2 in vitro at concentrations of 1 and 5 µL/mL. Moreover, the same concentrations of M. communis and L. stoechas also inhibited MMP-9. MMP-inhibiting concentrations of the essential oils were not cytotoxic to keratinocytes. We propose that essential oils may be useful therapeutic agents as MMP inhibitors, through a mechanism possibly based on their antioxidant potential.

  19. Opportunities and challenges provided by cloud repositories for bioinformatics-enabled drug discovery.

    PubMed

    Dalpé, Gratien; Joly, Yann

    2014-09-01

    Healthcare-related bioinformatics databases are increasingly offering the possibility to maintain, organize, and distribute DNA sequencing data. Different national and international institutions are currently hosting such databases that offer researchers website platforms where they can obtain sequencing data on which they can perform different types of analysis. Until recently, this process remained mostly one-dimensional, with most analysis concentrated on a limited amount of data. However, newer genome sequencing technology is producing a huge amount of data that current computer facilities are unable to handle. An alternative approach has been to start adopting cloud computing services for combining the information embedded in genomic and model system biology data, patient healthcare records, and clinical trials' data. In this new technological paradigm, researchers use virtual space and computing power from existing commercial or not-for-profit cloud service providers to access, store, and analyze data via different application programming interfaces. Cloud services are an alternative to the need of larger data storage; however, they raise different ethical, legal, and social issues. The purpose of this Commentary is to summarize how cloud computing can contribute to bioinformatics-based drug discovery and to highlight some of the outstanding legal, ethical, and social issues that are inherent in the use of cloud services. © 2014 Wiley Periodicals, Inc.

  20. PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species ▿ ‡ #

    PubMed Central

    Gillespie, Joseph J.; Wattam, Alice R.; Cammer, Stephen A.; Gabbard, Joseph L.; Shukla, Maulik P.; Dalay, Oral; Driscoll, Timothy; Hix, Deborah; Mane, Shrinivasrao P.; Mao, Chunhong; Nordberg, Eric K.; Scott, Mark; Schulman, Julie R.; Snyder, Eric E.; Sullivan, Daniel E.; Wang, Chunxia; Warren, Andrew; Williams, Kelly P.; Xue, Tian; Seung Yoo, Hyun; Zhang, Chengdong; Zhang, Yan; Will, Rebecca; Kenyon, Ronald W.; Sobral, Bruno W.

    2011-01-01

    Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided. PMID:21896772

  1. A Bioinformatics Facility for NASA

    NASA Technical Reports Server (NTRS)

    Schweighofer, Karl; Pohorille, Andrew

    2006-01-01

    Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.

  2. An epidemiological modeling and data integration framework.

    PubMed

    Pfeifer, B; Wurz, M; Hanser, F; Seger, M; Netzer, M; Osl, M; Modre-Osprian, R; Schreier, G; Baumgartner, C

    2010-01-01

    In this work, a cellular automaton software package for simulating different infectious diseases, storing the simulation results in a data warehouse system and analyzing the obtained results to generate prediction models as well as contingency plans is proposed. The Brisbane H3N2 flu virus, which spread during the 2009 winter season, was used for simulation in the federal state of Tyrol, Austria. The simulation-modeling framework consists of an underlying cellular automaton. The cellular automaton model is parameterized by known disease parameters, and geographical as well as demographic conditions are included for simulating the spreading. The data generated by simulation are stored in the back room of the data warehouse using the Talend Open Studio software package, and subsequent statistical and data mining tasks are performed using the tool termed Knowledge Discovery in Database Designer (KD3). The obtained simulation results were used for generating prediction models for all nine federal states of Austria. The proposed framework provides a powerful and easy-to-handle interface for parameterizing and simulating different infectious diseases in order to generate prediction models and improve contingency plans for future events.
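
    The abstract does not reproduce the automaton's update rule, so the following Python sketch is only a generic illustration of a grid-based infection model of the kind described: cells in susceptible, infected or recovered states, neighbour-driven transitions, and per-step counts that could be staged into a warehouse. All state names, probabilities and grid sizes are hypothetical.

        import random

        SUSCEPTIBLE, INFECTED, RECOVERED = 0, 1, 2   # hypothetical cell states
        P_INFECT, P_RECOVER = 0.25, 0.1              # hypothetical disease parameters

        def step(grid):
            """One synchronous update of the cellular automaton."""
            n = len(grid)
            new = [row[:] for row in grid]
            for i in range(n):
                for j in range(n):
                    if grid[i][j] == SUSCEPTIBLE:
                        # infection pressure from the four nearest neighbours (wrap-around grid)
                        neighbours = [grid[(i + di) % n][(j + dj) % n]
                                      for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
                        if any(s == INFECTED for s in neighbours) and random.random() < P_INFECT:
                            new[i][j] = INFECTED
                    elif grid[i][j] == INFECTED and random.random() < P_RECOVER:
                        new[i][j] = RECOVERED
            return new

        # seed a 50x50 grid with one infected cell and record counts per step,
        # the kind of per-step record that would be loaded into the warehouse back room
        grid = [[SUSCEPTIBLE] * 50 for _ in range(50)]
        grid[25][25] = INFECTED
        for t in range(100):
            grid = step(grid)
            counts = [sum(row.count(s) for row in grid) for s in (SUSCEPTIBLE, INFECTED, RECOVERED)]
            print(t, counts)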

  3. Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad

    Undergraduate life sciences education needs an overhaul, as clearly described in the National Research Council of the National Academies publication BIO 2010: Transforming Undergraduate Education for Future Research Biologists. Among BIO 2010's top recommendations is the need to involve students in working with real data and tools that reflect the nature of life sciences research in the 21st century. Education research studies support the importance of utilizing primary literature, designing and implementing experiments, and analyzing results in the context of a bona fide scientific question in cultivating the analytical skills necessary to become a scientist. Incorporating these basic scientific methodologies in undergraduate education leads to increased undergraduate and post-graduate retention in the sciences. Toward this end, many undergraduate teaching organizations offer training and suggestions for faculty to update and improve their teaching approaches to help students learn as scientists, through design and discovery (e.g., Council of Undergraduate Research [www.cur.org] and Project Kaleidoscope [www.pkal.org]). With the advent of genome sequencing and bioinformatics, many scientists now formulate biological questions and interpret research results in the context of genomic information. Just as the use of bioinformatic tools and databases changed the way scientists investigate problems, it must change how scientists teach to create new opportunities for students to gain experiences reflecting the influence of genomics, proteomics, and bioinformatics on modern life sciences research. Educators have responded by incorporating bioinformatics into diverse life science curricula. While these published exercises in, and guidelines for, bioinformatics curricula are helpful and inspirational, faculty new to the area of bioinformatics inevitably need training in the theoretical underpinnings of the algorithms. Moreover, effectively integrating bioinformatics into courses or independent research projects requires infrastructure for organizing and assessing student work. Here, we present a new platform for faculty to keep current with the rapidly changing field of bioinformatics, the Integrated Microbial Genomes Annotation Collaboration Toolkit (IMG-ACT). It was developed by instructors from both research-intensive and predominately undergraduate institutions in collaboration with the Department of Energy-Joint Genome Institute (DOE-JGI) as a means to innovate and update undergraduate education and faculty development. The IMG-ACT program provides a cadre of tools, including access to a clearinghouse of genome sequences, bioinformatics databases, data storage, instructor course management, and student notebooks for organizing the results of their bioinformatic investigations. In the process, IMG-ACT makes it feasible to provide undergraduate research opportunities to a greater number and diversity of students, in contrast to the traditional mentor-to-student apprenticeship model for undergraduate research, which can be too expensive and time-consuming to provide for every undergraduate. The IMG-ACT serves as the hub for the network of faculty and students that use the system for microbial genome analysis. Open access of the IMG-ACT infrastructure to participating schools ensures that all types of higher education institutions can utilize it. With the infrastructure in place, faculty can focus their efforts on the pedagogy of bioinformatics, involvement of students in research, and use of this tool for their own research agenda. What the original faculty members of the IMG-ACT development team present here is an overview of how the IMG-ACT program has affected our development in terms of teaching and research with the hopes that it will inspire more faculty to get involved.

  4. 10. WEST FRONT AND SOUTH SIDE OF WAREHOUSE. VIEW TO ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    10. WEST FRONT AND SOUTH SIDE OF WAREHOUSE. VIEW TO NORTH. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  5. 17. SECOND FLOOR WAREHOUSE SPACE, SHOWING COLUMN AND BEAM CONNECTION. ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    17. SECOND FLOOR WAREHOUSE SPACE, SHOWING COLUMN AND BEAM CONNECTION. VIEW TO NORTHEAST. - Commercial & Industrial Buildings, Dubuque Seed Company Warehouse, 169-171 Iowa Street, Dubuque, Dubuque County, IA

  6. Migration from relational to NoSQL database

    NASA Astrophysics Data System (ADS)

    Ghotiya, Sunita; Mandal, Juhi; Kandasamy, Saravanakumar

    2017-11-01

    Data generated by real-time applications, social networking sites and sensor devices is very large in volume and largely unstructured, which makes it difficult for relational database management systems to handle. Data is a precious component of any application and needs to be arranged in some structure before it can be analysed. Relational databases can only deal with structured data, so NoSQL database management systems, which can also handle semi-structured data, are needed. Relational databases provide the easiest way to manage data, but as the use of NoSQL grows it is becoming necessary to migrate data from relational to NoSQL databases. Various frameworks have been proposed previously that provide mechanisms for migrating data stored in SQL warehouses, along with middle-layer solutions that allow data to be stored in NoSQL databases in order to handle unstructured data. This paper provides a literature review of some recent approaches proposed by various researchers to migrate data from relational to NoSQL databases, including mechanisms proposed for the co-existence of NoSQL and relational databases. It summarises mechanisms that can be used to map data stored in relational databases to NoSQL databases, together with various techniques for data transformation and middle-layer solutions.
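
    None of the surveyed frameworks is spelled out in the abstract; as a minimal sketch of the mapping step most of them share, the following Python code turns relational rows into self-contained JSON documents using only the standard-library sqlite3 and json modules. The table and column names are hypothetical.

        import json
        import sqlite3

        # hypothetical relational source: a single "samples" table
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, organism TEXT, tissue TEXT)")
        con.executemany("INSERT INTO samples VALUES (?, ?, ?)",
                        [(1, "Homo sapiens", "liver"), (2, "Mus musculus", "kidney")])

        # map each row to a self-contained JSON document, the unit most NoSQL stores ingest
        con.row_factory = sqlite3.Row
        documents = [dict(row) for row in con.execute("SELECT * FROM samples")]
        for doc in documents:
            print(json.dumps(doc))   # e.g. {"id": 1, "organism": "Homo sapiens", "tissue": "liver"}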

  7. ELISA-BASE: An Integrated Bioinformatics Tool for Analyzing and Tracking ELISA Microarray Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, Amanda M.; Collett, James L.; Seurynck-Servoss, Shannon L.

    ELISA-BASE is an open-source database for capturing, organizing and analyzing protein enzyme-linked immunosorbent assay (ELISA) microarray data. ELISA-BASE is an extension of the BioArray Software Environment (BASE) database system, which was developed for DNA microarrays. In order to make BASE suitable for protein microarray experiments, we developed several plugins for importing and analyzing quantitative ELISA microarray data. Most notably, our Protein Microarray Analysis Tool (ProMAT) for processing quantitative ELISA data is now available as a plugin to the database.

  8. 19. DETAIL OF FIRST FLOOR WAREHOUSE, SHOWING ROOF TRUSS. VIEW ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    19. DETAIL OF FIRST FLOOR WAREHOUSE, SHOWING ROOF TRUSS. VIEW TO EAST. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  9. 11. SOUTH SIDE OF WAREHOUSE, WITH LOADING DOCK IN FOREGROUND. ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    11. SOUTH SIDE OF WAREHOUSE, WITH LOADING DOCK IN FOREGROUND. VIEW TO NORTHWEST. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  10. 6. Cement and Plaster Warehouse, interior. View looking south. Original ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. Cement and Plaster Warehouse, interior. View looking south. Original wood roof truss can be seen at upper left. - Curtis Wharf, Cement & Plaster Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  11. 4. Cement and Plaster Warehouse, southeast corner, showing alterations; pent ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    4. Cement and Plaster Warehouse, southeast corner, showing alterations; pent roof, window and door openings, siding, brick foundation sheathing. - Curtis Wharf, Cement & Plaster Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  12. 3. Cement and Plaster Warehouse, north facade. Loading ramp on ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    3. Cement and Plaster Warehouse, north facade. Loading ramp on the right. Utility building, intrusion, on the far right. - Curtis Wharf, Cement & Plaster Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  13. 9. INTERIOR, WAREHOUSE SPACE AT EAST END OF BUILDING, CAMERA ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    9. INTERIOR, WAREHOUSE SPACE AT EAST END OF BUILDING, CAMERA FACING NORTHEAST. - U.S. Coast Guard Support Center Alameda, Warehouse, Spencer Road & Icarrus Drive, Coast Guard Island, Alameda, Alameda County, CA

  14. View of steel warehouses, building 710 north sidewalk; camera facing ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses, building 710 north sidewalk; camera facing east. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  15. The GMOD Drupal Bioinformatic Server Framework

    PubMed Central

    Papanicolaou, Alexie; Heckel, David G.

    2010-01-01

    Motivation: Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). Results: We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Conclusion: Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Availability and implementation: Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com Contact: alexie@butterflybase.org PMID:20971988

  16. The Importance of Biological Databases in Biological Discovery.

    PubMed

    Baxevanis, Andreas D; Bateman, Alex

    2015-06-19

    Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.

  17. Using EMBL-EBI Services via Web Interface and Programmatically via Web Services.

    PubMed

    Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish

    2014-12-12

    The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. Copyright © 2014 John Wiley & Sons, Inc.
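
    As a small illustration of the programmatic route, the sketch below fetches a UniProtKB entry in FASTA format over plain HTTP using only the Python standard library. The Dbfetch URL pattern, parameters and example accession are assumptions based on the public EMBL-EBI documentation and should be checked against the current service description.

        from urllib.request import urlopen
        from urllib.parse import urlencode

        # assumed EMBL-EBI Dbfetch REST endpoint; verify against the current service documentation
        params = urlencode({"db": "uniprotkb", "id": "P12345", "format": "fasta", "style": "raw"})
        url = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?" + params

        with urlopen(url) as response:           # plain HTTP GET, no SOAP client required
            fasta = response.read().decode("utf-8")
        print(fasta.splitlines()[0])             # header line of the returned FASTA record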

  18. NEIBank: Genomics and bioinformatics resources for vision research

    PubMed Central

    Peterson, Katherine; Gao, James; Buchoff, Patee; Jaworski, Cynthia; Bowes-Rickman, Catherine; Ebright, Jessica N.; Hauser, Michael A.; Hoover, David

    2008-01-01

    NEIBank is an integrated resource for genomics and bioinformatics in vision research. It includes expressed sequence tag (EST) data and sequence-verified cDNA clones for multiple eye tissues of several species, web-based access to human eye-specific SAGE data through EyeSAGE, and comprehensive, annotated databases of known human eye disease genes and candidate disease gene loci. All expression- and disease-related data are integrated in EyeBrowse, an eye-centric genome browser. NEIBank provides a comprehensive overview of current knowledge of the transcriptional repertoires of eye tissues and their relation to pathology. PMID:18648525

  19. 15. Photocopy of photograph (Original print, Wallie V. Funk Collection.) ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    15. Photocopy of photograph (Original print, Wallie V. Funk Collection.) Photographer unknown. Published in the Anacortes American, 12 October 1911; 'Plant of the Anacortes Ice Company and Curtis Dock.' Photograph probably earlier. View looking northwest, left to right; Cement and Plaster Warehouse; Ice Plant (with towers); Cold Storage Warehouse; Freight Warehouse; a warehouse; and early ticket office. - Curtis Wharf, O & Second Streets, Anacortes, Skagit County, WA

  20. 12. EAST REAR OF OFFICE BUILDING (RIGHT FOREGROUND) AND WAREHOUSE ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    12. EAST REAR OF OFFICE BUILDING (RIGHT FOREGROUND) AND WAREHOUSE (LEFT BACKGROUND). VIEW TO SOUTH. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  1. 2. View looking northeast at Dixie Cotton Mill warehouses. Note ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    2. View looking northeast at Dixie Cotton Mill warehouses. Note firestops between sections of the building to prevent fire from spreading. - Dixie Cotton Mill, Warehouses, 710 Greenville Street, La Grange, Troup County, GA

  2. View of steel warehouses (building 710 second in on right); ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses (building 710 second in on right); camera facing south. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  3. View of steel warehouses (building 710 second in on left); ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses (building 710 second in on left); camera facing west. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  4. Bioinformatics on the cloud computing platform Azure.

    PubMed

    Shanahan, Hugh P; Owen, Anne M; Harrison, Andrew P

    2014-01-01

    We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development.
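
    The Azure worker-role code itself is not reproduced in the abstract; purely as a generic stand-in for the parallel, scalable deployment pattern the authors describe, the sketch below fans independent per-probe analyses out over a local process pool with Python's concurrent.futures. The analysis function and simulated inputs are hypothetical, and no Azure-specific API is shown.

        from concurrent.futures import ProcessPoolExecutor
        from statistics import mean

        def analyse(probe):
            """Hypothetical per-probe analysis; on a cloud platform each call could run on a separate worker."""
            probe_id, intensities = probe
            return probe_id, mean(intensities)

        # simulated expression intensities standing in for data pulled from a public repository
        probes = [(f"probe_{i}", [float(i), float(i) + 1.0, float(i) + 2.0]) for i in range(1000)]

        if __name__ == "__main__":
            with ProcessPoolExecutor() as pool:          # local stand-in for a pool of cloud workers
                for probe_id, value in pool.map(analyse, probes):
                    if value > 990:
                        print(probe_id, value)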

  5. Bioinformatics on the Cloud Computing Platform Azure

    PubMed Central

    Shanahan, Hugh P.; Owen, Anne M.; Harrison, Andrew P.

    2014-01-01

    We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. PMID:25050811

  6. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats

    PubMed Central

    Ison, Jon; Kalaš, Matúš; Jonassen, Inge; Bolser, Dan; Uludag, Mahmut; McWilliam, Hamish; Malone, James; Lopez, Rodrigo; Pettifer, Steve; Rice, Peter

    2013-01-01

    Motivation: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. Results: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. Availability: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl. Contact: jison@ebi.ac.uk PMID:23479348

  7. An efficiency improvement in warehouse operation using simulation analysis

    NASA Astrophysics Data System (ADS)

    Samattapapong, N.

    2017-11-01

    In general, industry requires an efficient system for warehouse operation. Many important factors must be considered when designing an efficient warehouse system, the most important being an effective warehouse operation system that can help transfer raw material, reduce costs and support transportation. For these reasons, we studied the work systems and distribution of a warehouse. We started by collecting the relevant data, such as information on products, sizes and locations, data collection and production, and used this information to build a simulation model in the Flexsim® simulation software. The simulation analysis found that the conveyor belt was a bottleneck in the warehouse operation, so several scenarios to improve that problem were generated and tested through the simulation analysis process. The results showed that the average queuing time was reduced from 89.8% to 48.7% and the ability to transport the product increased from 10.2% to 50.9%. Thus, it can be stated that this is the best method for increasing efficiency in the warehouse operation.
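
    The Flexsim model itself is not given in the abstract; purely as an illustration of the kind of bottleneck analysis described, the sketch below simulates pallets queueing at a single conveyor with the SimPy discrete-event library. The arrival and service rates, the capacity and the shift length are hypothetical; raising the conveyor capacity is one of the improvement scenarios such a model can test.

        import random
        import simpy

        ARRIVAL_MEAN, SERVICE_MEAN = 4.0, 5.0      # hypothetical minutes; service slower than arrivals, so a queue builds
        waits = []

        def pallet(env, conveyor):
            arrived = env.now
            with conveyor.request() as slot:        # wait for the single conveyor slot
                yield slot
                waits.append(env.now - arrived)     # time spent queueing
                yield env.timeout(random.expovariate(1.0 / SERVICE_MEAN))

        def source(env, conveyor):
            while True:
                yield env.timeout(random.expovariate(1.0 / ARRIVAL_MEAN))
                env.process(pallet(env, conveyor))

        env = simpy.Environment()
        conveyor = simpy.Resource(env, capacity=1)  # raising capacity is one improvement scenario to test
        env.process(source(env, conveyor))
        env.run(until=8 * 60)                       # one 8-hour shift
        print("pallets handled:", len(waits), "mean queue time:", sum(waits) / len(waits))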

  8. 6. GENERAL WAREHOUSE, VIEW TO EAST SHOWING DETAIL OF ELEVATOR ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. GENERAL WAREHOUSE, VIEW TO EAST SHOWING DETAIL OF ELEVATOR DOOR (CENTER), FLANKING PEDESTRIAN ENTRIES, AND LOADING DOCK. - Rosie the Riveter National Historical Park, General Warehouse, 1320 Canal Boulevard, Richmond, Contra Costa County, CA

  9. View of steel warehouses at Gilmore Avenue (building 710 second ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses at Gilmore Avenue (building 710 second in on left); camera facing east. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  10. View of steel warehouses on Ellsberg Drive, building 710 full ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses on Ellsberg Drive, building 710 full building at center; camera facing southeast. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  11. View of steel warehouses (from left: building 807, 808, 809, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses (from left: building 807, 808, 809, 810, 811); camera facing east. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  12. Building a Data Warehouse.

    ERIC Educational Resources Information Center

    Levine, Elliott

    2002-01-01

    Describes how to build a data warehouse, using the Schools Interoperability Framework (www.sifinfo.org), that supports data-driven decision making and complies with the Freedom of Information Act. Provides several suggestions for building and maintaining a data warehouse. (PKP)

  13. The optimal retailer's ordering policies with trade credit financing and limited storage capacity in the supply chain system

    NASA Astrophysics Data System (ADS)

    Yen, Ghi-Feng; Chung, Kun-Jen; Chen, Tzung-Ching

    2012-11-01

    The traditional economic order quantity model assumes that the retailer's storage capacity is unlimited. However, as we all know, the capacity of any warehouse is limited. In practice, there usually exist various factors that induce the decision-maker of the inventory system to order more items than can be held in his/her own warehouse. Therefore, for the decision-maker, it is very practical to determine whether or not to rent other warehouses. In this article, we try to incorporate two levels of trade credit and two separate warehouses (own warehouse and rented warehouse) to establish a new inventory model to help the decision-maker to make the decision. Four theorems are provided to determine the optimal cycle time to generalise some existing articles. Finally, the sensitivity analysis is executed to investigate the effects of the various parameters on ordering policies and annual costs of the inventory system.
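
    The paper's four theorems are not reproduced in the abstract; for orientation only, the sketch below computes the classical unlimited-storage economic order quantity (EOQ) baseline that such two-warehouse models generalise, using hypothetical demand and cost figures.

        from math import sqrt

        D = 12000.0   # hypothetical annual demand (units/year)
        S = 150.0     # hypothetical ordering cost per order
        H = 2.5       # hypothetical holding cost per unit per year

        Q_star = sqrt(2 * D * S / H)                 # classical EOQ: minimises ordering plus holding cost
        annual_cost = (D / Q_star) * S + (Q_star / 2) * H
        print(f"EOQ = {Q_star:.1f} units, annual cost = {annual_cost:.2f}")
        # the two-warehouse records above and below extend this baseline with rented-warehouse
        # holding costs, trade-credit terms and, in the inflationary variant, discounted cash flows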

  14. Merging in-silico and in vitro salivary protein complex partners using the STRING database: A tutorial.

    PubMed

    Crosara, Karla Tonelli Bicalho; Moffa, Eduardo Buozi; Xiao, Yizhi; Siqueira, Walter Luiz

    2018-01-16

    Protein-protein interaction is a common physiological mechanism for protection and actions of proteins in an organism. The identification and characterization of protein-protein interactions in different organisms is necessary to better understand their physiology and to determine their efficacy. In a previous in vitro study using mass spectrometry, we identified 43 proteins that interact with histatin 1. Six previously documented interactors were confirmed and 37 novel partners were identified. In this tutorial, we aimed to demonstrate the usefulness of the STRING database for studying protein-protein interactions. We used an in-silico approach along with the STRING database (http://string-db.org/) and successfully performed a fast simulation of a novel constructed histatin 1 protein-protein network, including both the previously known and the predicted interactors, along with our newly identified interactors. Our study highlights the advantages and importance of applying bioinformatics tools to merge in-silico tactics with experimental in vitro findings for rapid advancement of our knowledge about protein-protein interactions. Our findings also indicate that bioinformatics tools such as the STRING protein network database can help predict potential interactions between proteins and thus serve as a guide for future steps in our exploration of the Human Interactome. Our study highlights the usefulness of the STRING protein database for studying protein-protein interactions. The STRING database can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an easy-to-use interface. Copyright © 2017 Elsevier B.V. All rights reserved.
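
    To complement the tutorial, the sketch below pulls a predicted interaction network from the STRING REST API with the Python standard library. The endpoint layout, parameter names and the 0-1000 score scale follow the public STRING API documentation as commonly described, so treat the exact URL and fields as assumptions to verify; the query protein (HTN1, the gene encoding histatin 1) and the score cut-off are examples.

        from urllib.request import urlopen
        from urllib.parse import urlencode

        # assumed STRING REST endpoint; multiple identifiers are joined with carriage returns
        params = urlencode({"identifiers": "HTN1", "species": "9606", "required_score": "400"})
        url = "https://string-db.org/api/tsv/network?" + params

        with urlopen(url) as response:
            lines = response.read().decode("utf-8").splitlines()

        header = lines[0].split("\t")
        for line in lines[1:5]:                      # first few predicted interaction pairs
            print(dict(zip(header, line.split("\t"))))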

  15. 11. AFRD WAREHOUSE, INTERIOR DETAIL OF RAFTER SUPPORT POST TIMBER ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    11. AFRD WAREHOUSE, INTERIOR DETAIL OF RAFTER SUPPORT POST TIMBER AND METHOD OF BRACING. THE BRACES PENETRATE THE SHEET ROCK, SUGGESTING THAT THESE ARE ORIGINAL. - Minidoka Relocation Center Warehouse, 111 South Fir Street, Shoshone, Lincoln County, ID

  16. 1. Cold Storage Warehouse, east facade. Northeast corner of the ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    1. Cold Storage Warehouse, east facade. Northeast corner of the north facade of the Ice Plant is visible on the left. Far left, the Creamery. - Curtis Wharf, Cold Storage Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  17. 14. DETAIL OF SOUTHWEST FRONT OF WAREHOUSE, SHOWING CORRUGATED PLASTER/ASBESTOS ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    14. DETAIL OF SOUTHWEST FRONT OF WAREHOUSE, SHOWING CORRUGATED PLASTER/ASBESTOS WALLS, WINDOWS AND ROOF. VIEW TO NORTHEAST. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  18. 17 CFR 31.12 - Segregation.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... United States, or unencumbered warehouse receipts for inventory held in approved contract market..., and proceeds from any sale, liquidation or other disposition of obligations or warehouse receipts... to purchase obligations or warehouse receipts of the type described in this paragraph (b) shall...

  19. 17 CFR 31.12 - Segregation.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... United States, or unencumbered warehouse receipts for inventory held in approved contract market..., and proceeds from any sale, liquidation or other disposition of obligations or warehouse receipts... to purchase obligations or warehouse receipts of the type described in this paragraph (b) shall...

  20. Bioinformatics for transporter pharmacogenomics and systems biology: data integration and modeling with UML.

    PubMed

    Yan, Qing

    2010-01-01

    Bioinformatics is the rational study at an abstract level that can influence the way we understand biomedical facts and the way we apply the biomedical knowledge. Bioinformatics is facing challenges in helping with finding the relationships between genetic structures and functions, analyzing genotype-phenotype associations, and understanding gene-environment interactions at the systems level. One of the most important issues in bioinformatics is data integration. The data integration methods introduced here can be used to organize and integrate both public and in-house data. With the volume of data and the high complexity, computational decision support is essential for integrative transporter studies in pharmacogenomics, nutrigenomics, epigenetics, and systems biology. For the development of such a decision support system, object-oriented (OO) models can be constructed using the Unified Modeling Language (UML). A methodology is developed to build biomedical models at different system levels and construct corresponding UML diagrams, including use case diagrams, class diagrams, and sequence diagrams. By OO modeling using UML, the problems of transporter pharmacogenomics and systems biology can be approached from different angles with a more complete view, which may greatly enhance the efforts in effective drug discovery and development. Bioinformatics resources of membrane transporters and general bioinformatics databases and tools that are frequently used in transporter studies are also collected here. An informatics decision support system based on the models presented here is available at http://www.pharmtao.com/transporter . The methodology developed here can also be used for other biomedical fields.
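
    The UML diagrams themselves are not part of the abstract; as a loose illustration of the object-oriented modelling it describes, the sketch below expresses a tiny class model (a transporter with genetic variants and one query operation) directly in Python dataclasses. The class names, attributes and example data are invented for illustration and are not taken from the cited decision support system.

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class Variant:                       # hypothetical class from a UML class diagram
            rsid: str
            effect: str                      # e.g. "altered expression"

        @dataclass
        class Transporter:                   # central entity integrating annotation and variation data
            symbol: str
            family: str
            substrates: List[str] = field(default_factory=list)
            variants: List[Variant] = field(default_factory=list)

            def variants_with_effect(self, keyword: str) -> List[Variant]:
                """Simple query method, the kind of operation a class diagram would list."""
                return [v for v in self.variants if keyword in v.effect]

        abcb1 = Transporter("ABCB1", "ABC", ["digoxin"], [Variant("rs1045642", "altered expression")])
        print(abcb1.variants_with_effect("expression"))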

  1. AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis

    PubMed Central

    Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud

    2003-01-01

    Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565

  2. What is bioinformatics? A proposed definition and overview of the field.

    PubMed

    Luscombe, N M; Greenbaum, D; Gerstein, M

    2001-01-01

    The recent flood of data from genome sequences and functional genomics has given rise to new field, bioinformatics, which combines elements of biology and computer science. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.

  3. miRToolsGallery: a tag-based and rankable microRNA bioinformatics resources database portal

    PubMed Central

    Chen, Liang; Heikkinen, Liisa; Wang, ChangLiang; Yang, Yang; Knott, K Emily

    2018-01-01

    Abstract Hundreds of bioinformatics tools have been developed for MicroRNA (miRNA) investigations including those used for identification, target prediction, structure and expression profile analysis. However, finding the correct tool for a specific application requires the tedious and laborious process of locating, downloading, testing and validating the appropriate tool from a group of nearly a thousand. In order to facilitate this process, we developed a novel database portal named miRToolsGallery. We constructed the portal by manually curating > 950 miRNA analysis tools and resources. In the portal, a query to locate the appropriate tool is expedited by being searchable, filterable and rankable. The ranking feature is vital to quickly identify and prioritize the more useful from the obscure tools. Tools are ranked via different criteria including the PageRank algorithm, date of publication, number of citations, average of votes and number of publications. miRToolsGallery provides links and data for the comprehensive collection of currently available miRNA tools with a ranking function which can be adjusted using different criteria according to specific requirements. Database URL: http://www.mirtoolsgallery.org PMID:29688355
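
    The portal's exact ranking weights are not given in the abstract; the following is a generic power-iteration PageRank sketch over a toy link graph, showing how one of the listed ranking criteria could be computed. The graph, damping factor and iteration count are illustrative.

        def pagerank(links, damping=0.85, iterations=50):
            """Power iteration over a dict mapping each node to the nodes it links to."""
            nodes = list(links)
            rank = {n: 1.0 / len(nodes) for n in nodes}
            for _ in range(iterations):
                new = {n: (1.0 - damping) / len(nodes) for n in nodes}
                for src, targets in links.items():
                    share = damping * rank[src] / max(len(targets), 1)
                    for dst in targets:
                        new[dst] += share
                rank = new
            return rank

        # toy graph: tool A is referenced by both B and C, so it should rank highest
        links = {"toolA": ["toolB"], "toolB": ["toolA"], "toolC": ["toolA"]}
        for tool, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
            print(tool, round(score, 3))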

  4. PRGdb: a bioinformatics platform for plant resistance gene analysis

    PubMed Central

    Sanseverino, Walter; Roma, Guglielmo; De Simone, Marco; Faino, Luigi; Melito, Sara; Stupka, Elia; Frusciante, Luigi; Ercolano, Maria Raffaella

    2010-01-01

    PRGdb is a web accessible open-source (http://www.prgdb.org) database that represents the first bioinformatic resource providing a comprehensive overview of resistance genes (R-genes) in plants. PRGdb holds more than 16 000 known and putative R-genes belonging to 192 plant species challenged by 115 different pathogens and linked with useful biological information. The complete database includes a set of 73 manually curated reference R-genes, 6308 putative R-genes collected from NCBI and 10463 computationally predicted putative R-genes. Thanks to a user-friendly interface, data can be examined using different query tools. A home-made prediction pipeline called Disease Resistance Analysis and Gene Orthology (DRAGO), based on reference R-gene sequence data, was developed to search for plant resistance genes in public datasets such as Unigene and Genbank. New putative R-gene classes containing unknown domain combinations were discovered and characterized. The development of the PRG platform represents an important starting point to conduct various experimental tasks. The inferred cross-link between genomic and phenotypic information allows access to a large body of information to find answers to several biological questions. The database structure also permits easy integration with other data types and opens up prospects for future implementations. PMID:19906694

  5. Using Informatics-, Bioinformatics- and Genomics-Based Approaches for the Molecular Surveillance and Detection of Biothreat Agents

    NASA Astrophysics Data System (ADS)

    Seto, Donald

    The convergence and wealth of informatics, bioinformatics and genomics methods and associated resources allow a comprehensive and rapid approach for the surveillance and detection of bacterial and viral organisms. Coupled with the continuing race for the fastest, most cost-efficient and highest-quality DNA sequencing technology, that is, "next generation sequencing", the detection of biological threat agents by "cheaper and faster" means is possible. With the application of improved bioinformatic tools for the understanding of these genomes and for parsing unique pathogen genome signatures, along with "state-of-the-art" informatics which include faster computational methods, equipment and databases, it is feasible to apply new algorithms to biothreat agent detection. Two such methods are high-throughput DNA sequencing-based and resequencing microarray-based identification. These are illustrated and validated by two examples involving human adenoviruses, both from real-world test beds.

  6. Bioinformatics algorithm based on a parallel implementation of a machine learning approach using transducers

    NASA Astrophysics Data System (ADS)

    Roche-Lima, Abiel; Thulasiram, Ruppa K.

    2012-02-01

    Finite automata in which each transition is augmented with an output label in addition to the familiar input label are considered finite-state transducers. Transducers have been used to analyze some fundamental issues in bioinformatics. Weighted finite-state transducers have been proposed for pairwise alignment of DNA and protein sequences, as well as for developing kernels for computational biology. Machine learning algorithms for conditional transducers have been implemented and used for DNA sequence analysis. Transducer learning algorithms are based on conditional probability computation, which is calculated using techniques such as pair-database creation, normalization (with maximum-likelihood normalization) and parameter optimization (with Expectation-Maximization, EM). These techniques are intrinsically costly to compute, and even more so when applied to bioinformatics, because the database sizes are large. In this work, we describe a parallel implementation of an algorithm to learn conditional transducers using these techniques. The algorithm is oriented toward bioinformatics applications such as alignments, phylogenetic trees, and other genome evolution studies. Several experiments were run with the parallel and sequential algorithms on WestGrid (specifically, on the Breeze cluster). The results indicate that our parallel algorithm is scalable, because execution times are reduced considerably as the data size parameter is increased. Another experiment varied the precision parameter; in this case, we again obtained smaller execution times with the parallel algorithm. Finally, the number of threads used to execute the parallel algorithm on the Breeze cluster was varied. In this last experiment, speedup increases considerably as more threads are used, but converges for 16 or more threads.

  7. Development of Product Availability Monitoring System In Production Unit In Automotive Component Industry

    NASA Astrophysics Data System (ADS)

    Hartono, Rachmad; Raharno, Sri; Yuwana Martawirya, Yatna; Arthaya, Bagus

    2018-03-01

    This paper describes a methodology to monitor the availability of products in a production unit in the automotive component industry. The components are made through sheet metal working. Raw material enters the production unit as plate pieces of a certain size and is stored in the warehouse, where the data for each raw material are recorded in a database system. The material then undergoes several production processes in the production unit. When material is taken from the warehouse, its data are again recorded in the database: the amount of material, the material type, and the date the material left the warehouse. Material leaving the warehouse is labeled with information about the production processes it must pass through, since this material will become a product. Finished products are stored in the product warehouse; when a product enters the product warehouse, its data are recorded by scanning the barcode on the label. By recording the condition of the product at each stage of production, the availability of products in the production unit can be known in the form of raw material, product being processed and finished product.
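
    The paper's own database schema is not reproduced; the sketch below is a minimal Python/sqlite3 illustration of the recording idea described, logging raw-material issues and finished-product receipts so that current availability can be queried per item. The table and column names, items and quantities are hypothetical.

        import sqlite3
        from datetime import date

        con = sqlite3.connect(":memory:")
        con.executescript("""
        CREATE TABLE stock_moves (
            item      TEXT,
            kind      TEXT,     -- 'raw_in', 'raw_out' or 'product_in'
            quantity  INTEGER,
            move_date TEXT
        );
        """)

        def record_move(item, kind, quantity):
            """Called when material enters or leaves the raw-material warehouse or a product is booked in."""
            con.execute("INSERT INTO stock_moves VALUES (?, ?, ?, ?)",
                        (item, kind, quantity, date.today().isoformat()))

        record_move("plate_2mm", "raw_in", 500)
        record_move("plate_2mm", "raw_out", 120)     # issued to production, label printed at this point
        record_move("bracket_A", "product_in", 115)  # finished products scanned into the product warehouse

        available = con.execute("""
            SELECT item, SUM(CASE kind WHEN 'raw_out' THEN -quantity ELSE quantity END)
            FROM stock_moves GROUP BY item
        """).fetchall()
        print(available)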

  8. Report: Early Warning Report: Main EPA Headquarters Warehouse in Landover, Maryland, Requires Immediate EPA Attention

    EPA Pesticide Factsheets

    Report #13-P-0272, May 31, 2013. Our initial research at the EPA’s Landover warehouse raised significant concerns with the lack of agency oversight of personal property and warehouse space at the facility.

  9. Exploring of the molecular mechanism of rhinitis via bioinformatics methods

    PubMed Central

    Song, Yufen; Yan, Zhaohui

    2018-01-01

    The aim of this study was to analyze gene expression profiles for exploring the function and regulatory network of differentially expressed genes (DEGs) in the pathogenesis of rhinitis by a bioinformatics method. The gene expression profile of GSE43523 was downloaded from the Gene Expression Omnibus database. The dataset contained 7 seasonal allergic rhinitis samples and 5 non-allergic normal samples. DEGs between rhinitis samples and normal samples were identified via the limma package of R. The WebGestalt database was used to identify enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of the DEGs. The differentially co-expressed pairs of the DEGs were identified via the DCGL package in R, and the differential co-expression network was constructed based on these pairs. A protein-protein interaction (PPI) network of the DEGs was constructed based on the Search Tool for the Retrieval of Interacting Genes database. A total of 263 DEGs were identified in rhinitis samples compared with normal samples, including 125 downregulated ones and 138 upregulated ones. The DEGs were enriched in 7 KEGG pathways, and 308 differential co-expression gene pairs were obtained. A differential co-expression network was constructed, containing 212 nodes. In total, 148 PPI pairs of the DEGs were identified, and a PPI network was constructed based on these pairs. Bioinformatics methods could help us identify significant genes and pathways related to the pathogenesis of rhinitis. The steroid biosynthesis pathway and metabolic pathways might play important roles in the development of allergic rhinitis (AR). Genes such as CDC42 effector protein 5, solute carrier family 39 member A11 and PR/SET domain 10 might also be associated with the pathogenesis of AR, which provided references for the molecular mechanisms of AR. PMID:29257233
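
    The authors worked in R with limma and DCGL; purely as a language-neutral illustration of the DEG step, the sketch below flags differentially expressed genes with a two-sample t-test and Benjamini-Hochberg correction using scipy and statsmodels on a simulated expression matrix. This is not the limma moderated-statistics pipeline, and the matrix dimensions merely mirror the 7-versus-5 design mentioned above.

        import numpy as np
        from scipy import stats
        from statsmodels.stats.multitest import multipletests

        rng = np.random.default_rng(0)
        genes = [f"gene{i}" for i in range(200)]
        rhinitis = rng.normal(0.0, 1.0, size=(200, 7))   # 7 allergic-rhinitis samples (simulated)
        control = rng.normal(0.0, 1.0, size=(200, 5))    # 5 non-allergic samples (simulated)
        rhinitis[:10] += 2.0                             # spike a few truly changed genes

        t, p = stats.ttest_ind(rhinitis, control, axis=1)
        reject, p_adj, _, _ = multipletests(p, alpha=0.05, method="fdr_bh")  # Benjamini-Hochberg

        degs = [g for g, keep in zip(genes, reject) if keep]
        print(len(degs), "candidate DEGs, e.g.", degs[:5])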

  10. Two-warehouse partial backlogging inventory model for deteriorating items with linear trend in demand under inflationary conditions

    NASA Astrophysics Data System (ADS)

    Jaggi, Chandra K.; Khanna, Aditi; Verma, Priyanka

    2011-07-01

    In today's business transactions, there are various reasons, namely, bulk purchase discounts, re-ordering costs, seasonality of products, inflation induced demand, etc., which force the buyer to order more than the warehouse capacity. Such situations call for additional storage space to store the excess units purchased. This additional storage space is typically a rented warehouse. Inflation plays a very interesting and significant role here: It increases the cost of goods. To safeguard from the rising prices, during the inflation regime, the organisation prefers to keep a higher inventory, thereby increasing the aggregate demand. This additional inventory needs additional storage space, which is facilitated by a rented warehouse. Ignoring the effects of the time value of money and inflation might yield misleading results. In this study, a two-warehouse inventory model with linear trend in demand under inflationary conditions having different rates of deterioration has been developed. Shortages at the owned warehouse are also allowed subject to partial backlogging. The solution methodology provided in the model helps to decide on the feasibility of renting a warehouse. Finally, findings have been illustrated with the help of numerical examples. Comprehensive sensitivity analysis has also been provided.

  11. Credit BG. View looks southwest (236°) at the warehouse's southeast ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Credit BG. View looks southwest (236°) at the warehouse's southeast and northeast facades. This building retains its original World War II era materials and appearance - Edwards Air Force Base, North Base, Warehouse, Second & C Streets, Boron, Kern County, CA

  12. 76 FR 28801 - Agency Information Collection Activities: Bonded Warehouse Regulations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-18

    ... Activities: Bonded Warehouse Regulations AGENCY: U.S. Customs and Border Protection, Department of Homeland... (OMB) for review and approval in accordance with the Paperwork Reduction Act: Bonded Warehouse... appropriate automated, electronic, mechanical, or other technological techniques or other forms of information...

  13. 3. Interior view of section of warehouse building now used ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    3. Interior view of section of warehouse building now used as an opening room. 'Unifloc' machine manufactured by Rieter opens and blends cotton from several bales before processing. - Dixie Cotton Mill, Warehouses, 710 Greenville Street, La Grange, Troup County, GA

  14. Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences.

    PubMed

    Stephens, Susie M; Chen, Jake Y; Davidson, Marcel G; Thomas, Shiby; Trute, Barry M

    2005-01-01

    As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html.
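
    As a small illustration of the pattern-matching functionality described, the sketch below applies a PROSITE-style sequence motif with Python's re module and shows, in a comment, how the same POSIX-style pattern could be pushed into the database with Oracle's REGEXP_LIKE so filtering happens next to the data. The protein fragment and the table and column names are hypothetical.

        import re

        # PROSITE-style N-glycosylation motif N-{P}-[ST]-{P}, written as a regular expression
        motif = "N[^P][ST][^P]"

        sequence = "MKNVSALDTTWLNGSAKP"                 # hypothetical protein fragment
        print([m.group(0) for m in re.finditer(motif, sequence)])

        # the same pattern can be evaluated inside the database so filtering happens next to the data,
        # e.g. (hypothetical table and column names):
        #   SELECT protein_id FROM protein_sequences
        #   WHERE REGEXP_LIKE(sequence, 'N[^P][ST][^P]')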

  15. Why are they missing? : Bioinformatics characterization of missing human proteins.

    PubMed

    Elguoshy, Amr; Magdeldin, Sameh; Xu, Bo; Hirao, Yoshitoshi; Zhang, Ying; Kinoshita, Naohiko; Takisawa, Yusuke; Nameta, Masaaki; Yamamoto, Keiko; El-Refy, Ali; El-Fiky, Fawzy; Yamamoto, Tadashi

    2016-10-21

    NeXtProt is a web-based protein knowledge platform that supports research on human proteins. NeXtProt (release 2015-04-28) lists 20,060 proteins; among them, 3373 canonical proteins (16.8%) lack credible experimental evidence at the protein level (PE2:PE5) and are therefore considered "missing proteins". A comprehensive bioinformatic workflow has been proposed to analyze these missing proteins. The aims of the current study were to analyze their physicochemical properties and the existence and distribution of tryptic cleavage sites, and to pinpoint the signature peptides of the missing proteins. Our findings showed that 23.7% of missing proteins were hydrophobic proteins possessing transmembrane domains (TMD). In addition, forty missing entries generated tryptic peptides that were either outside the mass detection range (>30 aa) or mapped to different proteins (<9 aa), and 21% of missing entries did not generate any unique tryptic peptides. An in silico endopeptidase combination strategy increased the possibility of identifying missing proteins. Similarly, using both a mature-protein database and a signal-peptidome database could be a promising option for identifying some missing proteins, by targeting their unique N-terminal tryptic peptide in the mature-protein database and/or their C-terminal tryptic peptide in the signal-peptidome database. In conclusion, identification of missing proteins requires additional consideration during sample preparation, extraction, digestion and data analysis to increase the rate of identification. Copyright © 2016. Published by Elsevier B.V.
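
    A minimal sketch of the kind of in silico analysis described above, assuming standard trypsin cleavage rules (cut after K or R, not before P) and the 9-30 residue window mentioned in the abstract; the sequence and thresholds are illustrative only and this is not the study's own pipeline.

        # In silico tryptic digestion and length filtering (pure-Python sketch).
        import re

        def tryptic_digest(protein):
            """Cleave after K or R, except when the next residue is P."""
            return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

        def detectable(peptides, min_len=9, max_len=30):
            """Keep peptides within the assumed MS detection window."""
            return [p for p in peptides if min_len <= len(p) <= max_len]

        if __name__ == "__main__":
            # Toy sequence, not a real missing protein.
            seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVK"
            peptides = tryptic_digest(seq)
            print("all peptides:", peptides)
            print("within 9-30 aa:", detectable(peptides))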

  16. SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

    PubMed Central

    Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

    2014-01-01

    The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881

  17. A study on building data warehouse of hospital information system.

    PubMed

    Li, Ping; Wu, Tao; Chen, Mu; Zhou, Bin; Xu, Wei-guo

    2011-08-01

    Existing hospital information systems with simple statistical functions cannot meet current management needs. Hospital resources are held under separate ownership across hospitals, which complicates the regional coordination of medical services. In this study, to integrate medical data and make full use of it, we propose a data warehouse modeling method for the hospital information system; the method can also be applied to a distributed multi-hospital medical service system. To ensure that hospital information supports the diverse needs of health care, the framework of the hospital information system has three layers: a data-center layer, a system-function layer, and a user-interface layer. This paper discusses the role of a data warehouse management system in handling hospital information, from establishing the data themes, to designing the data model, to building the data warehouse. Online analytical processing (OLAP) tools support user-friendly multidimensional analysis from a number of different angles to extract the required data and information. Use of the data warehouse improves online analytical processing and mitigates deficiencies in the decision support system. A hospital information system based on a data warehouse can apply statistical analysis and data mining to massive quantities of historical data and summarize clinical and hospital information for decision making. The processing of patient information is given as an example that demonstrates the usefulness of this method for hospital information management. Data warehouse technology continues to evolve, and extracting more decision support information through data mining and decision-making technology requires further research.
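
    As a hedged illustration of the multidimensional (OLAP-style) analysis the abstract refers to, the pandas sketch below rolls up hypothetical admission records by department and year; the column names and data are invented for illustration and are not the system described in the paper.

        import pandas as pd

        # Hypothetical fact table of hospital admissions (illustrative data only).
        admissions = pd.DataFrame({
            "department": ["Cardiology", "Cardiology", "Oncology", "Oncology", "Oncology"],
            "year":       [2009, 2010, 2009, 2010, 2010],
            "cost":       [1200.0, 1500.0, 3000.0, 2800.0, 3100.0],
        })

        # OLAP-style roll-up: total and average cost per department per year.
        cube = pd.pivot_table(
            admissions,
            values="cost",
            index="department",
            columns="year",
            aggfunc=["sum", "mean"],
        )
        print(cube)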

  18. MPID-T2: a database for sequence-structure-function analyses of pMHC and TR/pMHC structures.

    PubMed

    Khan, Javed Mohammed; Cheruku, Harish Reddy; Tong, Joo Chuan; Ranganathan, Shoba

    2011-04-15

    Sequence-structure-function information is critical in understanding the mechanism of pMHC and TR/pMHC binding and recognition. A database for sequence-structure-function information on pMHC and TR/pMHC interactions, MHC-Peptide Interaction Database-TR version 2 (MPID-T2), is now available augmented with the latest PDB and IMGT/3Dstructure-DB data, advanced features and new parameters for the analysis of pMHC and TR/pMHC structures. http://biolinfo.org/mpid-t2. shoba.ranganathan@mq.edu.au Supplementary data are available at Bioinformatics online.

  19. Data Warehouse Discovery Framework: The Foundation

    NASA Astrophysics Data System (ADS)

    Apanowicz, Cas

    The cost of building an Enterprise Data Warehouse Environment usually runs into millions of dollars and takes years to complete. The cost, large as it is, is not the primary problem for a given corporation. The risk that all the money allocated for planning, design and implementation of the Data Warehouse and Business Intelligence Environment may not bring the expected result far outweighs the cost of the entire effort [2,10]. The combination of these two factors is the main reason that Data Warehouse/Business Intelligence is often the single most expensive and most risky IT endeavor for companies [13]. That situation was the author's main inspiration behind the founding of Infobright Corp and, later on, the concept of the Data Warehouse Discovery Framework.

  20. CHOgenome.org 2.0: Genome resources and website updates.

    PubMed

    Kremkow, Benjamin G; Baik, Jong Youn; MacDonald, Madolyn L; Lee, Kelvin H

    2015-07-01

    Chinese hamster ovary (CHO) cells are a major host cell line for the production of therapeutic proteins, and CHO cell and Chinese hamster (CH) genomes have recently been sequenced using next-generation sequencing methods. CHOgenome.org was launched in 2011 (version 1.0) to serve as a database repository and to provide bioinformatics tools for the CHO community. CHOgenome.org (version 1.0) maintained GenBank CHO-K1 genome data, identified CHO-omics literature, and provided a CHO-specific BLAST service. Recent major updates to CHOgenome.org (version 2.0) include new sequence and annotation databases for both CHO and CH genomes, a more user-friendly website, and new research tools, including a proteome browser and a genome viewer. CHO cell-line specific sequences and annotations facilitate cell line development opportunities, several of which are discussed. Moving forward, CHOgenome.org will host the increasing amount of CHO-omics data and continue to make useful bioinformatics tools available to the CHO community. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Using Electronic Health Records to Build an Ophthalmologic Data Warehouse and Visualize Patients' Data.

    PubMed

    Kortüm, Karsten U; Müller, Michael; Kern, Christoph; Babenko, Alexander; Mayer, Wolfgang J; Kampik, Anselm; Kreutzer, Thomas C; Priglinger, Siegfried; Hirneiss, Christoph

    2017-06-01

    To develop a near-real-time data warehouse (DW) in an academic ophthalmologic center to make scientific use of the growing volume of digital data from electronic medical records (EMR) and diagnostic devices. Database development. Specific macular clinic user interfaces within the institutional hospital information system were created. Orders for imaging modalities were sent by an EMR-linked picture-archiving and communications system to the respective devices. All data of 325 767 patients since 2002 were gathered in a DW running on an SQL database, and a data discovery tool was developed. An example search was conducted for patients with age-related macular degeneration who had undergone cataract surgery and received at least 10 intravitreal injections (excluding bevacizumab). Data related to those patients (3 142 204 diagnoses [including diagnoses from other fields of medicine], 720 721 procedures [eg, surgery], and 45 416 intravitreal injections) were stored, including 81 274 optical coherence tomography measurements. A web-based browsing tool was developed for visualizing the data and filtering them by several linked criteria, for example, a minimum number of intravitreal injections of a specific drug and a visual acuity interval. The example search identified 450 patients with 516 eyes meeting all criteria. A DW was successfully implemented in an ophthalmologic academic environment to support and facilitate research using the growing EMR and measurement data, and the identification of eligible patients for studies was simplified. In the future, software for decision support can be developed based on the DW and its structured data. Improved classification of diseases and semiautomatic validation of data via machine learning are warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
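
    A hedged sketch of the kind of cohort query described above (diagnosis plus procedure plus a minimum injection count), written against an in-memory SQLite toy database; the table layout, codes and data are invented for illustration and are not the warehouse schema used in the article.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE diagnoses  (patient_id INTEGER, icd_code TEXT);
            CREATE TABLE procedures (patient_id INTEGER, procedure_name TEXT);
            CREATE TABLE injections (patient_id INTEGER, drug TEXT);

            INSERT INTO diagnoses  VALUES (1, 'H35.31'), (2, 'H35.31');
            INSERT INTO procedures VALUES (1, 'cataract surgery');
        """)
        # Patient 1 has twelve non-bevacizumab injections on record.
        con.executemany("INSERT INTO injections VALUES (?, ?)",
                        [(1, "ranibizumab")] * 12)

        # Cohort: AMD diagnosis + cataract surgery + at least 10 qualifying injections.
        cohort = con.execute("""
            SELECT d.patient_id, COUNT(*) AS n_injections
              FROM diagnoses d
              JOIN procedures p ON p.patient_id = d.patient_id
                               AND p.procedure_name = 'cataract surgery'
              JOIN injections i ON i.patient_id = d.patient_id
                               AND i.drug <> 'bevacizumab'
             WHERE d.icd_code = 'H35.31'
             GROUP BY d.patient_id
            HAVING COUNT(*) >= 10
        """).fetchall()
        print(cohort)   # -> [(1, 12)]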

  2. A Unique Digital Electrocardiographic Repository for the Development of Quantitative Electrocardiography and Cardiac Safety: The Telemetric and Holter ECG Warehouse (THEW)

    PubMed Central

    Couderc, Jean-Philippe

    2010-01-01

    The sharing of scientific data reinforces open scientific inquiry; it encourages diversity of analysis and opinion while promoting new research and facilitating the education of the next generations of scientists. In this article, we present an initiative for the development of a repository containing continuous electrocardiographic recordings and their associated clinical information. This information is shared with the worldwide scientific community in order to improve quantitative electrocardiology and cardiac safety. First, we present the objectives of the initiative and its mission. Then, we describe the resources available in this initiative along three components: data, expertise and tools. The data available in the Telemetric and Holter ECG Warehouse (THEW) include continuous ECG signals and associated clinical information. The initiative has attracted various academic and private partners whose expertise covers a wide range of research areas related to quantitative electrocardiography; their contribution to the THEW promotes cross-fertilization of scientific knowledge, resources, and ideas that will advance the field of quantitative electrocardiography. Finally, the tools of the THEW include software and servers to access and review the data available in the repository. To conclude, the THEW is an initiative developed to benefit the scientific community and to advance the field of quantitative electrocardiography and cardiac safety. It is a new repository designed to complement existing ones such as Physionet, the AHA-BIH Arrhythmia Database, and the CSE database. The THEW hosts unique datasets from clinical trials and drug safety studies that, so far, were not available to the worldwide scientific community. PMID:20863512

  3. 4. AFRD WAREHOUSE, WEST SIDE DETAIL OF ALTERED SLIDING DOORS, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    4. AFRD WAREHOUSE, WEST SIDE DETAIL OF ALTERED SLIDING DOORS, FACING EAST. WEATHER COVER OVER RAIL IS ORIGINAL. SHEET METAL SIDING HAS BEEN INSERTED BETWEEN TWO HALVES OF SLIDING DOORS. - Minidoka Relocation Center Warehouse, 111 South Fir Street, Shoshone, Lincoln County, ID

  4. Information Architecture: The Data Warehouse Foundation.

    ERIC Educational Resources Information Center

    Thomas, Charles R.

    1997-01-01

    Colleges and universities are initiating data warehouse projects to provide integrated information for planning and reporting purposes. A survey of 40 institutions with active data warehouse projects reveals the kinds of tools, contents, data cycles, and access currently used. Essential elements of an integrated information architecture are…

  5. Optimal (R, Q) policy and pricing for two-echelon supply chain with lead time and retailer's service-level incomplete information

    NASA Astrophysics Data System (ADS)

    Esmaeili, M.; Naghavi, M. S.; Ghahghaei, A.

    2018-03-01

    Many studies focus on inventory systems to analyze different real-world situations. This paper considers a two-echelon supply chain comprising one warehouse and one retailer with stochastic demand and an order-up-to-level policy. The retailer's lead time includes the transportation time from the warehouse to the retailer, which is unknown to the retailer. The warehouse, in turn, is unaware of the retailer's service level. The relationship between the retailer and the warehouse is modeled as a Stackelberg game with incomplete information. Their relationship is also examined when the warehouse and the retailer reveal their private information under incentive strategies. The optimal inventory and pricing policies are obtained using an algorithm based on bi-level programming. Numerical examples, including a sensitivity analysis of some key parameters, compare the results of the Stackelberg models. The results show that information sharing is more beneficial to the warehouse than to the retailer.

  6. Medical libraries, bioinformatics, and networked information: a coming convergence?

    PubMed

    Lynch, C

    1999-10-01

    Libraries will be changed by technological and social developments that are fueled by information technology, bioinformatics, and networked information. Libraries in highly focused settings such as the health sciences are at a pivotal point in their development as the synthesis of historically diverse and independent information sources transforms health care institutions. Boundaries are breaking down between published literature and research data, between research databases and clinical patient data, and between consumer health information and professional literature. This paper focuses on the dynamics that are occurring with networked information sources and the roles that libraries will need to play in the world of medical informatics in the early twenty-first century.

  7. FY09 Final Report for LDRD Project: Understanding Viral Quasispecies Evolution through Computation and Experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, C

    2009-11-12

    In FY09 they will (1) complete the implementation, verification, calibration, and sensitivity and scalability analysis of the in-cell virus replication model; (2) complete the design of the cell culture (cell-to-cell infection) model; (3) continue the research, design, and development of their bioinformatics tools: the Web-based structure-alignment-based sequence variability tool and the functional annotation of the genome database; (4) collaborate with the University of California at San Francisco on areas of common interest; and (5) submit journal articles that describe the in-cell model with simulations and the bioinformatics approaches to evaluation of genome variability and fitness.

  8. High dimensional biological data retrieval optimization with NoSQL technology.

    PubMed

    Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike

    2014-01-01

    High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data are supported in data warehouses such as tranSMART, queries against relational databases for hundreds of different patients' gene expression records are slow. Non-relational data models, such as the key-value model implemented in NoSQL databases, promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase compared to the same model implemented on MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.

  9. High dimensional biological data retrieval optimization with NoSQL technology

    PubMed Central

    2014-01-01

    Background High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data are supported in data warehouses such as tranSMART, queries against relational databases for hundreds of different patients' gene expression records are slow. Non-relational data models, such as the key-value model implemented in NoSQL databases, promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase compared to the same model implemented on MongoDB. Conclusions The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data. PMID:25435347
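
    A minimal sketch of a key-value layout of the kind described above, using the happybase client for HBase; the table name, column family, and row-key scheme (patient ID plus probe ID) are assumptions for illustration, not the schema used in tranSMART or in the paper.

        import happybase

        # Assumes an HBase Thrift server is reachable on localhost:9090 and a
        # table 'expression' with column family 'e' already exists.
        connection = happybase.Connection(host="localhost", port=9090)
        table = connection.table("expression")

        # Row key = <patient_id>-<probe_id>; one cell holds the expression value.
        table.put(b"P0001-1007_s_at", {b"e:value": b"7.83"})
        table.put(b"P0001-1053_at",   {b"e:value": b"5.12"})

        # Fetch every probe for one patient with a prefix scan.
        for row_key, data in table.scan(row_prefix=b"P0001-"):
            print(row_key.decode(), float(data[b"e:value"]))

        connection.close()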

  10. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine.

    PubMed

    Elsik, Christine G; Tayal, Aditi; Diesh, Colin M; Unni, Deepak R; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-04

    We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortia and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation and viewing of high-throughput sequence data sets, and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. TabSQL: a MySQL tool to facilitate mapping user data to public databases.

    PubMed

    Xia, Xiao-Qin; McClelland, Michael; Wang, Yipeng

    2010-06-23

    With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data.

  12. TabSQL: a MySQL tool to facilitate mapping user data to public databases

    PubMed Central

    2010-01-01

    Background With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251
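
    TabSQL itself is MySQL-based; as a hedged, self-contained illustration of the underlying idea (joining a user's tab-delimited results to a downloaded annotation table with SQL), the sketch below uses Python's built-in sqlite3 with invented table and column names rather than TabSQL's own schema.

        import sqlite3

        con = sqlite3.connect(":memory:")
        # A user's result table and a toy GO annotation table, as if imported
        # from tab-delimited files; names and rows are illustrative only.
        con.executescript("""
            CREATE TABLE my_hits (gene_id TEXT, fold_change REAL);
            CREATE TABLE go_annotation (gene_id TEXT, go_term TEXT);

            INSERT INTO my_hits VALUES ('YFG1', 2.4), ('YFG2', -1.7);
            INSERT INTO go_annotation VALUES
                ('YFG1', 'GO:0006412 translation'),
                ('YFG2', 'GO:0006281 DNA repair');
        """)

        # Annotate the user's hits by joining against the public table.
        for row in con.execute("""
                SELECT h.gene_id, h.fold_change, a.go_term
                  FROM my_hits h
                  JOIN go_annotation a USING (gene_id)
                 ORDER BY h.fold_change DESC
            """):
            print(row)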

  13. 7 CFR 1421.106 - Warehouse-stored marketing assistance loan collateral.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 10 2012-01-01 2012-01-01 false Warehouse-stored marketing assistance loan collateral... SIMILARLY HANDLED COMMODITIES-MARKETING ASSISTANCE LOANS AND LOAN DEFICIENCY PAYMENTS FOR 2008 THROUGH 2012 Marketing Assistance Loans § 1421.106 Warehouse-stored marketing assistance loan collateral. (a) A commodity...

  14. 7 CFR 1421.106 - Warehouse-stored marketing assistance loan collateral.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 10 2013-01-01 2013-01-01 false Warehouse-stored marketing assistance loan collateral... SIMILARLY HANDLED COMMODITIES-MARKETING ASSISTANCE LOANS AND LOAN DEFICIENCY PAYMENTS FOR 2008 THROUGH 2012 Marketing Assistance Loans § 1421.106 Warehouse-stored marketing assistance loan collateral. (a) A commodity...

  15. 7 CFR 1421.106 - Warehouse-stored marketing assistance loan collateral.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 10 2014-01-01 2014-01-01 false Warehouse-stored marketing assistance loan collateral... SIMILARLY HANDLED COMMODITIES-MARKETING ASSISTANCE LOANS AND LOAN DEFICIENCY PAYMENTS FOR 2008 THROUGH 2012 Marketing Assistance Loans § 1421.106 Warehouse-stored marketing assistance loan collateral. (a) A commodity...

  16. 77 FR 26024 - Agency Information Collection Activities: Bonded Warehouse Proprietor's Submission

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-05-02

    ... Activities: Bonded Warehouse Proprietor's Submission AGENCY: U.S. Customs and Border Protection, Department... Budget (OMB) for review and approval in accordance with the Paperwork Reduction Act: Bonded Warehouse... information on those who are to respond, including the use of appropriate automated, electronic, mechanical...

  17. 19 CFR 146.64 - Entry for warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... (CONTINUED) FOREIGN TRADE ZONES Transfer of Merchandise From a Zone § 146.64 Entry for warehouse. (a) Foreign... status may not be entered for warehouse from a zone. Merchandise in nonprivileged foreign status... different port. (b) Zone-restricted merchandise. Foreign merchandise in zone-restricted status may be...

  18. 7 CFR 29.2 - Policy statement.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... warehouses which are located beyond the geographical limitation for “designated markets” set forth in § 29.1... to new markets to warehouses which are located beyond the geographical limitation for “designated... services. The extension of tobacco inspection and price support services to new markets to warehouses which...

  19. 19 CFR 19.2 - Applications to bond.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... CUSTOMS WAREHOUSES, CONTAINER STATIONS AND CONTROL OF MERCHANDISE THEREIN General Provisions § 19.2 Applications to bond. (a) Application. An owner or lessee desiring to establish a bonded warehouse facility shall make written application to the director of the port nearest to where the warehouse is located...

  20. 19 CFR 19.2 - Applications to bond.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... CUSTOMS WAREHOUSES, CONTAINER STATIONS AND CONTROL OF MERCHANDISE THEREIN General Provisions § 19.2 Applications to bond. (a) Application. An owner or lessee desiring to establish a bonded warehouse facility shall make written application to the director of the port nearest to where the warehouse is located...

  1. 7 CFR 29.2 - Policy statement.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... warehouses which are located beyond the geographical limitation for “designated markets” set forth in § 29.1... to new markets to warehouses which are located beyond the geographical limitation for “designated... services. The extension of tobacco inspection and price support services to new markets to warehouses which...

  2. Ibmdbpy-spatial : An Open-source implementation of in-database geospatial analytics in Python

    NASA Astrophysics Data System (ADS)

    Roy, Avipsa; Fouché, Edouard; Rodriguez Morales, Rafael; Moehler, Gregor

    2017-04-01

    As the amount of spatial data acquired from several geodetic sources has grown over the years and as data infrastructure has become more powerful, the need for adoption of in-database analytic technology within the geosciences has grown rapidly. In-database analytics on spatial data stored in a traditional enterprise data warehouse enables much faster retrieval and analysis for making better predictions about risks and opportunities, identifying trends and spotting anomalies. Although a number of open-source spatial analysis libraries such as geopandas and shapely are available today, most of them are restricted to the manipulation and analysis of geometric objects and depend on GEOS and similar libraries. We present an open-source software package, written in Python, to fill the gap between spatial analysis and in-database analytics. Ibmdbpy-spatial provides a geospatial extension to the ibmdbpy package, implemented in 2015. It provides an interface for spatial data manipulation and access to in-database algorithms in IBM dashDB, a data warehouse platform with a spatial extender that runs as a service on IBM's cloud platform, Bluemix. Working in-database reduces network overhead, since the complete data set need not be replicated onto the user's local system and only a subset of it is fetched into memory at any one time. Ibmdbpy-spatial accelerates Python analytics by seamlessly pushing operations written in Python into the underlying database for execution using the dashDB spatial extender, thereby benefiting from in-database performance-enhancing features such as columnar storage and parallel processing. The package currently supports Python versions 2.7 through 3.4. The basic architecture of the package consists of three main components: 1) a connection to dashDB represented by the IdaDataBase instance, which uses a middleware API, pypyodbc or jaydebeapi, to establish the database connection via ODBC or JDBC respectively; 2) an instance representing the spatial data stored in the database as a dataframe in Python, called the IdaGeoDataFrame, with a specific geometry attribute that recognises a planar geometry column in dashDB; and 3) Python wrappers for spatial functions such as within, distance, area and buffer, which dashDB currently supports, to make querying from Python much simpler for users. The spatial functions translate well-known geopandas-like syntax into SQL queries, utilising the database connection to perform spatial operations in-database, and can operate on single geometries as well as on two geometries from different IdaGeoDataFrames. The in-database queries strictly follow the standards of the OpenGIS Implementation Specification for Geographic information - Simple feature access for SQL. The results of the operations can be accessed dynamically via interactive Jupyter notebooks from any system that supports Python, without additional dependencies, and can also be combined with other open-source libraries such as matplotlib and folium within Jupyter notebooks for visualization. We built a use case analysing crime hotspots in New York City to validate our implementation and visualized the results as a choropleth map for each borough.
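
    A hedged usage sketch based only on the class and function names mentioned in the abstract (IdaDataBase, IdaGeoDataFrame, area); the connection string, table name, geometry column, and exact method signatures are assumptions and may differ from the released package.

        from ibmdbpy import IdaDataBase, IdaGeoDataFrame  # classes named in the abstract

        # Hypothetical JDBC connection string to a dashDB instance (placeholder values).
        idadb = IdaDataBase(dsn="jdbc:db2://dashdb-host:50001/BLUDB:user=me;password=secret;")

        # Wrap a spatial table as a geo-dataframe; table name and geometry column
        # are invented for this sketch, and the constructor signature is assumed.
        crimes = IdaGeoDataFrame(idadb, "NYC_CRIMES", geometry="SHAPE")

        # Spatial functions such as area are translated by the package into
        # in-database spatial SQL; the call style shown here is an assumption.
        print(crimes.area().head())

        idadb.close()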

  3. A Bioinformatics Module for Use in an Introductory Biology Laboratory

    ERIC Educational Resources Information Center

    Alaie, Adrienne; Teller, Virginia; Qiu, Wei-gang

    2012-01-01

    Since biomedical science has become increasingly data-intensive, acquisition of computational and quantitative skills by science students has become more important. For non-science students, an introduction to biomedical databases and their applications promotes the development of a scientifically literate population. Because typical college…

  4. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards combined with unfriendly interfaces make it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or 3rd-party viewers are built-in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.

  5. 27 CFR 24.126 - Change in proprietorship involving a bonded wine warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... involving a bonded wine warehouse. 24.126 Section 24.126 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX AND TRADE BUREAU, DEPARTMENT OF THE TREASURY LIQUORS WINE Establishment and Operations Changes Subsequent to Original Establishment § 24.126 Change in proprietorship involving a bonded wine warehouse...

  6. 27 CFR 44.144 - Opening.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... PAYMENT OF TAX, OR WITH DRAWBACK OF TAX Operations by Export Warehouse Proprietors Inventories § 44.144 Opening. An opening inventory shall be made by the export warehouse proprietor at the time of commencing... permit issued under § 44.93. A similar inventory shall be made by the export warehouse proprietor when he...

  7. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  8. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  9. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  10. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  11. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  12. 19 CFR 19.4 - CBP and proprietor responsibility and supervision over warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... inventory category of each article under FIFO procedures. Merchandise covered by a given unique identifier..., quantity counts of goods in warehouse inventories, spot checks of selected warehouse transactions or...) Maintain the inventory control and recordkeeping system in accordance with the provisions of § 19.12 of...

  13. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 10 2012-01-01 2012-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  14. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 10 2014-01-01 2014-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  15. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 10 2011-01-01 2011-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  16. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  17. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 10 2013-01-01 2013-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  18. 19 CFR 19.14 - Materials for use in manufacturing warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ...; DEPARTMENT OF THE TREASURY CUSTOMS WAREHOUSES, CONTAINER STATIONS AND CONTROL OF MERCHANDISE THEREIN... statistical information as provided in § 141.61(e) of this chapter. If the merchandise has been imported or... report this information for each warehouse entry represented in the manufacturing process. [28 FR 14763...

  19. Population dynamics of stored maize insect pests in warehouses in two districts of Ghana

    USDA-ARS?s Scientific Manuscript database

    Understanding what insect species are present and their temporal and spatial patterns of distribution is important for developing a successful integrated pest management strategy for food storage in warehouses. Maize in many countries in Africa is stored in bags in warehouses, but little monitoring ...

  20. 27 CFR 44.142 - Records.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2013-04-01 2013-04-01 false Records. 44.142 Section 44... PAYMENT OF TAX, OR WITH DRAWBACK OF TAX Operations by Export Warehouse Proprietors § 44.142 Records. Every export warehouse proprietor must keep in such warehouse complete and concise records, containing the: (a...

  1. 75 FR 71724 - Real Estate Settlement Procedures Act (RESPA): Solicitation of Information on Changes in...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-24

    ... Procedures Act (RESPA): Solicitation of Information on Changes in Warehouse Lending and Other Loan Funding... guidance under RESPA to address possible changes in warehouse lending and other financing mechanisms used... in recent years, and especially on how warehouse lending currently operates within residential real...

  2. Photocopy of drawing (original drawing of Signal & Ordnance Warehouse ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Signal & Ordnance Warehouse in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) TRUSS DETAILS - MacDill Air Force Base, Signal & Ordnance Warehouse, 7620 Hanger Loop Drive, Tampa, Hillsborough County, FL

  3. 76 FR 40409 - Self-Regulatory Organizations; National Securities Clearing Corporation; Notice of Filing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-08

    ... Effectiveness of Proposed Rule Change Relating to Fees Associated With the Obligation Warehouse Service July 1... related to the new Obligation Warehouse service. II. Self-Regulatory Organization's Statement of Purpose... for NSCC's Obligation Warehouse service, a new functionality that was designed to enhance and replace...

  4. 17 CFR 31.8 - Cover of leverage contracts.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... this section. (2) Permissible cover for a long leverage contract is limited to: (i) Warehouse receipts... and accrued interest on any loan against such warehouse receipts does not exceed 70 percent of the current market value of the commodity represented by each receipt. (ii) Warehouse receipts for gold...

  5. The importance of data warehouses for physician executives.

    PubMed

    Ruffin, M

    1994-11-01

    Soon, most physicians will begin to learn about data warehouses and clinical and financial data about their patients stored in them. What is a data warehouse? Why are we seeing their emergence in health care only now? How does a hospital, or group practice, or health plan acquire or create a data warehouse? Who should be responsible for it, and what sort of training is needed by those in charge of using it for the edification of the sponsoring organization? I'll try to answer these questions in this article.

  6. Better bioinformatics through usability analysis.

    PubMed

    Bolchini, Davide; Finkelstein, Anthony; Perrone, Vito; Nagl, Sylvia

    2009-02-01

    Improving the usability of bioinformatics resources enables researchers to find, interact with, share, compare and manipulate important information more effectively and efficiently. It thus enables researchers to gain improved insights into biological processes with the potential, ultimately, of yielding new scientific results. Usability 'barriers' can pose significant obstacles to a satisfactory user experience and force researchers to spend unnecessary time and effort to complete their tasks. The number of online biological databases available is growing and there is an expanding community of diverse users. In this context there is an increasing need to ensure the highest standards of usability. Using 'state-of-the-art' usability evaluation methods, we have identified and characterized a sample of usability issues potentially relevant to web bioinformatics resources, in general. These specifically concern the design of the navigation and search mechanisms available to the user. The usability issues we have discovered in our substantial case studies are undermining the ability of users to find the information they need in their daily research activities. In addition to characterizing these issues, specific recommendations for improvements are proposed leveraging proven practices from web and usability engineering. The methods and approach we exemplify can be readily adopted by the developers of bioinformatics resources.

  7. Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sulakhe, D.; Rodriguez, A.; Wilde, M.

    2008-03-01

    Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual data system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of problems involved or the lessons learned in using individual Grid resources as it has already been published in our paper on genome analysis research environment (GNARE) and will focus primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.

  8. Models@Home: distributed computing in bioinformatics using a screensaver based approach.

    PubMed

    Krieger, Elmar; Vriend, Gert

    2002-02-01

    Due to the steadily growing computational demands in bioinformatics and related scientific disciplines, one is forced to make optimal use of the available resources. A straightforward solution is to build a network of idle computers and let each of them work on a small piece of a scientific challenge, as done by Seti@Home (http://setiathome.berkeley.edu), the world's largest distributed computing project. We developed a generally applicable distributed computing solution that uses a screensaver system similar to Seti@Home. The software exploits the coarse-grained nature of typical bioinformatics projects. Three major considerations for the design were: (1) often, many different programs are needed, while the time is lacking to parallelize them. Models@Home can run any program in parallel without modifications to the source code; (2) in contrast to the Seti project, bioinformatics applications are normally more sensitive to lost jobs. Models@Home therefore includes stringent control over job scheduling; (3) to allow use in heterogeneous environments, Linux and Windows based workstations can be combined with dedicated PCs to build a homogeneous cluster. We present three practical applications of Models@Home, running the modeling programs WHAT IF and YASARA on 30 PCs: force field parameterization, molecular dynamics docking, and database maintenance.

  9. Shotgun proteomic analysis of Emiliania huxleyi, a marine phytoplankton species of major biogeochemical importance.

    PubMed

    Jones, Bethan M; Edwards, Richard J; Skipp, Paul J; O'Connor, C David; Iglesias-Rodriguez, M Debora

    2011-06-01

    Emiliania huxleyi is a unicellular marine phytoplankton species known to play a significant role in global biogeochemistry. Through the dual roles of photosynthesis and production of calcium carbonate (calcification), carbon is transferred from the atmosphere to ocean sediments. Almost nothing is known about the molecular mechanisms that control calcification, a process that is tightly regulated within the cell. To initiate proteomic studies on this important and phylogenetically remote organism, we have devised efficient protein extraction protocols and developed a bioinformatics pipeline that allows the statistically robust assignment of proteins from MS/MS data using preexisting EST sequences. The bioinformatics tool, termed BUDAPEST (Bioinformatics Utility for Data Analysis of Proteomics using ESTs), is fully automated and was used to search against data generated from three strains. BUDAPEST increased the number of identifications over standard protein database searches from 37 to 99 proteins when data were amalgamated. Proteins involved in diverse cellular processes were uncovered. For example, experimental evidence was obtained for a novel type I polyketide synthase and for various photosystem components. The proteomic and bioinformatic approaches developed in this study are of wider applicability, particularly to the oceanographic community where genomic sequence data for species of interest are currently scarce.
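
    The BUDAPEST pipeline matches MS/MS spectra against EST sequences, which requires translating ESTs into protein space; the Biopython sketch below shows a generic six-frame translation of the kind such a pipeline relies on. The EST sequence here is a toy example, and this is not BUDAPEST's own code.

        from Bio.Seq import Seq

        def six_frame_translations(nucleotide):
            """Translate an EST in all six reading frames (three forward, three reverse)."""
            seq = Seq(nucleotide)
            frames = {}
            for strand_name, strand in (("+", seq), ("-", seq.reverse_complement())):
                for offset in range(3):
                    sub = strand[offset:]
                    sub = sub[: len(sub) - (len(sub) % 3)]   # trim to a multiple of 3
                    frames[f"{strand_name}{offset + 1}"] = str(sub.translate())
            return frames

        if __name__ == "__main__":
            est = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # toy EST
            for frame, protein in six_frame_translations(est).items():
                print(frame, protein)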

  10. [Development of a microbiology data warehouse (Akita-ReNICS) for networking hospitals in a medical region].

    PubMed

    Ueki, Shigeharu; Kayaba, Hiroyuki; Tomita, Noriko; Kobayashi, Noriko; Takahashi, Tomoe; Obara, Toshikage; Takeda, Masahide; Moritoki, Yuki; Itoga, Masamichi; Ito, Wataru; Ohsaga, Atsushi; Kondoh, Katsuyuki; Chihara, Junichi

    2011-04-01

    The active involvement of hospital laboratories in surveillance is crucial to the success of nosocomial infection control. The recent dramatic increase in antimicrobial-resistant organisms and their spread into the community suggest that the infection control strategies of independent medical institutions are insufficient. To share clinical data and surveillance in our local medical region, we developed a microbiology data warehouse for networking hospital laboratories in Akita prefecture. This system, named Akita-ReNICS, is an easy-to-use information management system designed to compare, track, and report the occurrence of antimicrobial-resistant organisms. Participating laboratories routinely transfer their coded and formatted microbiology data from their health care system's clinical computer applications over the internet to the ReNICS server located at Akita University Hospital. We built the system to automate the statistical processes, so that participants can access the server and monitor graphical data in the manner they prefer, using their own computer's browser. Furthermore, our system also provides a document server, a microbiology and antimicrobial database, and space for long-term storage of microbiological samples. Akita-ReNICS could be a next-generation network for quality improvement in infection control.

  11. A systematic review of administrative and clinical databases of infants admitted to neonatal units.

    PubMed

    Statnikov, Yevgeniy; Ibrahim, Buthaina; Modi, Neena

    2017-05-01

    High quality information, increasingly captured in clinical databases, is a useful resource for evaluating and improving newborn care. We conducted a systematic review to identify neonatal databases, and define their characteristics. We followed a preregistered protocol using MesH terms to search MEDLINE, EMBASE, CINAHL, Web of Science and OVID Maternity and Infant Care Databases for articles identifying patient level databases covering more than one neonatal unit. Full-text articles were reviewed and information extracted on geographical coverage, criteria for inclusion, data source, and maternal and infant characteristics. We identified 82 databases from 2037 publications. Of the country-specific databases there were 39 regional and 39 national. Sixty databases restricted entries to neonatal unit admissions by birth characteristic or insurance cover; 22 had no restrictions. Data were captured specifically for 53 databases; 21 administrative sources; 8 clinical sources. Two clinical databases hold the largest range of data on patient characteristics, USA's Pediatrix BabySteps Clinical Data Warehouse and UK's National Neonatal Research Database. A number of neonatal databases exist that have potential to contribute to evaluating neonatal care. The majority is created by entering data specifically for the database, duplicating information likely already captured in other administrative and clinical patient records. This repetitive data entry represents an unnecessary burden in an environment where electronic patient records are increasingly used. Standardisation of data items is necessary to facilitate linkage within and between countries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  12. ExplorEnz: a MySQL database of the IUBMB enzyme nomenclature.

    PubMed

    McDonald, Andrew G; Boyce, Sinéad; Moss, Gerard P; Dixon, Henry B F; Tipton, Keith F

    2007-07-27

    We describe the database ExplorEnz, which is the primary repository for EC numbers and enzyme data that are being curated on behalf of the IUBMB. The enzyme nomenclature is incorporated into many other resources, including the ExPASy-ENZYME, BRENDA and KEGG bioinformatics databases. The data, which are stored in a MySQL database, preserve the formatting of chemical and enzyme names. A simple, easy to use, web-based query interface is provided, along with an advanced search engine for more complex queries. The database is publicly available at http://www.enzyme-database.org. The data are available for download as SQL and XML files via FTP. ExplorEnz has powerful and flexible search capabilities and provides the scientific community with the most up-to-date version of the IUBMB Enzyme List.
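
    As a hedged sketch of how the downloadable SQL data might be queried once loaded into a local MySQL server, the snippet below looks up an EC number with pymysql; the database, table, and column names ('enzyme', 'entry', 'ec_num', 'accepted_name') are assumptions, not ExplorEnz's documented schema.

        import pymysql

        # Assumes the ExplorEnz SQL dump has been loaded into a local MySQL database.
        conn = pymysql.connect(host="localhost", user="enz", password="secret",
                               database="enzyme")
        try:
            with conn.cursor() as cur:
                # Hypothetical table/column names for illustration only.
                cur.execute(
                    "SELECT ec_num, accepted_name FROM entry WHERE ec_num = %s",
                    ("1.1.1.1",),
                )
                for ec_num, name in cur.fetchall():
                    print(ec_num, name)
        finally:
            conn.close()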

  13. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    PubMed

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.
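
    A minimal sketch of storing sequence reads in Cassandra with the DataStax Python driver, under an invented keyspace and table layout; it is only meant to illustrate the kind of write/read pattern the paper evaluates, not its actual schema.

        from cassandra.cluster import Cluster

        # Assumes a Cassandra node is reachable on localhost.
        cluster = Cluster(["127.0.0.1"])
        session = cluster.connect()

        # Illustrative keyspace and table; replication settings suit a single test node.
        session.execute("""
            CREATE KEYSPACE IF NOT EXISTS genomics
            WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
        """)
        session.execute("""
            CREATE TABLE IF NOT EXISTS genomics.reads (
                run_id   text,
                read_id  text,
                sequence text,
                quality  text,
                PRIMARY KEY (run_id, read_id)
            )
        """)

        # Write one record and read it back.
        session.execute(
            "INSERT INTO genomics.reads (run_id, read_id, sequence, quality) VALUES (%s, %s, %s, %s)",
            ("run42", "r0001", "ACGTACGT", "IIIIIIII"),
        )
        for row in session.execute(
                "SELECT read_id, sequence FROM genomics.reads WHERE run_id = %s", ("run42",)):
            print(row.read_id, row.sequence)

        cluster.shutdown()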

  14. Intelligent Data Analysis in the EMERCOM Information System

    NASA Astrophysics Data System (ADS)

    Elena, Sharafutdinova; Tatiana, Avdeenko; Bakaev, Maxim

    2017-01-01

    The paper describes an information system development project for the Russian Ministry of Emergency Situations (MES, whose international operations body is known as EMERCOM), carried out with the participation of representatives of both the IT industry and academia. Besides the general description of the system, we put forward OLAP- and Data Mining-based approaches to the intelligent analysis of the data accumulated in the database. In particular, some operational OLAP reports and an example of a multi-dimensional information space based on an OLAP Data Warehouse are presented. Finally, we outline a Data Mining application to support decision-making in the planning of security inspections and the review of their results.
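
    To make the OLAP-style aggregation concrete, the sketch below is purely illustrative: the fact table, its columns and the data are invented, and SQLite stands in for the actual warehouse. It builds a tiny incident fact table and rolls it up by region and year, the kind of slice a multidimensional report would present.

      # Illustrative roll-up over a toy incident fact table (SQLite as a stand-in warehouse).
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
          CREATE TABLE incident_fact (region TEXT, year INTEGER, severity TEXT, n INTEGER);
          INSERT INTO incident_fact VALUES
              ('North', 2015, 'minor', 40), ('North', 2015, 'major', 5),
              ('North', 2016, 'minor', 35), ('South', 2015, 'major', 8),
              ('South', 2016, 'minor', 50), ('South', 2016, 'major', 3);
      """)

      # Roll-up: total incidents per region and year (one "slice" of the OLAP cube).
      for region, year, total in con.execute("""
          SELECT region, year, SUM(n) FROM incident_fact
          GROUP BY region, year ORDER BY region, year
      """):
          print(region, year, total)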

  15. The Xeno-glycomics database (XDB): a relational database of qualitative and quantitative pig glycome repertoire.

    PubMed

    Park, Hae-Min; Park, Ju-Hyeong; Kim, Yoon-Woo; Kim, Kyoung-Jin; Jeong, Hee-Jin; Jang, Kyoung-Soon; Kim, Byung-Gee; Kim, Yun-Gon

    2013-11-15

    In recent years, the improvement of mass spectrometry-based glycomics techniques (i.e. highly sensitive, quantitative and high-throughput analytical tools) has enabled us to obtain a large dataset of glycans. Here we present a database named the Xeno-glycomics database (XDB) that contains cell- or tissue-specific pig glycomes analyzed with mass spectrometry-based techniques, including comprehensive pig glycan information on chemical structures, mass values, types and relative quantities. It was designed as a user-friendly web-based interface that allows users to query the database according to pig tissue/cell types or glycan masses. This database will contribute to providing qualitative and quantitative information on glycomes characterized from various pig cells/organs in xenotransplantation and might eventually provide new targets in the era of α1,3-galactosyltransferase gene-knockout pigs. The database can be accessed on the web at http://bioinformatics.snu.ac.kr/xdb.

  16. C-A1-03: Considerations in the Design and Use of an Oracle-based Virtual Data Warehouse

    PubMed Central

    Bredfeldt, Christine; McFarland, Lela

    2011-01-01

    Background/Aims The amount of clinical data available for research is growing exponentially. As it grows, increasing the efficiency of both data storage and data access becomes critical. Relational database management systems (rDBMS) such as Oracle are ideal solutions for managing longitudinal clinical data because they support large-scale data storage and highly efficient data retrieval. In addition, they can greatly simplify the management of large data warehouses, including security management and regular data refreshes. However, the HMORN Virtual Data Warehouse (VDW) was originally designed based on SAS datasets, and this design choice has a number of implications for both the design and use of an Oracle-based VDW. From a design standpoint, VDW tables are designed as flat SAS datasets, which do not take full advantage of Oracle indexing capabilities. From a data retrieval standpoint, standard VDW SAS scripts do not take advantage of SAS pass-through SQL capabilities to enable Oracle to perform the processing required to narrow datasets to the population of interest. Methods Beginning in 2009, the research department at Kaiser Permanente in the Mid-Atlantic States (KPMA) has developed an Oracle-based VDW according to the HMORN v3 specifications. In order to take advantage of the strengths of relational databases, KPMA introduced an interface layer to the VDW data, using views to provide access to standardized VDW variables. In addition, KPMA has developed SAS programs that provide access to SQL pass-through processing for first-pass data extraction into SAS VDW datasets for processing by standard VDW scripts. Results We discuss both the design and performance considerations specific to the KPMA Oracle-based VDW. We benchmarked performance of the Oracle-based VDW using both standard VDW scripts and an initial pre-processing layer to evaluate speed and accuracy of data return. Conclusions Adapting the VDW for deployment in an Oracle environment required minor changes to the underlying structure of the data. Further modifications of the underlying data structure would lead to performance enhancements. Maximally efficient data access for standard VDW scripts requires an extra step that involves restricting the data to the population of interest at the data server level prior to standard processing.
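
    The performance point about restricting rows at the server before extraction can be illustrated without SAS; the sketch below is a generic Python DB-API illustration (hypothetical table, columns and cohort, with SQLite standing in for Oracle), not the KPMA pass-through code. It contrasts pulling a whole table with pushing the population filter into the query so that the database returns only the rows of interest.

      # Server-side filtering sketch: push the cohort restriction into SQL
      # instead of pulling the full table and filtering client-side.
      # Table, columns and cohort are hypothetical; SQLite stands in for Oracle.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
          CREATE TABLE encounters (person_id INTEGER, enc_date TEXT, dept TEXT);
          INSERT INTO encounters VALUES
              (1, '2009-01-10', 'CARD'), (2, '2009-02-03', 'ENDO'),
              (3, '2009-03-15', 'CARD'), (1, '2009-04-20', 'ENDO');
      """)
      cohort = (1, 3)  # population of interest

      # Inefficient pattern: fetch everything, filter in the client.
      all_rows = con.execute("SELECT * FROM encounters").fetchall()
      client_filtered = [r for r in all_rows if r[0] in cohort]

      # Preferred pattern: let the database server do the restriction.
      placeholders = ",".join("?" for _ in cohort)
      server_filtered = con.execute(
          f"SELECT * FROM encounters WHERE person_id IN ({placeholders})", cohort
      ).fetchall()

      assert sorted(client_filtered) == sorted(server_filtered)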

  17. Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences

    PubMed Central

    Stephens, Susie M.; Chen, Jake Y.; Davidson, Marcel G.; Thomas, Shiby; Trute, Barry M.

    2005-01-01

    As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html PMID:15608287
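
    As a flavour of the in-database pattern matching described, the fragment below issues an Oracle-style REGEXP_LIKE filter from Python; the cx_Oracle driver, connection details, and the table and column names are assumptions for illustration and are not taken from the article.

      # In-database regular-expression filtering, issued from Python.
      # Driver choice, connection details, table and column names are hypothetical.
      import cx_Oracle

      con = cx_Oracle.connect(user="bio", password="secret", dsn="localhost/XE")
      cur = con.cursor()

      # Find protein sequences containing a simple N-glycosylation-like motif (N-X-S/T, X != P),
      # letting Oracle evaluate the pattern close to the data.
      cur.execute("""
          SELECT protein_id
          FROM   protein_sequences
          WHERE  REGEXP_LIKE(sequence, 'N[^P][ST]')
      """)
      for (protein_id,) in cur:
          print(protein_id)

      cur.close()
      con.close()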

  18. The ESID Online Database network.

    PubMed

    Guzman, D; Veit, D; Knerr, V; Kindle, G; Gathmann, B; Eades-Perner, A M; Grimbacher, B

    2007-03-01

    Primary immunodeficiencies (PIDs) belong to the group of rare diseases. The European Society for Immunodeficiencies (ESID) is establishing an innovative European patient and research database network for continuous long-term documentation of patients, in order to improve the diagnosis, classification, prognosis and therapy of PIDs. The ESID Online Database is a web-based system aimed at data storage, data entry, reporting and the import of pre-existing data sources in an enterprise business-to-business (B2B) integration. The online database is based on the Java 2 Platform, Enterprise Edition (J2EE) with high-standard security features, which comply with data protection laws and the demands of a modern research platform. The ESID Online Database is accessible via the official website (http://www.esid.org/). Supplementary data are available at Bioinformatics online.

  19. A Data Warehouse Architecture for DoD Healthcare Performance Measurements.

    DTIC Science & Technology

    1999-09-01

    design, develop, implement, and apply statistical analysis and data mining tools to a Data Warehouse of healthcare metrics. With the DoD healthcare...framework, this thesis defines a methodology to design, develop, implement, and apply statistical analysis and data mining tools to a Data Warehouse...

  20. 3. Photocopy of photograph (Original print, Phillip McCracken, courtesy of ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    3. Photocopy of photograph (Original print, Phillip McCracken, courtesy of Bill Mitchell.) Photographer unknown, 1924. Cold Storage Warehouse on the left, north and west facades. On the right, north facade of the Hay and Grain Warehouse. - Curtis Wharf, Cold Storage Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  1. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  2. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  3. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  4. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  5. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  6. 78 FR 59289 - Clarification of Bales Made Available for Shipment by CCC-Approved Warehouses

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-26

    ... for CCC-approved warehouses storing cotton. The amendment would change the definition of Bales Made Available for Shipment (BMAS). CCC-approved cotton warehouses are currently required to report BMAS, among... information about bales available for shipment, benefiting both CCC and the cotton industry. DATES: We will...

  7. 7 CFR 735.110 - Conditions for delivery of agricultural products.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... product stored or handled in the warehouse on a demand made by: (1) The holder of the warehouse receipt... 7 Agriculture 7 2010-01-01 2010-01-01 false Conditions for delivery of agricultural products. 735... ACT Warehouse Licensing § 735.110 Conditions for delivery of agricultural products. (a) In the absence...

  8. 7 CFR 1421.106 - Warehouse-stored marketing assistance loan collateral.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... indicating that: (1) Storage charges through the maturity date have been prepaid; or (2) The producer has... commodity stored in an approved warehouse shall be the later of the following: (1) The date the commodity was received or deposited in the warehouse; (2) The date the storage charges start; or (3) The day...

  9. Photocopy of drawing (original drawing of Signal & Ordnance Warehouse ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Signal & Ordnance Warehouse in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) SECTIONS AND DETAILS - MacDill Air Force Base, Signal & Ordnance Warehouse, 7620 Hanger Loop Drive, Tampa, Hillsborough County, FL

  10. Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) END ELEVATION AND SECTIONS - MacDill Air Force Base, Quartermaster Warehouse & Commissary, 7621 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  11. Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) SECTIONS AND DETAILS - MacDill Air Force Base, Quartermaster Warehouse & Commissary, 7621 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  12. Photocopy of drawing (original drawing of Signal & Ordnance Warehouse ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Signal & Ordnance Warehouse in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) PLANS, ELEVATIONS, SECTIONS AND DETAILS - MacDill Air Force Base, Signal & Ordnance Warehouse, 7620 Hanger Loop Drive, Tampa, Hillsborough County, FL

  13. Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) FRONT AND REAR ELEVATIONS - MacDill Air Force Base, Quartermaster Warehouse & Commissary, 7621 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  14. 22 CFR 124.14 - Exports to warehouses or distribution points outside the United States.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 1 2011-04-01 2011-04-01 false Exports to warehouses or distribution points... to warehouses or distribution points outside the United States. (a) Agreements. Agreements (e.g... military nomenclature, the Federal stock number, nameplate data, and any control numbers under which the...

  15. 77 FR 40539 - Privacy Act of 1974; Implementation

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-10

    ..., JUSTICE/FBI- 022, the FBI Data Warehouse System. In this notice of proposed rulemaking, the FBI proposes... FR 53342 (Aug. 31, 2010) and modified at 75 FR 66131 (Oct. 27, 2010) because the Data Warehouse... proposes to exempt the Data Warehouse System, Justice/FBI-022, from certain provisions of the Privacy Act...

  16. 75 FR 18824 - Agency Information Collection Activities: Notice of Intent To Renew Collection 3038-0019, Stocks...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-13

    ... Renew Collection 3038-0019, Stocks of Grain in Licensed Warehouses AGENCY: Commodity Futures Trading...., permitting electronic submission of responses. Stocks of Grain in Licensed Warehouses, OMB Control No. 3038... warehouses regular for delivery to keep records on stocks of commodities and make reports on call by the...

  17. Development of a data warehouse at an academic health system: knowing a place for the first time.

    PubMed

    Dewitt, Jocelyn G; Hampton, Philip M

    2005-11-01

    In 1998, the University of Michigan Health System embarked upon the design, development, and implementation of an enterprise-wide data warehouse, intending to use prioritized business questions to drive its design and implementation. Because of the decentralized nature of the academic health system and the development team's inability to identify and prioritize those institutional business questions, however, a bottom-up approach was used to develop the enterprise-wide data warehouse. Specific important data sets were identified for inclusion, and the technical team designed the system with an enterprise view and architecture rather than as a series of data marts. Using this incremental approach of adding data sets, institutional leaders were able to experience and then further define successful use of the integrated data made available to them. Even as requests for the use and expansion of the data warehouse outstrip the resources assigned for support, the data warehouse has become an integral component of the institution's information management strategy. The authors discuss the approach, process, current status, and successes and failures of the data warehouse.

  18. Automated realtime data import for the i2b2 clinical data warehouse: introducing the HL7 ETL cell.

    PubMed

    Majeed, Raphael W; Röhrig, Rainer

    2012-01-01

    Clinical data warehouses are used to consolidate all available clinical data from one or multiple organizations. They represent an important source for clinical research, quality management and controlling. Since its introduction, the data warehouse i2b2 gathered a large user base in the research community. Yet, little work has been done on the process of importing clinical data into data warehouses using existing standards. In this article, we present a novel approach of utilizing the clinical integration server as data source, commonly available in most hospitals. As information is transmitted through the integration server, the standardized HL7 message is immediately parsed and inserted into the data warehouse. Evaluation of import speeds suggests the feasibility of the provided solution for real-time processing of HL7 messages. By using the presented approach of standardized data import, i2b2 can be used as a plug and play data warehouse, without the hurdle of customized import for every clinical information system or electronic medical record. The provided solution is available for download at http://sourceforge.net/projects/histream/.
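
    To give a feel for how an HL7 v2 message can be broken down before loading, here is a minimal, purely illustrative parser that splits a message into segments and fields and pulls out a patient identifier and one observation. It is not the HL7 ETL cell implementation, and the sample message is invented.

      # Minimal HL7 v2 parsing sketch (illustrative only; not the HL7 ETL cell code).
      # HL7 v2 separates segments with carriage returns and fields with '|'.
      sample = (
          "MSH|^~\\&|LAB|HOSP|DWH|HOSP|202301011200||ORU^R01|MSG0001|P|2.3\r"
          "PID|1||12345^^^HOSP||Doe^Jane\r"
          "OBX|1|NM|718-7^Hemoglobin^LN||13.2|g/dL\r"
      )

      def parse_hl7(message):
          """Return a list of (segment_name, fields) tuples."""
          segments = []
          for line in message.strip("\r").split("\r"):
              fields = line.split("|")
              segments.append((fields[0], fields))
          return segments

      facts = {}
      for name, fields in parse_hl7(sample):
          if name == "PID":
              facts["patient_id"] = fields[3].split("^")[0]  # first component of PID-3
          elif name == "OBX":
              facts["concept"] = fields[3]                   # OBX-3 observation identifier
              facts["value"] = fields[5]                     # OBX-5 observation value
              facts["unit"] = fields[6]                      # OBX-6 units

      print(facts)  # a row like this could then be mapped onto warehouse fact tables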

  19. Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection

    USDA-ARS?s Scientific Manuscript database

    Current advances in sequencing technologies and bioinformatics make it possible to determine a nearly complete genomic background of rice, a staple food for the poor. Consequently, comprehensive databases of variation among thousands of varieties are currently being assembled and released. Proper analysi...

  20. Data warehousing in disease management programs.

    PubMed

    Ramick, D C

    2001-01-01

    Disease management programs offer the benefits of lower disease occurrence, improved patient care, and lower healthcare costs. In such programs, the key mechanism used to identify individuals at risk for targeted diseases is the data warehouse. This article surveys recent warehousing techniques from HMOs to map out critical issues relating to the preparation, design, and implementation of a successful data warehouse. Discussions of scope, data cleansing, and storage management are included in depicting warehouse preparation and design; data implementation options are contrasted. Examples are provided of data warehouse execution in disease management programs that identify members with preexisting illnesses, as well as those exhibiting high-risk conditions. The proper deployment of successful data warehouses in disease management programs benefits both the organization and the member. Organizations benefit from decreased medical costs; members benefit from an improved quality of life through disease-specific care.

  1. Protecting privacy in a clinical data warehouse.

    PubMed

    Kong, Guilan; Xiao, Zhichun

    2015-06-01

    Peking University has several prestigious teaching hospitals in China. To make secondary use of massive medical data for research purposes, construction of a clinical data warehouse is imperative in Peking University. However, a big concern for clinical data warehouse construction is how to protect patient privacy. In this project, we propose to use a combination of symmetric block ciphers, asymmetric ciphers, and cryptographic hashing algorithms to protect patient privacy information. The novelty of our privacy protection approach lies in message-level data encryption, the key caching system, and the cryptographic key management system. The proposed privacy protection approach is scalable to clinical data warehouse construction with any size of medical data. With the composite privacy protection approach, the clinical data warehouse can be secure enough to keep the confidential data from leaking to the outside world. © The Author(s) 2014.
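
    The combination of symmetric encryption and cryptographic hashing mentioned above can be sketched in a few lines. The example below uses the Python cryptography package's Fernet construction and a salted SHA-256 hash as generic stand-ins; it is not the system described in the paper, which additionally layers in asymmetric ciphers, key caching and key management.

      # Illustrative message-level protection: hash the identifier for linkage,
      # encrypt the payload with a symmetric key. Not the authors' implementation.
      import hashlib
      from cryptography.fernet import Fernet

      key = Fernet.generate_key()      # in a real system this key would itself be
      fernet = Fernet(key)             # protected (e.g. wrapped with an asymmetric cipher)

      record = {"patient_id": "PKU-000123", "diagnosis": "hypertension"}

      # Pseudonymize the identifier with a salted SHA-256 hash.
      salt = b"warehouse-wide-salt"    # hypothetical; real salts must be secret and managed
      pseudo_id = hashlib.sha256(salt + record["patient_id"].encode()).hexdigest()

      # Encrypt the clinical payload at message level.
      token = fernet.encrypt(record["diagnosis"].encode())

      stored_row = {"pseudo_id": pseudo_id, "payload": token}
      # Authorized consumers holding the key can recover the payload:
      assert fernet.decrypt(stored_row["payload"]).decode() == "hypertension"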

  2. Promotion bureau warehouse system design. Case study in University of AA

    NASA Astrophysics Data System (ADS)

    Parwati, N.; Qibtiyah, M.

    2017-12-01

    The warehouse is one of the important parts of an industry. With a good warehousing system, an industry can improve the effectiveness of its performance, so that profits for the company can continue to increase; conversely, a poorly organized warehouse system risks reducing the effectiveness of the industry itself. In this research, the object of study was the warehousing system of the promotion bureau of University AA. To improve the effectiveness of the warehousing system, a warehouse layout was designed by assigning categories of goods based on the flow of goods into and out of the warehouse, using the ABC analysis method. In addition, an information system was designed to assist in controlling the warehouse and to support the demand from every bureau and department in the university.
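
    A minimal sketch of the ABC classification step is given below; the item names, flow volumes and the usual 80/15/5 cumulative-share cut-offs are assumptions for illustration, not values taken from the paper.

      # ABC classification of warehouse items by flow volume (illustrative data).
      flows = {"posters": 1200, "brochures": 900, "pens": 300,
               "banners": 80, "mugs": 40, "stickers": 20}

      total = sum(flows.values())
      ranked = sorted(flows.items(), key=lambda kv: kv[1], reverse=True)

      classes, cumulative = {}, 0.0
      for item, flow in ranked:
          cumulative += flow / total
          # Common cut-offs: A = first ~80% of flow, B = next ~15%, C = remainder.
          classes[item] = "A" if cumulative <= 0.80 else ("B" if cumulative <= 0.95 else "C")

      print(classes)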

  3. Lyme disease testing in children in an endemic area.

    PubMed

    Al-Sharif, Bashar; Hall, Matthew C

    2011-10-01

    The purpose of this study was to determine clinician adherence to recommendations regarding diagnostic testing for Lyme disease (LD). The specific aims were to determine the rate of inappropriate test ordering for a diagnosis of erythema migrans and lack of confirmatory test ordering for positive LD screening tests. Using the data warehouse of Marshfield Clinic Research Foundation's Bioinformatics Research Center, cases were identified from 2002 through 2007. A retrospective chart abstraction was performed using Marshfield Clinic's electronic medical record. The study involved children (<19 years old). In 57% of cases, LD testing occurred after a clinical diagnosis of erythema migrans was made. Patients with any symptom in addition to erythema migrans were more likely to have testing (odds ratio (OR) = 3.52, 1.75-7.08). A positive LD screening test was not confirmed 24% of the time. Lack of ordering confirmatory testing was not associated with any clinical factors or site of the evaluation. This study found that some clinicians in an LD-endemic area do not follow guidelines for diagnosing children suspected to have Lyme disease.

  4. Tcof1-Related Molecular Networks in Treacher Collins Syndrome.

    PubMed

    Dai, Jiewen; Si, Jiawen; Wang, Minjiao; Huang, Li; Fang, Bing; Shi, Jun; Wang, Xudong; Shen, Guofang

    2016-09-01

    Treacher Collins syndrome (TCS) is a rare, autosomal-dominant disorder characterized by craniofacial deformities, and is primarily caused by mutations in the Tcof1 gene. This article aimed to perform a comprehensive literature review and systematic bioinformatic analysis of Tcof1-related molecular networks in TCS. First, the up- and down-regulated genes in Tcof1 heterozygous haploinsufficient mutant mouse embryos and in Tcof1 knockdown and Tcof1 over-expressed neuroblastoma N1E-115 cells were obtained from the Gene Expression Omnibus database. The GeneDecks database was used to calculate the 500 genes most closely related to Tcof1. Then, the relationships between 4 gene sets (a predicted set and sets comparing the wildtype with the 3 Gene Expression Omnibus datasets) were analyzed using the DAVID, GeneMANIA and STRING databases. The analysis results showed that the Tcof1-related genes were enriched in various biological processes, including cell proliferation, apoptosis, cell cycle, differentiation, and migration. They were also enriched in several signaling pathways, such as the ribosome, p53, cell cycle, and WNT signaling pathways. Additionally, these genes clearly had direct or indirect interactions with Tcof1 and with each other. The findings of the literature review and bioinformatic analysis imply that special attention should be given to these pathways, as they may offer target points for TCS therapies.

  5. Human Disease Insight: An integrated knowledge-based platform for disease-gene-drug information.

    PubMed

    Tasleem, Munazzah; Ishrat, Romana; Islam, Asimul; Ahmad, Faizan; Hassan, Md Imtaiyaz

    2016-01-01

    The scope of the Human Disease Insight (HDI) database is not limited to researchers or physicians as it also provides basic information to non-professionals and creates disease awareness, thereby reducing the chances of patient suffering due to ignorance. HDI is a knowledge-based resource providing information on human diseases to both scientists and the general public. Here, our mission is to provide a comprehensive human disease database containing most of the available useful information, with extensive cross-referencing. HDI is a knowledge management system that acts as a central hub to access information about human diseases and associated drugs and genes. In addition, HDI contains well-classified bioinformatics tools with helpful descriptions. These integrated bioinformatics tools enable researchers to annotate disease-specific genes and perform protein analysis, search for biomarkers and identify potential vaccine candidates. Eventually, these tools will facilitate the analysis of disease-associated data. The HDI provides two types of search capabilities and includes provisions for downloading, uploading and searching disease/gene/drug-related information. The logistical design of the HDI allows for regular updating. The database is designed to work best with Mozilla Firefox and Google Chrome and is freely accessible at http://humandiseaseinsight.com. Copyright © 2015 King Saud Bin Abdulaziz University for Health Sciences. Published by Elsevier Ltd. All rights reserved.

  6. The BioExtract Server: a web-based bioinformatic workflow platform

    PubMed Central

    Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

    2011-01-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552

  7. Bioinformatics analysis for evaluation of the diagnostic potentialities of miR-19b, -125b and -205 as liquid biopsy markers of prostate cancer

    NASA Astrophysics Data System (ADS)

    Bryzgunova, O. E.; Lekchnov, E. A.; Zaripov, M. M.; Yurchenko, Yu. B.; Yarmoschuk, S. V.; Pashkovskaya, O. A.; Rykova, E. Yu.; Zheravin, A. A.; Laktionov, P. P.

    2017-09-01

    The presence of tumor-derived cell-free miRNAs in biological fluids, together with the simplicity and robustness of cell-free miRNA quantification, makes them suitable markers for cancer diagnostics. Based on previously published data demonstrating the diagnostic potential of miR-205 in blood and of miR-19b and miR-125b in urine of prostate cancer patients, bioinformatics analysis was carried out to follow their involvement in prostate cancer development and to select additional miRNA markers for prostate cancer diagnostics. The studied miRNAs are involved in different signaling pathways and regulate a number of genes involved in cancer development. Five of their targets (CCND1, BRAF, CCNE1, CCNE2, RAF1), according to the STRING database, act as part of the same signaling pathway. RAF1 is regulated by miR-19b and miR-125b, and it was shown to be involved in prostate cancer development by the DIANA and STRING databases. Thus, other microRNAs regulating RAF1 expression such as miR-16, -195, -497, and -7 (suggested by the DIANA, TargetScan, MiRTarBase and miRDB databases) can potentially be regarded as prostate cancer markers.

  8. Application of Information-Theoretic Data Mining Techniques in a National Ambulatory Practice Outcomes Research Network

    PubMed Central

    Wright, Adam; Ricciardi, Thomas N.; Zwick, Martin

    2005-01-01

    The Medical Quality Improvement Consortium data warehouse contains de-identified data on more than 3.6 million patients including their problem lists, test results, procedures and medication lists. This study uses reconstructability analysis, an information-theoretic data mining technique, on the MQIC data warehouse to empirically identify risk factors for various complications of diabetes including myocardial infarction and microalbuminuria. The risk factors identified match those reported in the literature, demonstrating the utility of the MQIC data warehouse for outcomes research and of RA as a technique for mining clinical data warehouses. PMID:16779156

  9. A Simulation Modeling Approach Method Focused on the Refrigerated Warehouses Using Design of Experiment

    NASA Astrophysics Data System (ADS)

    Cho, G. S.

    2017-09-01

    For performance optimization of refrigerated warehouses, design parameters are selected based on physical parameters, such as the number of equipment units and aisles and the speeds of forklifts, for ease of modification. This paper provides a comprehensive framework for the system design of refrigerated warehouses. We propose a modeling approach that aims at simulation optimization so as to meet required design specifications using Design of Experiments (DOE), and we analyze a simulation model using an integrated aspect-oriented modeling approach (i-AOMA). As a result, the suggested method can evaluate the performance of a variety of refrigerated warehouse operations.

  10. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2010-04-01 2010-04-01 false Articles in a warehouse... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  11. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2011-04-01 2011-04-01 false Articles in a warehouse... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  12. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2013-04-01 2013-04-01 false Articles in a warehouse... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  13. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2012-04-01 2011-04-01 true Articles in a warehouse. 46... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  14. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2014-04-01 2014-04-01 false Articles in a warehouse... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  15. 8. AFRD WAREHOUSE, EAST SIDE DETAIL SHOWS CONNECTION OF LEANTO ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    8. AFRD WAREHOUSE, EAST SIDE DETAIL SHOWS CONNECTION OF LEAN-TO TO WALL. FACING WEST. NOTE THE PROFILE OF THE METAL AWNING ON SOUTH SIDE. ELECTRICAL CONDUIT AND OTHER SERVICES PENETRATE WALL. POLE SECURED WITH TRIANGULAR BRACES AT CORNER IS COMMUNICATION POLE. - Minidoka Relocation Center Warehouse, 111 South Fir Street, Shoshone, Lincoln County, ID

  16. Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) FIRST FLOOR PLAN AND DOOR DETAILS - MacDill Air Force Base, Quartermaster Warehouse & Commissary, 7621 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  17. Photocopy of drawing (original drawing of Q.M. Warehouse in possession ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) PLANS, ELEVATIONS, SECTIONS, AND ELECTRICAL DETAILS - MacDill Air Force Base, Quartermaster Warehouse, 7605 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  18. Biopython: freely available Python tools for computational molecular biology and bioinformatics

    PubMed Central

    Cock, Peter J. A.; Antao, Tiago; Chang, Jeffrey T.; Chapman, Brad A.; Cox, Cymon J.; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J. L.

    2009-01-01

    Summary: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists (www.biopython.org/wiki/Mailing_lists) or to peter.cock@scri.ac.uk. PMID:19304878

  19. Bringing the human genome and the revolution in bioinformatics to the medical school classroom: a case report from Washington University School of Medicine.

    PubMed

    Magee, J; Gordon, J I; Whelan, A

    2001-08-01

    The human genome project is revolutionizing medical research and the practice of clinical medicine. To understand and participate in this revolution, physicians must be fluent in human genomics and bioinformatics. At Washington University School of Medicine (WUSM), the authors designed a module for teaching these skills to first-year students. The module uses clinical cases as a platform for accessing information stored in GenBank, Online Mendelian Inheritance in Man (OMIM), and PubMed databases at the National Center for Biotechnology Information (NCBI). This module, which is also designed to reinforce problem-solving skills, has been integrated into WUSM's first-year medical genetics course.

  20. AphidBase: A centralized bioinformatic resource for annotation of the pea aphid genome

    PubMed Central

    Legeai, Fabrice; Shigenobu, Shuji; Gauthier, Jean-Pierre; Colbourne, John; Rispe, Claude; Collin, Olivier; Richards, Stephen; Wilson, Alex C. C.; Tagu, Denis

    2015-01-01

    AphidBase is a centralized bioinformatic resource that was developed to facilitate community annotation of the pea aphid genome by the International Aphid Genomics Consortium (IAGC). The AphidBase Information System, designed to organize and distribute genomic data and annotations for a large international community, was constructed using open source software tools from the Generic Model Organism Database (GMOD). The system includes Apollo and GBrowse utilities as well as a wiki, BLAST search capabilities and a full text search engine. AphidBase strongly supported community cooperation and coordination in the curation of gene models during community annotation of the pea aphid genome. AphidBase can be accessed at http://www.aphidbase.com. PMID:20482635

  1. PCDDB: new developments at the Protein Circular Dichroism Data Bank.

    PubMed

    Whitmore, Lee; Miles, Andrew John; Mavridis, Lazaros; Janes, Robert W; Wallace, B A

    2017-01-04

    The Protein Circular Dichroism Data Bank (PCDDB) has been in operation for more than 5 years as a public repository for archiving circular dichroism spectroscopic data and associated bioinformatics and experimental metadata. Since its inception, many improvements and new developments have been made in data display, searching algorithms, data formats, data content, auxiliary information, and validation techniques, as well as, of course, an increase in the number of holdings. It provides a site (http://pcddb.cryst.bbk.ac.uk) for authors to deposit experimental data as well as detailed information on methods and calculations associated with published work. It also includes links for each entry to bioinformatics databases. The data are freely available to accessors either as single files or as complete data bank downloads. The PCDDB has found broad usage by the structural biology, bioinformatics, analytical and pharmaceutical communities, and has formed the basis for new software and methods developments. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. HIPPI: highly accurate protein family classification with ensembles of HMMs.

    PubMed

    Nguyen, Nam-Phuong; Nute, Michael; Mirarab, Siavash; Warnow, Tandy

    2016-11-11

    Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .

  3. Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis

    PubMed Central

    Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

    2012-01-01

    A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985

  4. A Multidimensional Data Warehouse for Community Health Centers

    PubMed Central

    Kunjan, Kislaya; Toscos, Tammy; Turkcan, Ayten; Doebbeling, Brad N.

    2015-01-01

    Community health centers (CHCs) play a pivotal role in healthcare delivery to vulnerable populations, but have not yet benefited from a data warehouse that can support improvements in clinical and financial outcomes across the practice. We have developed a multidimensional clinic data warehouse (CDW) by working with 7 CHCs across the state of Indiana and integrating their operational, financial and electronic patient records to support ongoing delivery of care. We describe in detail the rationale for the project, the data architecture employed, the content of the data warehouse, along with a description of the challenges experienced and strategies used in the development of this repository that may help other researchers, managers and leaders in health informatics. The resulting multidimensional data warehouse is highly practical and is designed to provide a foundation for wide-ranging healthcare data analytics over time and across the community health research enterprise. PMID:26958297

  5. A Clinical Data Warehouse Based on OMOP and i2b2 for Austrian Health Claims Data.

    PubMed

    Rinner, Christoph; Gezgin, Deniz; Wendl, Christopher; Gall, Walter

    2018-01-01

    To develop simulation models for healthcare-related questions, clinical data can be reused. The objective was to develop a clinical data warehouse that harmonizes different data sources in a standardized manner and provides a reproducible interface for clinical data reuse. The Kimball life cycle for data warehouse development was used. The development is split into the technical, the data, and the business intelligence pathways. Sample data were persisted in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The i2b2 clinical data warehouse tools were used to query the OMOP CDM by applying the new i2b2 multi-fact table feature. A clinical data warehouse was set up, and sample data, data dimensions and ontologies for Austrian health claims data were created. The ability of the standardized data access layer to create and apply simulation models will be evaluated next.
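
    As a toy illustration of what a reproducible query against OMOP CDM-style tables can look like, the sketch below creates two heavily simplified tables (person and condition_occurrence, with columns reduced for brevity) in SQLite and counts persons per condition concept; the concept identifiers are illustrative and this is not the Austrian warehouse's code.

      # Toy query over simplified OMOP CDM-style tables (SQLite stand-in; illustrative only).
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
          CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
          CREATE TABLE condition_occurrence (
              person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT);
          INSERT INTO person VALUES (1, 1950), (2, 1962), (3, 1978);
          INSERT INTO condition_occurrence VALUES
              (1, 201826, '2016-05-01'),   -- concept ids are illustrative
              (2, 201826, '2017-01-12'),
              (3, 316866, '2017-03-02');
      """)

      # Persons per condition concept -- the kind of count an i2b2-style query would return.
      for concept_id, n in con.execute("""
          SELECT condition_concept_id, COUNT(DISTINCT person_id)
          FROM condition_occurrence GROUP BY condition_concept_id
      """):
          print(concept_id, n)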

  6. A Multidimensional Data Warehouse for Community Health Centers.

    PubMed

    Kunjan, Kislaya; Toscos, Tammy; Turkcan, Ayten; Doebbeling, Brad N

    2015-01-01

    Community health centers (CHCs) play a pivotal role in healthcare delivery to vulnerable populations, but have not yet benefited from a data warehouse that can support improvements in clinical and financial outcomes across the practice. We have developed a multidimensional clinic data warehouse (CDW) by working with 7 CHCs across the state of Indiana and integrating their operational, financial and electronic patient records to support ongoing delivery of care. We describe in detail the rationale for the project, the data architecture employed, the content of the data warehouse, along with a description of the challenges experienced and strategies used in the development of this repository that may help other researchers, managers and leaders in health informatics. The resulting multidimensional data warehouse is highly practical and is designed to provide a foundation for wide-ranging healthcare data analytics over time and across the community health research enterprise.

  7. Toward the Replacement of Animal Experiments through the Bioinformatics-driven Analysis of 'Omics' Data from Human Cell Cultures.

    PubMed

    Grafström, Roland C; Nymark, Penny; Hongisto, Vesa; Spjuth, Ola; Ceder, Rebecca; Willighagen, Egon; Hardy, Barry; Kaski, Samuel; Kohonen, Pekka

    2015-11-01

    This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently-developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing), to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serve to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data, to information relevant to human health and environmental safety. 2015 FRAME.

  8. Designing an End-to-End System for Data Storage, Analysis, and Visualization for an Urban Environmental Observatory

    NASA Astrophysics Data System (ADS)

    McGuire, M. P.; Welty, C.; Gangopadhyay, A.; Karabatis, G.; Chen, Z.

    2006-05-01

    The urban environment is formed by complex interactions between natural and human dominated systems, the study of which requires the collection and analysis of very large datasets that span many disciplines. Recent advances in sensor technology and automated data collection have improved the ability to monitor urban environmental systems and are making the idea of an urban environmental observatory a reality. This in turn has created a number of potential challenges in data management and analysis. We present the design of an end-to-end system to store, analyze, and visualize data from a prototype urban environmental observatory based at the Baltimore Ecosystem Study, a National Science Foundation Long Term Ecological Research site (BES LTER). We first present an object-relational design of an operational database to store high resolution spatial datasets as well as data from sensor networks, archived data from the BES LTER, data from external sources such as USGS NWIS, EPA Storet, and metadata. The second component of the system design includes a spatiotemporal data warehouse consisting of a data staging plan and a multidimensional data model designed for the spatiotemporal analysis of monitoring data. The system design also includes applications for multi-resolution exploratory data analysis, multi-resolution data mining, and spatiotemporal visualization based on the spatiotemporal data warehouse. Also the system design includes interfaces with water quality models such as HSPF, SWMM, and SWAT, and applications for real-time sensor network visualization, data discovery, data download, QA/QC, and backup and recovery, all of which are based on the operational database. The system design includes both internet and workstation-based interfaces. Finally we present the design of a laboratory for spatiotemporal analysis and visualization as well as real-time monitoring of the sensor network.

  9. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    PubMed Central

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to the management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. Finding an alternative to the commonly used relational database model therefore becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  10. SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss

    PubMed Central

    Di Génova, Alex; Aravena, Andrés; Zapata, Luis; González, Mauricio; Maass, Alejandro; Iturra, Patricia

    2011-01-01

    SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near future, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/ PMID:22120661

  11. SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss.

    PubMed

    Di Génova, Alex; Aravena, Andrés; Zapata, Luis; González, Mauricio; Maass, Alejandro; Iturra, Patricia

    2011-01-01

    SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near future, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/

  12. 27 CFR 28.27 - Entry of wine into customs bonded warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Entry of wine into customs... Bonded Warehouses § 28.27 Entry of wine into customs bonded warehouses. Upon filing of the application or notice prescribed by § 28.122(a), wine may be withdrawn from a bonded wine cellar for transfer to any...

  13. Storage Information Management System (SIMS) Spaceflight Hardware Warehousing at Goddard Space Flight Center

    NASA Technical Reports Server (NTRS)

    Kubicko, Richard M.; Bingham, Lindy

    1995-01-01

    Goddard Space Flight Center (GSFC) on site and leased warehouses contain thousands of items of ground support equipment (GSE) and flight hardware including spacecraft, scaffolding, computer racks, stands, holding fixtures, test equipment, spares, etc. The control of these warehouses, and the management, accountability, and control of the items within them, is accomplished by the Logistics Management Division. To facilitate this management and tracking effort, the Logistics and Transportation Management Branch, is developing a system to provide warehouse personnel, property owners, and managers with storage and inventory information. This paper will describe that PC-based system and address how it will improve GSFC warehouse and storage management.

  14. Medical Big Data Warehouse: Architecture and System Design, a Case Study: Improving Healthcare Resources Distribution.

    PubMed

    Sebaa, Abderrazak; Chikh, Fatima; Nouicer, Amina; Tari, AbdelKamel

    2018-02-19

    The huge increase in medical devices and clinical applications, which generate enormous amounts of data, has raised major challenges in managing, processing, and mining this massive volume of data. Indeed, traditional data warehousing frameworks cannot effectively handle the volume, variety, and velocity of current medical applications. As a result, existing data warehouses face many issues with medical data, and many challenges need to be addressed. New solutions have emerged, and Hadoop is one of the best examples; it can be used to process these streams of medical data. However, without an efficient system design and architecture, its performance will not be significant or valuable for medical managers. In this paper, we provide a short review of the literature on research issues of traditional data warehouses and present some important Hadoop-based data warehouses. In addition, a Hadoop-based architecture and a conceptual data model for designing a medical Big Data warehouse are given. In our case study, we provide implementation details of a big data warehouse based on the proposed architecture and data model on the Apache Hadoop platform to ensure an optimal allocation of health resources.
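
    A short sketch of the kind of aggregation such a Hadoop-based warehouse could serve is shown below, written with PySpark against a generic star-schema layout (an admissions fact table joined to a region dimension). The table paths, column names and the admissions-per-bed measure are assumptions for illustration, not the data model proposed in the paper.

      # Hedged PySpark sketch: summarise admissions per region from Parquet
      # tables on HDFS; all paths and column names are illustrative.
      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("medical-dw-sketch").getOrCreate()

      admissions = spark.read.parquet("hdfs:///warehouse/fact_admissions")
      regions = spark.read.parquet("hdfs:///warehouse/dim_region")

      summary = (
          admissions.join(regions, "region_id")
          .groupBy("region_name")
          .agg(
              F.countDistinct("patient_id").alias("patients"),
              F.count("admission_id").alias("admissions"),
              F.first("hospital_beds").alias("hospital_beds"),
          )
          .withColumn("admissions_per_bed", F.col("admissions") / F.col("hospital_beds"))
          .orderBy(F.desc("admissions_per_bed"))
      )

      summary.show(truncate=False)   # regions with the highest load per bed first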

  15. CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.

    PubMed

    Hallin, Peter F; Ussery, David W

    2004-12-12

    Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations like curvature and stacking energy, and DNA compositions like base skews, oligo skews and repeats at the local and global level, are just a few of the analyses presented on the CBS Genome Atlas web page. Complex analyses, changing methods and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results amount to more than 220 pieces of information. The backbone of this solution is a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web server via PHP4 to present dynamic web content to users outside the center. This solution is tightly fitted to the existing server infrastructure, and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues. A web-based user interface which is dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/. This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.
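
    The per-genome statistics mentioned above (AT content, base skews and the like) are straightforward to compute; the snippet below is a minimal, illustrative Python sketch of two of them from a FASTA file, not the Perl package described in the paper.

      # Hedged sketch: AT content and global GC skew for one FASTA sequence.
      from collections import Counter

      def genome_stats(fasta_path):
          counts = Counter()
          with open(fasta_path) as handle:
              for line in handle:
                  if line.startswith(">"):      # skip FASTA headers
                      continue
                  counts.update(line.strip().upper())
          length = sum(counts[base] for base in "ACGT")
          at_content = (counts["A"] + counts["T"]) / length
          gc_skew = (counts["G"] - counts["C"]) / (counts["G"] + counts["C"])
          return {"length": length, "at_content": at_content, "gc_skew": gc_skew}

      print(genome_stats("genome.fna"))         # path is a placeholder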

  16. ETHNOS: A versatile electronic tool for the development and curation of national genetic databases

    PubMed Central

    2010-01-01

    National and ethnic mutation databases (NEMDBs) are emerging online repositories, recording extensive information about the described genetic heterogeneity of an ethnic group or population. These resources facilitate the provision of genetic services and provide a comprehensive list of genomic variations among different populations. As such, they enhance awareness of the various genetic disorders. Here, we describe the features of the ETHNOS software, a simple but versatile tool based on a flat-file database that is specifically designed for the development and curation of NEMDBs. ETHNOS is freely available software that runs more than half of the NEMDBs currently available. Given the emerging need for NEMDBs in genetic testing services and the fact that ETHNOS is the only off-the-shelf software available for NEMDB development and curation, its adoption in subsequent NEMDB development would contribute towards data content uniformity, unlike the diverse contents and quality of the available gene (locus)-specific databases. Finally, we allude to the potential applications of NEMDBs, not only as worldwide central allele frequency repositories, but also, and most importantly, as data warehouses of individual-level genomic data, hence allowing for a comprehensive ethnicity-specific documentation of genomic variation. PMID:20650823

  17. ETHNOS : A versatile electronic tool for the development and curation of national genetic databases.

    PubMed

    van Baal, Sjozef; Zlotogora, Joël; Lagoumintzis, George; Gkantouna, Vassiliki; Tzimas, Ioannis; Poulas, Konstantinos; Tsakalidis, Athanassios; Romeo, Giovanni; Patrinos, George P

    2010-06-01

    National and ethnic mutation databases (NEMDBs) are emerging online repositories, recording extensive information about the described genetic heterogeneity of an ethnic group or population. These resources facilitate the provision of genetic services and provide a comprehensive list of genomic variations among different populations. As such, they enhance awareness of the various genetic disorders. Here, we describe the features of the ETHNOS software, a simple but versatile tool based on a flat-file database that is specifically designed for the development and curation of NEMDBs. ETHNOS is freely available software that runs more than half of the NEMDBs currently available. Given the emerging need for NEMDBs in genetic testing services and the fact that ETHNOS is the only off-the-shelf software available for NEMDB development and curation, its adoption in subsequent NEMDB development would contribute towards data content uniformity, unlike the diverse contents and quality of the available gene (locus)-specific databases. Finally, we allude to the potential applications of NEMDBs, not only as worldwide central allele frequency repositories, but also, and most importantly, as data warehouses of individual-level genomic data, hence allowing for a comprehensive ethnicity-specific documentation of genomic variation.
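
    Since the abstract frames NEMDBs as allele frequency repositories built on a flat-file database, a small sketch of deriving per-population allele frequencies from a tab-delimited flat file follows; the column names are assumptions for illustration, not the actual ETHNOS record format.

      # Hedged sketch: per-population allele frequencies from a flat file with
      # hypothetical columns population, variant, allele_count, allele_number.
      import csv
      from collections import defaultdict

      def allele_frequencies(path):
          totals = defaultdict(lambda: [0, 0])   # (population, variant) -> [count, number]
          with open(path, newline="") as handle:
              for row in csv.DictReader(handle, delimiter="\t"):
                  key = (row["population"], row["variant"])
                  totals[key][0] += int(row["allele_count"])
                  totals[key][1] += int(row["allele_number"])
          return {key: count / number for key, (count, number) in totals.items() if number}

      for (population, variant), freq in sorted(allele_frequencies("nemdb_records.tsv").items()):
          print(f"{population}\t{variant}\t{freq:.4f}")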

  18. Tomato Expression Database (TED): a suite of data presentation and analysis tools

    PubMed Central

    Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

    2006-01-01

    The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single-gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection, containing >150,000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis and visualization tools has been developed and incorporated into TED; these tools aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at http://ted.bti.cornell.edu. PMID:16381976

  19. Tomato Expression Database (TED): a suite of data presentation and analysis tools.

    PubMed

    Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

    2006-01-01

    The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single-gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection, containing >150,000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis and visualization tools has been developed and incorporated into TED; these tools aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at http://ted.bti.cornell.edu.
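
    The "digital expression" component described above amounts to normalising raw EST counts by library size; the sketch below illustrates the idea (counts rescaled to transcripts per 10,000 ESTs in each library) on a hypothetical tab-delimited input, and is not TED's own pipeline.

      # Hedged sketch: normalise EST abundance per unigene and library.
      import csv
      from collections import defaultdict

      def normalise_est_counts(path, scale=10_000):
          counts = defaultdict(dict)         # unigene -> {library: raw EST count}
          library_totals = defaultdict(int)  # library -> total ESTs sequenced
          with open(path, newline="") as handle:
              for row in csv.DictReader(handle, delimiter="\t"):
                  n = int(row["est_count"])
                  counts[row["unigene"]][row["library"]] = n
                  library_totals[row["library"]] += n
          return {
              unigene: {lib: scale * n / library_totals[lib] for lib, n in per_lib.items()}
              for unigene, per_lib in counts.items()
          }

      normalised = normalise_est_counts("tomato_est_counts.tsv")   # placeholder path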

  20. The Cardiac Safety Research Consortium ECG database.

    PubMed

    Kligfield, Paul; Green, Cynthia L

    2012-01-01

    The Cardiac Safety Research Consortium (CSRC) ECG database was initiated to foster research using anonymized, XML-formatted, digitized ECGs with corresponding descriptive variables from placebo- and positive-control arms of thorough QT studies submitted to the US Food and Drug Administration (FDA) by pharmaceutical sponsors. The database can be expanded with other data submitted directly to CSRC from other sources, and currently includes digitized ECGs from patients with genotyped varieties of congenital long-QT syndrome; this congenital long-QT database is also linked to ambulatory electrocardiograms stored in the Telemetric and Holter ECG Warehouse (THEW). Thorough QT data sets are available from CSRC for unblinded development of algorithms for analysis of repolarization and for blinded comparative testing of algorithms developed for the identification of moxifloxacin, as used as a positive control in thorough QT studies. Policies and procedures for access to these data sets are available from CSRC, which has developed tools for statistical analysis of blinded new algorithm performance. A recently approved CSRC project will create a data set for blinded analysis of automated ECG interval measurements, whose initial focus will include comparison of four of the major manufacturers of automated electrocardiographs in the United States. CSRC welcomes applications for use of the ECG database for clinical investigation.
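
    As an illustration of the interval analyses such a repository supports, the sketch below reads QT and RR values from a simple XML export and applies the Fridericia correction QTcF = QT / (RR/1000)^(1/3), with intervals in milliseconds. The element names are hypothetical; the ECGs actually distributed for thorough QT studies use the far richer HL7 annotated-ECG XML format.

      # Hedged sketch: heart-rate-corrected QT (Fridericia) from a toy XML file.
      import xml.etree.ElementTree as ET

      def fridericia_qtc(qt_ms, rr_ms):
          # QTcF = QT divided by the cube root of the RR interval in seconds.
          return qt_ms / (rr_ms / 1000.0) ** (1.0 / 3.0)

      def qtcf_from_xml(path):
          root = ET.parse(path).getroot()
          values = []
          for beat in root.iter("measurement"):   # hypothetical element name
              qt = float(beat.findtext("qt_ms"))
              rr = float(beat.findtext("rr_ms"))
              values.append(fridericia_qtc(qt, rr))
          return values

      for qtcf in qtcf_from_xml("ecg_intervals.xml"):   # placeholder path
          print(f"QTcF = {qtcf:.1f} ms")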
