Sample records for warehouse query performance

  1. Improve Performance of Data Warehouse by Query Cache

    NASA Astrophysics Data System (ADS)

    Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod

    2010-11-01

    The primary goal of a data warehouse is to free the information locked up in operational databases so that decision makers and business analysts can query, analyze and plan regardless of data changes in the operational databases. Because the number of queries is large, there is in many cases a reasonable probability that the same query is submitted by one or more users at different times. Each time a query is executed, all the data of the warehouse are analyzed to generate the result of that query. In this paper we study how using a query cache improves the performance of a data warehouse, and we try to identify the common problems faced by data warehouse administrators, namely minimizing response time and improving overall query efficiency, particularly when the data warehouse is updated at regular intervals.
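
    A minimal sketch of the caching idea, in Python and invented for illustration (it is not the authors' implementation): results are memoized under a normalized form of the query text, so repeated submissions by one or many users are served from memory, and the cache is flushed whenever the warehouse goes through its periodic update.

    ```python
    import hashlib

    class QueryCache:
        """Memoize warehouse query results between periodic refreshes."""

        def __init__(self, execute_fn):
            self._execute = execute_fn   # function that runs SQL against the warehouse
            self._cache = {}             # normalized-query digest -> result set

        @staticmethod
        def _key(sql):
            # Normalize case and whitespace so repeated submissions of the
            # same query by different users hit the same cache entry.
            normalized = " ".join(sql.lower().split())
            return hashlib.sha1(normalized.encode()).hexdigest()

        def run(self, sql):
            key = self._key(sql)
            if key not in self._cache:           # first submission: hit the warehouse
                self._cache[key] = self._execute(sql)
            return self._cache[key]              # repeats are answered from the cache

        def invalidate(self):
            # Called after each scheduled warehouse load; cached results may
            # no longer reflect the data, so drop everything.
            self._cache.clear()
    ```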

  2. WATCHMAN: A Data Warehouse Intelligent Cache Manager

    NASA Technical Reports Server (NTRS)

    Scheuermann, Peter; Shim, Junho; Vingralek, Radek

    1996-01-01

    Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response times. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of WATCHMAN, an intelligent cache manager for sets retrieved by queries, which is particularly well suited to the data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time, and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and the execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
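
    The profit metric lends itself to a compact sketch. The following Python fragment is a hedged illustration of one plausible reading (the paper's exact formula and bookkeeping may differ): a retrieved set is worth keeping in proportion to its reference rate and the cost of recomputing its query, discounted by the space it occupies; replacement evicts whole retrieved sets in order of lowest profit, and admission rejects a candidate worth less than what it would displace.

    ```python
    from dataclasses import dataclass

    @dataclass
    class RetrievedSet:
        query_id: str
        size: int          # pages occupied by the cached retrieved set
        ref_rate: float    # average rate of reference
        exec_cost: float   # cost of re-executing the associated query

    def profit(rs: RetrievedSet) -> float:
        # Frequently referenced, expensive-to-recompute sets are valuable;
        # large sets are penalized for the cache space they consume.
        return rs.ref_rate * rs.exec_cost / rs.size

    def admit_and_replace(cache, candidate, capacity):
        """Admit `candidate` only if it is worth more than what it displaces."""
        kept = sorted(cache, key=profit)          # cheapest-to-lose first
        used = sum(rs.size for rs in kept)
        victims = []
        while used + candidate.size > capacity and kept:
            victims.append(kept.pop(0))           # evict whole sets, not pages
            used -= victims[-1].size
        if sum(profit(v) for v in victims) > profit(candidate):
            return cache                          # admission denied: cache unchanged
        return kept + [candidate]
    ```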

  3. Benchmarking distributed data warehouse solutions for storing genomic variant information

    PubMed Central

    Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.

    2017-01-01

    Abstract Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not yet been sufficiently explored in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked the performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most of the distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where a low-latency response is expected, while still offering decent performance for running analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
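
    The pre-aggregation finding is easy to make concrete. As a hedged illustration (table and column names are invented, not taken from the paper's schema), the two SQL statements below, held as Python strings, contrast a query that needs a distributed join between a variant fact table and a sample dimension with the same biological question asked of a denormalized, pre-aggregated table.

    ```python
    # Normalized form: the allele-frequency question forces a distributed
    # join between the variant fact table and the sample dimension.
    JOIN_QUERY = """
    SELECT v.chrom, v.pos, COUNT(*) / s.cohort_size AS allele_freq
    FROM variants v
    JOIN samples s ON v.sample_id = s.sample_id
    WHERE v.chrom = '17' AND v.pos BETWEEN 41196312 AND 41277500
    GROUP BY v.chrom, v.pos, s.cohort_size
    """

    # Denormalized, pre-aggregated form: frequencies were computed at load
    # time, so the same question is a single scan of one table, no join.
    PREAGG_QUERY = """
    SELECT chrom, pos, allele_freq
    FROM variant_freq_agg
    WHERE chrom = '17' AND pos BETWEEN 41196312 AND 41277500
    """
    ```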

  4. Mass Storage Performance Information System

    NASA Technical Reports Server (NTRS)

    Scheuermann, Peter

    2000-01-01

    The purpose of this task is to develop a data warehouse to enable system administrators and their managers to gather information by querying the data logs of the MDSDS. Currently, detailed logs capture the activity of the MDSDS internal to the different systems. The elements to be included in the data warehouse are requirements analysis, data cleansing, database design, database population, hardware/software acquisition, data transformation, query and report generation, and data mining.

  5. A weight based genetic algorithm for selecting views

    NASA Astrophysics Data System (ADS)

    Talebian, Seyed H.; Kareem, Sameem A.

    2013-03-01

    A data warehouse is a technology designed to support decision making. It is built by extracting large amounts of data from different operational systems, transforming the data into a consistent form, and loading it into a central repository. The type of queries in a data warehouse environment differs from those in operational systems. In contrast to operational queries, the analytical queries issued against data warehouses involve summarization of large volumes of data and therefore, under normal circumstances, take a long time to answer. At the same time, these queries must be answered quickly to enable managers to make decisions in as short a time as possible. An essential need in this environment is therefore improving query performance. One of the most popular methods for this task is using pre-computed query results: whenever a new query is submitted, the pre-computed results, or views, are used to answer it instead of computing the query on the fly over a large underlying database. Although the ideal option would be to pre-compute and save all possible views, in practice this is not a feasible choice, owing to disk space constraints and the overhead of view updates. Therefore, we need to select a subset of the possible views to save on disk. Selecting the right subset of views is considered an important challenge in data warehousing. In this paper we suggest a Weight-Based Genetic Algorithm (WBGA) for solving the view selection problem with two objectives.
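
    To make the GA framing concrete, here is a minimal, hypothetical Python sketch (not the authors' WBGA): a candidate solution is a bit vector over the possible views, the fitness combines the two competing objectives, query cost and view maintenance cost, through weights, and standard selection, crossover and mutation evolve the population.

    ```python
    import random

    def fitness(bits, query_costs, maint_costs, w_query=0.7, w_maint=0.3):
        # Lower is better: views we do not materialize make queries expensive,
        # views we do materialize must be maintained on every update.
        q = sum(c for i, c in enumerate(query_costs) if not bits[i])
        m = sum(c for i, c in enumerate(maint_costs) if bits[i])
        return w_query * q + w_maint * m

    def evolve(query_costs, maint_costs, pop_size=30, generations=200):
        n = len(query_costs)
        pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda b: fitness(b, query_costs, maint_costs))
            survivors = pop[: pop_size // 2]          # truncation selection
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, n)          # one-point crossover
                child = a[:cut] + b[cut:]
                child[random.randrange(n)] ^= 1       # point mutation
                children.append(child)
            pop = survivors + children
        return min(pop, key=lambda b: fitness(b, query_costs, maint_costs))
    ```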

  6. Archetype-based data warehouse environment to enable the reuse of electronic health record data.

    PubMed

    Marco-Ruiz, Luis; Moner, David; Maldonado, José A; Kolstrup, Nils; Bellika, Johan G

    2015-09-01

    The reuse of data captured during health care delivery is essential to satisfy the demands of clinical research and clinical decision support systems. A main barrier to such reuse is the existence of legacy data formats and the high granularity of the data when stored in an electronic health record (EHR) system. Thus, we need mechanisms to standardize, aggregate, and query data concealed in the EHRs, to allow their reuse whenever they are needed. Our objective was to create a data warehouse infrastructure using archetype-based technologies, standards and query languages to enable the interoperability needed for data reuse. The work presented makes use of best-of-breed archetype-based data transformation and storage technologies to create a workflow for the modeling, extraction, transformation and load of EHR proprietary data into standardized data repositories. We converted legacy data and performed patient-centered aggregations via archetype-based transformations. Later, specific-purpose aggregations were performed at the query level for particular use cases. Laboratory test results of a population of 230,000 patients belonging to Troms and Finnmark counties in Norway, requested between January 2013 and November 2014, have been standardized. Normalization of the test records has been performed by defining transformation and aggregation functions between the laboratory records and an archetype. These mappings were used to automatically generate openEHR-compliant data. These data were loaded into an archetype-based data warehouse. Once loaded, we defined indicators linked to the data in the warehouse to monitor test activity of Salmonella and Pertussis using the Archetype Query Language. Archetype-based standards and technologies can be used to create a data warehouse environment that enables data from EHR systems to be reused in clinical research and decision support systems. With this approach, existing EHR data become available in a standardized and interoperable format, thus opening a world of possibilities toward semantic or concept-based reuse, query and communication of clinical data. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  7. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve the functional annotation of genomes. This requires designing elaborate integration systems, such as warehouses, for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting the complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  8. A study of the age attribute in a query tool for a clinical data warehouse.

    PubMed

    Scheufele, Elisabeth Lee; Dubey, Anil Kumar; Murphy, Shawn N

    2008-11-06

    The RPDR, a clinical data warehouse with a user-friendly Querytool, allows researchers to perform studies on patient data. Currently, the RPDR represents age as the patient's age at the present time, which is problematic in situations where age at the time of the event is more appropriate. We will modify the Querytool to consider this by assessing the perception of age via survey, testing backend query solutions, and developing modifications based on these results.
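
    The computational core of the issue is simple: age must be derived from the date of birth relative to the event date rather than relative to today. A small Python sketch of that backend calculation (illustrative only, not the RPDR's actual code):

    ```python
    from datetime import date

    def age_at_event(birth_date: date, event_date: date) -> int:
        """Whole years between birth and the clinical event, not 'age today'."""
        years = event_date.year - birth_date.year
        # Subtract one year if the birthday had not yet occurred by the event.
        if (event_date.month, event_date.day) < (birth_date.month, birth_date.day):
            years -= 1
        return years

    # A patient born 1950-06-15 was 53, not their current age, at a 2004 visit.
    assert age_at_event(date(1950, 6, 15), date(2004, 3, 1)) == 53
    ```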

  9. High Performance Analytics with the R3-Cache

    NASA Astrophysics Data System (ADS)

    Eavis, Todd; Sayeed, Ruhan

    Contemporary data warehouses now represent some of the world’s largest databases. As these systems grow in size and complexity, however, it becomes increasingly difficult for brute force query processing approaches to meet the performance demands of end users. Certainly, improved indexing and more selective view materialization are helpful in this regard. Nevertheless, with warehouses moving into the multi-terabyte range, it is clear that the minimization of external memory accesses must be a primary performance objective. In this paper, we describe the R3-cache, a natively multi-dimensional caching framework designed specifically to support sophisticated warehouse/OLAP environments. The R3-cache is based upon an in-memory version of the R-tree that has been extended to support buffer pages rather than disk blocks. A key strength of the R3-cache is that it is able to utilize multi-dimensional fragments of previous query results so as to significantly minimize the frequency and scale of disk accesses. Moreover, the new caching model directly accommodates the standard relational storage model and provides mechanisms for pro-active updates that exploit the existence of query “hot spots”. The current prototype has been evaluated as a component of the Sidera DBMS, a “shared nothing” parallel OLAP server designed for multi-terabyte analytics. Experimental results demonstrate significant performance improvements relative to simpler alternatives.
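
    The fragment-reuse idea can be sketched without the R-tree machinery. In the hedged Python fragment below (dimension names invented), a cached result is a hyper-rectangle in dimension space, and an overlap test tells the cache which part of a new query's region it can answer from memory; only the remainder must go to disk. The real R3-cache organizes such fragments in an in-memory R-tree rather than testing them one by one.

    ```python
    def overlap(a, b):
        """Intersection of two d-dimensional ranges, each a list of (lo, hi);
        returns None when they fail to intersect in some dimension."""
        out = []
        for (alo, ahi), (blo, bhi) in zip(a, b):
            lo, hi = max(alo, blo), min(ahi, bhi)
            if lo > hi:
                return None
            out.append((lo, hi))
        return out

    # Cached fragment: (year 2004-2006) x (store 10-80); new OLAP query:
    # (year 2005-2008) x (store 50-120). The overlapping sub-rectangle is
    # served from memory; only the remainder is fetched from disk.
    cached = [(2004, 2006), (10, 80)]
    query = [(2005, 2008), (50, 120)]
    print(overlap(cached, query))   # -> [(2005, 2006), (50, 80)]
    ```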

  10. Using Common Table Expressions to Build a Scalable Boolean Query Generator for Clinical Data Warehouses

    PubMed Central

    Harris, Daniel R.; Henderson, Darren W.; Kavuluru, Ramakanth; Stromberg, Arnold J.; Johnson, Todd R.

    2015-01-01

    We present a custom Boolean query generator utilizing common table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our work into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart (CRC) cell of i2b2. In our experiments, sixteen different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated non-trivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M=2.03, SD=6.64 vs. M=75.82, SD=238.88 seconds). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used here also provides a means of profiling clinical data warehouse performance. PMID:25192572
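
    The shape of the generated SQL can be sketched generically. In this hedged Python fragment (the i2b2 CRC schema is reduced to a single fact table and the names are illustrative), each concept in a flat Boolean query becomes its own named CTE over the fact table, and the final statement intersects or unions the per-concept patient sets:

    ```python
    def boolean_to_cte(concepts, operator="INTERSECT"):
        """Map a flat Boolean query over concept codes to CTE-based SQL.

        `concepts` is a list of concept codes; `operator` is INTERSECT (AND)
        or UNION (OR) over the per-concept patient sets.
        """
        ctes, selects = [], []
        for i, code in enumerate(concepts):
            name = f"c{i}"
            ctes.append(
                f"{name} AS (SELECT DISTINCT patient_num "
                f"FROM observation_fact WHERE concept_cd = '{code}')"
            )
            selects.append(f"SELECT patient_num FROM {name}")
        return "WITH " + ",\n     ".join(ctes) + "\n" + f"\n{operator}\n".join(selects)

    # Patients with BOTH diagnoses: each concept is its own CTE, and the
    # patient sets are combined by INTERSECT.
    print(boolean_to_cte(["ICD9:250.00", "ICD9:401.9"]))
    ```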

  11. Selecting materialized views using random algorithm

    NASA Astrophysics Data System (ADS)

    Zhou, Lijuan; Hao, Zhongxiao; Liu, Chi

    2007-04-01

    The data warehouse is a repository of information collected from multiple, possibly heterogeneous, autonomous distributed databases. The information stored in the data warehouse is in the form of views, referred to as materialized views. The selection of the materialized views is one of the most important decisions in designing a data warehouse. Materialized views are stored in the data warehouse for the purpose of efficiently implementing on-line analytical processing queries. The first issue for the user to consider is query response time. In this paper, we therefore develop algorithms to select a set of views to materialize in a data warehouse in order to minimize the total view maintenance cost under the constraint of a given query response time. We call this the query-cost view-selection problem. First, the cost graph and cost model of the query-cost view-selection problem are presented. Second, methods for selecting materialized views using randomized algorithms are presented. The genetic algorithm is applied to the materialized view selection problem. However, as the genetic process develops, legal solutions become more and more difficult to produce, so many solutions are eliminated and the time needed to produce solutions lengthens. An improved algorithm is therefore presented in this paper, which combines simulated annealing with the genetic algorithm for the purpose of solving the query-cost view-selection problem. Finally, experimental simulation is adopted in order to test the effectiveness and efficiency of our algorithms. The experiments show that the given methods can provide near-optimal solutions in limited time and work better in practical cases. Randomized algorithms will become invaluable tools for data warehouse evolution.
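
    The ingredient the hybrid adds to a plain GA is the simulated annealing acceptance rule, which occasionally keeps a worse solution and so helps the search escape the regions where legal solutions are hard to produce. A generic, hedged Python sketch of that rule (not the paper's exact procedure):

    ```python
    import math
    import random

    def anneal(initial, neighbor_fn, cost_fn, t0=100.0, cooling=0.95, steps=1000):
        """Generic simulated annealing over view-selection candidates."""
        current, temp = initial, t0
        best = current
        for _ in range(steps):
            candidate = neighbor_fn(current)      # e.g. flip one view in or out
            delta = cost_fn(candidate) - cost_fn(current)
            # Always accept improvements; accept worsenings with probability
            # exp(-delta/T), so the early high temperature tolerates bad moves.
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                current = candidate
            if cost_fn(current) < cost_fn(best):
                best = current
            temp *= cooling                       # geometric cooling schedule
        return best
    ```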

  12. A similarity-based data warehousing environment for medical images.

    PubMed

    Teixeira, Jefferson William; Annibal, Luana Peixoto; Felipe, Joaquim Cezar; Ciferri, Ricardo Rodrigues; Ciferri, Cristina Dutra de Aguiar

    2015-11-01

    A core issue of the decision-making process in the medical field is to support the execution of analytical (OLAP) similarity queries over images in data warehousing environments. In this paper, we focus on this issue. We propose imageDWE, a non-conventional data warehousing environment that enables the storage of intrinsic features taken from medical images in a data warehouse and supports OLAP similarity queries over them. To achieve this goal, we introduce the concept of perceptual layer, which is an abstraction used to represent an image dataset according to a given feature descriptor in order to enable similarity search. Based on this concept, we propose the imageDW, an extended data warehouse with dimension tables specifically designed to support one or more perceptual layers. We also detail how to build an imageDW and how to load image data into it. Furthermore, we show how to process OLAP similarity queries composed of a conventional predicate and a similarity search predicate that encompasses the specification of one or more perceptual layers. Moreover, we introduce an index technique to improve the OLAP query processing over images. We carried out performance tests over a data warehouse environment that consolidated medical images from exams of several modalities. The results demonstrated the feasibility and efficiency of our proposed imageDWE to manage images and to process OLAP similarity queries. The results also demonstrated that the use of the proposed index technique guaranteed a great improvement in query processing. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Use of synthesized data to support complex ad-hoc queries in an enterprise information warehouse: a diabetes use case.

    PubMed

    Rogers, Patrick; Erdal, Selnur; Santangelo, Jennifer; Liu, Jianhua; Schuster, Dara; Kamal, Jyoti

    2008-11-06

    The Ohio State University Medical Center (OSUMC) Information Warehouse (IW) is a comprehensive data warehousing facility incorporating operational, clinical, and biological data sets from multiple enterprise systems. It is common for users of the IW to request complex ad-hoc queries that often require significant intervention by a data analyst. In response to this challenge, we have designed a workflow that leverages synthesized data elements to support such queries in a more timely, efficient manner.

  14. What Is Spatio-Temporal Data Warehousing?

    NASA Astrophysics Data System (ADS)

    Vaisman, Alejandro; Zimányi, Esteban

    In recent years, extending OLAP (On-Line Analytical Processing) systems with spatial and temporal features has attracted the attention of the GIS (Geographic Information Systems) and database communities. However, there is no commonly agreed definition of what a spatio-temporal data warehouse is and what functionality such a data warehouse should support. Further, the solutions proposed in the literature vary considerably in the kind of data that can be represented as well as the kind of queries that can be expressed. In this paper we present a conceptual framework for defining spatio-temporal data warehouses using an extensible data type system. We also define a taxonomy of different classes of queries of increasing expressive power, and show how to express such queries using an extension of the tuple relational calculus with aggregate functions.

  15. High dimensional biological data retrieval optimization with NoSQL technology.

    PubMed

    Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike

    2014-01-01

    High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, querying relational databases for hundreds of different patients' gene expression records is slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase compared to the model implemented on MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.
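
    The performance argument rests on row-key design: if the key concatenates gene and patient identifiers, all expression values for one gene are physically contiguous, so a high-dimensional retrieval becomes a key-prefix scan rather than a relational join. A self-contained Python sketch of this layout (a sorted list stands in for the lexicographically ordered HBase table; identifiers are invented):

    ```python
    from bisect import bisect_left
    from itertools import takewhile

    # Key-value store modeled as a sorted list of (row_key, value) pairs,
    # mimicking HBase's lexicographically ordered rows.
    rows = sorted({
        "TP53|patient0001": 7.31,
        "TP53|patient0002": 6.88,
        "BRCA1|patient0001": 5.02,
        "BRCA1|patient0002": 4.97,
    }.items())

    def prefix_scan(rows, prefix):
        """Return every (key, value) whose row key starts with `prefix`."""
        keys = [k for k, _ in rows]
        start = bisect_left(keys, prefix)      # jump straight to the key region
        return list(takewhile(lambda kv: kv[0].startswith(prefix), rows[start:]))

    # One gene's expression across all patients is a single contiguous scan.
    print(prefix_scan(rows, "TP53|"))
    ```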

  16. High dimensional biological data retrieval optimization with NoSQL technology

    PubMed Central

    2014-01-01

    Background High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, querying relational databases for hundreds of different patients' gene expression records is slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase compared to the model implemented on MongoDB. Conclusions The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data. PMID:25435347

  17. Automated population of an i2b2 clinical data warehouse from an openEHR-based data repository.

    PubMed

    Haarbrandt, Birger; Tute, Erik; Marschollek, Michael

    2016-10-01

    Detailed Clinical Model (DCM) approaches have recently seen wider adoption. More specifically, openEHR-based application systems are now used in production in several countries, serving diverse fields of application such as health information exchange, clinical registries and electronic medical record systems. However, approaches to efficiently provide openEHR data to researchers for secondary use have not yet been investigated or established. We developed an approach to automatically load openEHR data instances into the open source clinical data warehouse i2b2. We evaluated the query capabilities and the performance of this approach in the context of the Hanover Medical School Translational Research Framework (HaMSTR), an openEHR-based data repository. Automated creation of i2b2 ontologies from archetypes and templates and the integration of openEHR data instances from 903 patients of a paediatric intensive care unit have been achieved. In total, it took an average of ∼2527 s to create 2,311,624 facts from 141,917 XML documents. Using the imported data, we conducted sample queries to compare the performance with two openEHR systems and to investigate whether this representation of data is feasible to support cohort identification and record-level data extraction. We found the automated population of an i2b2 clinical data warehouse to be a feasible approach to make openEHR data instances available for secondary use. Such an approach can facilitate the timely provision of clinical data to researchers. It complements analytics based on the Archetype Query Language by allowing querying of both legacy clinical data sources and openEHR data instances at the same time, and by providing an easy-to-use query interface. However, due to different levels of expressiveness in the data models, not all semantics could be preserved during the ETL process. Copyright © 2016 Elsevier Inc. All rights reserved.

  18. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315

  19. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

  20. Evolutionary Multiobjective Query Workload Optimization of Cloud Data Warehouses

    PubMed Central

    Dokeroglu, Tansel; Sert, Seyyit Alper; Cinar, Muhammet Serkan

    2014-01-01

    With the advent of Cloud databases, query optimizers need to find Pareto-optimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans that make use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound algorithm and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. In order to investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments that we have conducted with different workloads and virtual resource configurations, we report notable findings on alternative deployments, as well as the advantages and disadvantages of the multiobjective algorithms we propose. PMID:24892048
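
    At the core of any such multiobjective optimizer is the dominance test between (response time, monetary cost) pairs. A hedged Python sketch, illustrative only and not the paper's algorithms, of dominance and the Pareto-optimal filter over candidate deployments:

    ```python
    def dominates(a, b):
        """True if plan `a` is no worse than `b` on both objectives
        (time, cost) and strictly better on at least one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def pareto_front(plans):
        """Keep only plans not dominated by any other plan."""
        return [p for p in plans if not any(dominates(q, p) for q in plans)]

    # (response time in s, monetary cost in $) for alternative deployments
    candidates = [(120, 0.40), (95, 0.55), (95, 0.35), (200, 0.20)]
    print(pareto_front(candidates))   # -> [(95, 0.35), (200, 0.20)]
    ```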

  1. A review of genomic data warehousing systems.

    PubMed

    Triplet, Thomas; Butler, Gregory

    2014-07-01

    To facilitate the integration and querying of genomics data, a number of generic data warehousing frameworks have been developed. They differ in their design and capabilities, as well as their intended audience. We provide a comprehensive and quantitative review of those genomic data warehousing frameworks in the context of large-scale systems biology. We reviewed in detail four genomic data warehouses (BioMart, BioXRT, InterMine and PathwayTools) freely available to the academic community. We quantified 20 aspects of the warehouses, covering the accuracy of their responses, their computational requirements and development efforts. Performance of the warehouses was evaluated under various hardware configurations to help laboratories optimize hardware expenses. Each aspect of the benchmark may be dynamically weighted by scientists using our online tool BenchDW (http://warehousebenchmark.fungalgenomics.ca/benchmark/) to build custom warehouse profiles and tailor our results to their specific needs.

  2. Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    ERIC Educational Resources Information Center

    Talukdar, Partha Pratim

    2010-01-01

    The variety and complexity of potentially-related data resources available for querying--webpages, databases, data warehouses--have been growing ever more rapidly. There is a growing need to pose integrative queries "across" multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse…

  3. Visual query tool for finding patient cohorts from a clinical data warehouse of the partners HealthCare system

    PubMed Central

    Murphy, SN; Barnett, GO; Chueh, HC

    2000-01-01

    The patient base of the Partners HealthCare System in Boston exceeds 1.8 million. Many of these patients are desirable for participation in research studies. To facilitate their discovery, we developed a data warehouse to contain clinical characteristics of these patients. The data warehouse contains diagnosis and procedures from administrative databases. The patients are indexed across institutions and their demographics provided by an Enterprise Master Patient Indexing service. Characteristics of the diagnoses and procedures such as associated providers, dates of service, inpatient/outpatient status, and other visit-related characteristics are also fed from the administrative systems. The targeted users of this system are research clinicians interested in finding patient cohorts for research studies. Their data requirements were analyzed and have been reported elsewhere. We did not expect the clinicians to become expert users of the system. Tools for querying healthcare data have traditionally been text based, although graphical interfaces have been pursued. In order to support the simple drag and drop visual model, as well as the identification and distribution of the patient data, a three-tier software architecture was developed. The user interface was developed in Visual Basic and distributed as an ActiveX object embedded in an HTML page. The middle layer was developed in Java and Microsoft COM. The queries are represented throughout their lifetime as XML objects, and the Microsoft SQL7 database is queried and managed in standard SQL. PMID:11080028

  4. Visual query tool for finding patient cohorts from a clinical data warehouse of the partners HealthCare system

    PubMed

    Murphy; Barnett; Chueh

    2000-01-01

    The patient base of the Partners HealthCare System in Boston exceeds 1.8 million. Many of these patients are desirable for participation in research studies. To facilitate their discovery, we developed a data warehouse to contain clinical characteristics of these patients. The data warehouse contains diagnosis and procedures from administrative databases. The patients are indexed across institutions and their demographics provided by an Enterprise Master Patient Indexing service. Characteristics of the diagnoses and procedures such as associated providers, dates of service, inpatient/outpatient status, and other visit-related characteristics are also fed from the administrative systems. The targeted users of this system are research clinicians interested in finding patient cohorts for research studies. Their data requirements were analyzed and have been reported elsewhere. We did not expect the clinicians to become expert users of the system. Tools for querying healthcare data have traditionally been text based, although graphical interfaces have been pursued. In order to support the simple drag and drop visual model, as well as the identification and distribution of the patient data, a three-tier software architecture was developed. The user interface was developed in Visual Basic and distributed as an ActiveX object embedded in an HTML page. The middle layer was developed in Java and Microsoft COM. The queries are represented throughout their lifetime as XML objects, and the Microsoft SQL7 database is queried and managed in standard SQL.

  5. Easily configured real-time CPOE Pick Off Tool supporting focused clinical research and quality improvement.

    PubMed

    Rosenbaum, Benjamin P; Silkin, Nikolay; Miller, Randolph A

    2014-01-01

    Real-time alerting systems typically warn providers about abnormal laboratory results or medication interactions. For more complex tasks, institutions create site-wide 'data warehouses' to support quality audits and longitudinal research. Sophisticated systems like i2b2 or Stanford's STRIDE utilize data warehouses to identify cohorts for research and quality monitoring. However, substantial resources are required to install and maintain such systems. For more modest goals, an organization desiring merely to identify patients with 'isolation' orders, or to determine patients' eligibility for clinical trials, may adopt a simpler, limited approach based on processing the output of one clinical system, and not a data warehouse. We describe a limited, order-entry-based, real-time 'pick off' tool, utilizing public domain software (PHP, MySQL). Through a web interface the tool assists users in constructing complex order-related queries and auto-generates corresponding database queries that can be executed at recurring intervals. We describe successful application of the tool for research and quality monitoring.

  6. An adaptable architecture for patient cohort identification from diverse data sources.

    PubMed

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-12-01

    We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.
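
    The "one query model, one lightweight adaptor per source" idea maps naturally onto a small interface. A hedged Python sketch (names invented, not the authors' code) of the separation the architecture describes: each warehouse only extracts facts in a normalized form, and the temporal reasoning is applied once, outside the sources:

    ```python
    from abc import ABC, abstractmethod
    from datetime import date

    class SourceAdaptor(ABC):
        """One lightweight adaptor per warehouse; reasoning lives elsewhere."""

        @abstractmethod
        def extract_facts(self, criterion):
            """Return (patient_id, code, event_date) tuples for a criterion."""

    def eligible_patients(adaptors, criterion, after: date) -> set:
        # Temporal reasoning is applied once, over normalized facts,
        # not separately inside each source's native query dialect.
        patients = set()
        for adaptor in adaptors:
            for patient_id, _code, when in adaptor.extract_facts(criterion):
                if when >= after:
                    patients.add(patient_id)
        return patients
    ```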

  7. Use of a data warehouse at an academic medical center for clinical pathology quality improvement, education, and research

    PubMed Central

    Krasowski, Matthew D.; Schriever, Andy; Mathur, Gagan; Blau, John L.; Stauffer, Stephanie L.; Ford, Bradley A.

    2015-01-01

    Background: Pathology data contained within the electronic health record (EHR) and laboratory information system (LIS) of hospitals represents a potentially powerful resource to improve clinical care. However, existing reporting tools within commercial EHR and LIS software may not be able to efficiently and rapidly mine data for quality improvement and research applications. Materials and Methods: We present experience using a data warehouse produced collaboratively between an academic medical center and a private company. The data warehouse contains data from the EHR, LIS, admission/discharge/transfer system, and billing records and can be accessed using a self-service data access tool known as Starmaker. The Starmaker software allows users to use complex Boolean logic, include and exclude rules, unit conversion and reference scaling, and value aggregation using a straightforward visual interface. More complex queries can be achieved by users with experience with Structured Query Language. Queries can use biomedical ontologies such as Logical Observation Identifiers Names and Codes and Systematized Nomenclature of Medicine. Results: We present examples of successful searches using Starmaker, falling mostly in the realm of microbiology and clinical chemistry/toxicology. The searches were ones that were either very difficult or basically infeasible using reporting tools within the EHR and LIS used in the medical center. One of the main strengths of Starmaker searches is rapid results, with typical searches covering 5 years taking only 1–2 min. A “Run Count” feature quickly outputs the number of cases meeting criteria, allowing for refinement of searches before downloading patient-identifiable data. The Starmaker tool is available to pathology residents and fellows, with some using this tool for quality improvement and scholarly projects. Conclusion: A data warehouse has significant potential for improving utilization of clinical pathology testing. Software that can access the data warehouse using a straightforward visual interface can be incorporated into pathology training programs. PMID:26284156

  8. Use of a data warehouse at an academic medical center for clinical pathology quality improvement, education, and research.

    PubMed

    Krasowski, Matthew D; Schriever, Andy; Mathur, Gagan; Blau, John L; Stauffer, Stephanie L; Ford, Bradley A

    2015-01-01

    Pathology data contained within the electronic health record (EHR) and laboratory information system (LIS) of hospitals represents a potentially powerful resource to improve clinical care. However, existing reporting tools within commercial EHR and LIS software may not be able to efficiently and rapidly mine data for quality improvement and research applications. We present experience using a data warehouse produced collaboratively between an academic medical center and a private company. The data warehouse contains data from the EHR, LIS, admission/discharge/transfer system, and billing records and can be accessed using a self-service data access tool known as Starmaker. The Starmaker software allows users to use complex Boolean logic, include and exclude rules, unit conversion and reference scaling, and value aggregation using a straightforward visual interface. More complex queries can be achieved by users with experience with Structured Query Language. Queries can use biomedical ontologies such as Logical Observation Identifiers Names and Codes and Systematized Nomenclature of Medicine. We present examples of successful searches using Starmaker, falling mostly in the realm of microbiology and clinical chemistry/toxicology. The searches were ones that were either very difficult or basically infeasible using reporting tools within the EHR and LIS used in the medical center. One of the main strengths of Starmaker searches is rapid results, with typical searches covering 5 years taking only 1-2 min. A "Run Count" feature quickly outputs the number of cases meeting criteria, allowing for refinement of searches before downloading patient-identifiable data. The Starmaker tool is available to pathology residents and fellows, with some using this tool for quality improvement and scholarly projects. A data warehouse has significant potential for improving utilization of clinical pathology testing. Software that can access the data warehouse using a straightforward visual interface can be incorporated into pathology training programs.

  9. An adaptable architecture for patient cohort identification from diverse data sources

    PubMed Central

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-01-01

    Objective We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442

  10. A Clinical Data Warehouse Based on OMOP and i2b2 for Austrian Health Claims Data.

    PubMed

    Rinner, Christoph; Gezgin, Deniz; Wendl, Christopher; Gall, Walter

    2018-01-01

    Clinical data can be reused to develop simulation models for healthcare-related questions. Our objective was to develop a clinical data warehouse that harmonizes different data sources in a standardized manner and provides a reproducible interface for clinical data reuse. The Kimball lifecycle for the development of data warehouses was used; the development is split into the technical, data and business intelligence pathways. Sample data was persisted in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The i2b2 clinical data warehouse tools were used to query the OMOP CDM by applying the new i2b2 multi-fact table feature. A clinical data warehouse was set up, and sample data, data dimensions and ontologies for Austrian health claims data were created. The ability of the standardized data access layer to create and apply simulation models will be evaluated next.

  11. Extending the Query Language of a Data Warehouse for Patient Recruitment.

    PubMed

    Dietrich, Georg; Ertl, Maximilian; Fette, Georg; Kaspar, Mathias; Krebs, Jonathan; Mackenrodt, Daniel; Störk, Stefan; Puppe, Frank

    2017-01-01

    Patient recruitment for clinical trials is a laborious task, as many texts have to be screened. Usually, this work is done manually and takes a lot of time. We have developed a system that automates the screening process. Besides standard keyword queries, the query language supports extraction of numbers, time-spans and negations. In a feasibility study for patient recruitment from a stroke unit with 40 patients, we achieved encouraging extraction rates above 95% for numbers and negations and ca. 86% for time spans.
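
    The extensions named, numbers, time spans and negations, are exactly what a plain keyword engine misses. A hedged Python sketch (an invented English example; the system's real patterns are surely richer) of the kind of matching such a query language layers over keyword search:

    ```python
    import re

    TEXT = "LVEF 35 %. No evidence of atrial fibrillation. Symptoms for 3 weeks."

    # Numeric attribute: capture the value following a keyword.
    lvef = re.search(r"LVEF\s*(\d+(?:\.\d+)?)\s*%", TEXT)
    print(float(lvef.group(1)))                # -> 35.0

    # Time span: a number plus a duration unit.
    span = re.search(r"(\d+)\s*(day|week|month|year)s?", TEXT)
    print(span.groups())                       # -> ('3', 'week')

    # Negation: a trigger ("no", "without") preceding the concept.
    negated = re.search(r"\b(no|without)\b[^.]*\batrial fibrillation\b", TEXT, re.I)
    print(bool(negated))                       # -> True
    ```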

  12. Using Data Warehouses to extract knowledge from Agro-Hydrological simulations

    NASA Astrophysics Data System (ADS)

    Bouadi, Tassadit; Gascuel-Odoux, Chantal; Cordier, Marie-Odile; Quiniou, René; Moreau, Pierre

    2013-04-01

    In recent years, simulation models have been used more and more in hydrology to test the effect of scenarios and help stakeholders in decision making. Agro-hydrological models have guided agricultural water management by testing the effect of landscape structure and farming system changes on water and chemical emissions into rivers. Such models generate a large amount of data, yet only a few outputs, such as daily concentrations at the outlet of the catchment or annual budgets of soil, water and atmosphere emissions, are stored and analyzed. Thus, a great amount of information from the simulation process is lost. This is due to the large volumes of simulated data, but also to the difficulties in analyzing the data and transforming it into usable information. In this talk we illustrate a data warehouse which has been built to store and manage simulation data coming from the agro-hydrological model TNT (Topography-based nitrogen transfer and transformations; Beaujouan et al., 2002). This model simulates the transfer and transformation of nitrogen in agricultural catchments. TNT was run over 10 years on the Yar catchment (western France), a 50 km2 area with a detailed data set that faces an environmental issue (coastal eutrophication). 44 key simulated output variables are stored at a daily time step, i.e. 8 GB of storage, which allows users to explore nitrogen emissions in space and time; to quantify all the transfer and transformation processes with respect to the cropping systems, their location within the catchment, and the emissions to water and atmosphere; and finally to gain new knowledge and support specific, detailed decisions in space and time. We present the dimensional modeling process of the Nitrogen in catchment data warehouse (i.e. the snowflake model). After identifying the set of multi-level dimensions with complex hierarchical structures and relationships among related dimension levels, we chose the snowflake model to design our agri-environmental data warehouse. The snowflake schema is required for flexible querying of complex dimension relationships. We have designed the Nitrogen in catchment data warehouse using the open source Business Intelligence platform Pentaho Version 3.5. We use on-line analytical processing (OLAP) to access and exploit, intuitively and quickly, the multidimensional and aggregated data from the Nitrogen in catchment data warehouse. We illustrate how the data warehouse can be efficiently used to explore spatio-temporal dimensions and to discover new knowledge that enriches the exploitation of the simulations. We show how the OLAP tool can provide the user with the ability to synthesize environmental information and to understand nitrate emissions into surface water by using comparative, personalized views of historical data. To perform advanced analyses that aim to find meaningful patterns and relationships in the data, the Nitrogen in catchment data warehouse should be extended with data mining or information retrieval methods such as skyline queries (Bouadi et al., 2012). (Beaujouan et al., 2002) Beaujouan, V., Durand, P., Ruiz, L., Aurousseau, P., and Cotteret, G. (2002). A hydrological model dedicated to topography-based simulation of nitrogen transfer and transformation: rationale and application to the geomorphology-denitrification relationship. Hydrological Processes, pages 493-507. (Bouadi et al., 2012) Bouadi, T., Cordier, M., and Quiniou, R. (2012). Incremental computation of skyline queries with dynamic preferences. In DEXA (1), pages 219-233.
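
    The OLAP usage the talk describes, rolling daily simulated fluxes up along spatial and temporal dimensions, can be mimicked in a few lines of pandas as a hedged stand-in (column names are invented; the actual warehouse serves such aggregations from Pentaho OLAP cubes rather than DataFrames):

    ```python
    import pandas as pd

    # Daily simulation output: one row per (date, sub-catchment) observation.
    facts = pd.DataFrame({
        "date": pd.to_datetime(["2001-01-01", "2001-01-01", "2002-06-01"]),
        "subcatchment": ["upstream", "downstream", "upstream"],
        "n_flux_kg": [12.4, 30.1, 9.7],
    })

    # Roll daily facts up to the (year, sub-catchment) level: the kind of
    # aggregation an OLAP cube serves interactively from the warehouse.
    annual = (
        facts.assign(year=facts["date"].dt.year)
             .groupby(["year", "subcatchment"])["n_flux_kg"]
             .sum()
    )
    print(annual)
    ```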

  13. Web-based multimedia information retrieval for clinical application research

    NASA Astrophysics Data System (ADS)

    Cao, Xinhua; Hoo, Kent S., Jr.; Zhang, Hong; Ching, Wan; Zhang, Ming; Wong, Stephen T. C.

    2001-08-01

    We describe a web-based data warehousing method for retrieving and analyzing neurological multimedia information. The web-based method supports convenient access, effective search and retrieval of clinical textual and image data, and on-line analysis. To improve the flexibility and efficiency of multimedia information query and analysis, a three-tier multimedia data warehouse for epilepsy research has been built. The data warehouse integrates clinical multimedia data related to epilepsy from disparate sources and archives them into a well-defined data model.

  14. Visual analytics techniques for large multi-attribute time series data

    NASA Astrophysics Data System (ADS)

    Hao, Ming C.; Dayal, Umeshwar; Keim, Daniel A.

    2008-01-01

    Time series data commonly occur when variables are monitored over time. Many real-world applications involve the comparison of long time series across multiple variables (multi-attributes). Often business people want to compare this year's monthly sales with last year's sales to make decisions. Data warehouse administrators (DBAs) want to know their daily data loading job performance. DBAs need to detect the outliers early enough to act upon them. In this paper, two new visual analytic techniques are introduced: the color cell-based Visual Time Series Line Charts and Maps, which highlight significant changes over time in long time series data, and the new Visual Content Query, which facilitates finding the contents and histories of interesting patterns and anomalies, leading to root cause identification. We have applied both methods to two real-world applications to mine enterprise data warehouse and customer credit card fraud data to illustrate the wide applicability and usefulness of these techniques.

  15. Biomedical informatics: development of a comprehensive data warehouse for clinical and genomic breast cancer research.

    PubMed

    Hu, Hai; Brzeski, Henry; Hutchins, Joe; Ramaraj, Mohan; Qu, Long; Xiong, Richard; Kalathil, Surendran; Kato, Rand; Tenkillaya, Santhosh; Carney, Jerry; Redd, Rosann; Arkalgudvenkata, Sheshkumar; Shahzad, Kashif; Scott, Richard; Cheng, Hui; Meadow, Stephen; McMichael, John; Sheu, Shwu-Lin; Rosendale, David; Kvecher, Leonid; Ahern, Stephen; Yang, Song; Zhang, Yonghong; Jordan, Rick; Somiari, Stella B; Hooke, Jeffrey; Shriver, Craig D; Somiari, Richard I; Liebman, Michael N

    2004-10-01

    The Windber Research Institute is an integrated high-throughput research center employing clinical, genomic and proteomic platforms to produce terabyte levels of data. We use biomedical informatics technologies to integrate all of these operations. This report includes information on a multi-year, multi-phase hybrid data warehouse project currently under development at the Institute. The purpose of the warehouse is to host terabyte levels of internal, experimentally generated data as well as data from public sources. We have previously reported on the phase I development, which integrated limited internal data sources and selected public databases. Currently, we are completing phase II development, which integrates our internal automated data sources and develops visualization tools to query across these data types. This paper summarizes our clinical and experimental operations, the data warehouse development, and the challenges we have faced. In phase III we plan to federate additional manual internal and public data sources and then to develop and adapt more data analysis and mining tools. We expect that the final implementation of the data warehouse will greatly facilitate biomedical informatics research.

  16. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework.

    PubMed

    Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org.

  17. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework

    PubMed Central

    Chen, Yi-An; Tripathi, Lokesh P.; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org PMID:26989145

  18. Real-time high-level video understanding using data warehouse

    NASA Astrophysics Data System (ADS)

    Lienard, Bruno; Desurmont, Xavier; Barrie, Bertrand; Delaigle, Jean-Francois

    2006-02-01

    High-level video content analysis such as video surveillance is often limited by the computational aspects of automatic image understanding: it requires huge computing resources for reasoning processes like categorization and huge amounts of data to represent knowledge of objects, scenarios and other models. This article explains how to design and develop a "near real-time adaptive image datamart", used first as a decision-support system for vision algorithms and then as a mass-storage system. Using the RDF specification as the storage format for vision-algorithm metadata, we can optimise data warehouse concepts for video analysis, add processes able to adapt the current model, and pre-process data to speed up queries. In this way, when new data are sent from a sensor to the data warehouse for long-term storage, using remote procedure calls embedded in object-oriented interfaces to simplify queries, they are processed and the in-memory data model is updated. After some processing, possible interpretations of these data can be returned to the sensor. To demonstrate this new approach, we present typical scenarios applied to this architecture, such as people tracking and event detection in a multi-camera network. Finally, we show how this system becomes a high-semantic data container for external data mining.

  19. MitoMiner: a data warehouse for mitochondrial proteomics data

    PubMed Central

    Smith, Anthony C.; Blackshaw, James A.; Robinson, Alan J.

    2012-01-01

    MitoMiner (http://mitominer.mrc-mbu.cam.ac.uk/) is a data warehouse for the storage and analysis of mitochondrial proteomics data gathered from publications of mass spectrometry and green fluorescent protein tagging studies. In MitoMiner, these data are integrated with data from UniProt, Gene Ontology, Online Mendelian Inheritance in Man, HomoloGene, Kyoto Encyclopaedia of Genes and Genomes and PubMed. The latest release of MitoMiner stores proteomics data sets from 46 studies covering 11 different species from eumetazoa, viridiplantae, fungi and protista. MitoMiner is implemented by using the open source InterMine data warehouse system, which provides a user interface allowing users to upload data for analysis, personal accounts to store queries and results and enables queries of any data in the data model. MitoMiner also provides lists of proteins for use in analyses, including the new MitoMiner mitochondrial proteome reference sets that specify proteins with substantial experimental evidence for mitochondrial localization. As further mitochondrial proteomics data sets from normal and diseased tissue are published, MitoMiner can be used to characterize the variability of the mitochondrial proteome between tissues and investigate how changes in the proteome may contribute to mitochondrial dysfunction and mitochondrial-associated diseases such as cancer, neurodegenerative diseases, obesity, diabetes, heart failure and the ageing process. PMID:22121219

  20. Merging OLTP and OLAP - Back to the Future

    NASA Astrophysics Data System (ADS)

    Lehner, Wolfgang

    When the terms "Data Warehousing" and "Online Analytical Processing" were coined in the 1990s by Kimball, Codd, and others, there was an obvious need to separate data and workload for operational, transactional-style processing from decision-making, which implies complex analytical queries over large, historic data sets. Large data warehouse infrastructures have been set up to cope with the special requirements of analytical query answering for multiple reasons. For example, analytical thinking relies heavily on predefined navigation paths to guide the user through the data set and to provide different views at different aggregation levels. Multi-dimensional queries exploiting hierarchically structured dimensions lead to complex star queries at the relational backend, which classical relational systems could hardly handle.
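
    As a concrete illustration of such star queries, the following sketch builds a toy star schema in SQLite and runs a dimensional aggregation; the table and column names are illustrative, not taken from the article.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales (date_id INTEGER, product_id INTEGER, amount REAL);
    INSERT INTO dim_date VALUES (1, 'Jan', 2024), (2, 'Feb', 2024);
    INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'docs');
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5);
    """)

    # Star query: restrict on dimension attributes, then aggregate the fact table.
    for row in con.execute("""
        SELECT d.year, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON f.date_id = d.date_id
        JOIN dim_product p ON f.product_id = p.product_id
        GROUP BY d.year, p.category
    """):
        print(row)
    ```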

  1. Data warehousing with Oracle

    NASA Astrophysics Data System (ADS)

    Shahzad, Muhammad A.

    1999-02-01

    With the emergence of data warehousing, decision support systems have evolved substantially. At the core of these warehousing systems lies a good database management system. The database server used for data warehousing is responsible for providing robust data management, scalability, high-performance query processing and integration with other servers. Oracle, an early entrant among warehousing servers, provides a wide range of features for facilitating data warehousing. This paper reviews data warehousing - first outlining the concept itself and then the features of Oracle servers for implementing a data warehouse.

  2. Querying and Computing with BioCyc Databases

    PubMed Central

    Krummenacker, Markus; Paley, Suzanne; Mueller, Lukas; Yan, Thomas; Karp, Peter D.

    2006-01-01

    Summary: We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database. Availability: The Pathway Tools software and 20 BioCyc DBs in Tiers 1 and 2 are freely available to academic users; fees apply to some types of commercial use. For download instructions see http://BioCyc.org/download.shtml PMID:15961440

  3. toxoMine: an integrated omics data warehouse for Toxoplasma gondii systems biology research

    PubMed Central

    Rhee, David B.; Croken, Matthew McKnight; Shieh, Kevin R.; Sullivan, Julie; Micklem, Gos; Kim, Kami; Golden, Aaron

    2015-01-01

    Toxoplasma gondii (T. gondii) is an obligate intracellular parasite that must monitor for changes in the host environment and respond accordingly; however, it is still not fully known which genetic or epigenetic factors are involved in regulating virulence traits of T. gondii. There are on-going efforts to elucidate the mechanisms regulating the stage transition process via the application of high-throughput epigenomics, genomics and proteomics techniques. Given the range of experimental conditions and the typical yield from such high-throughput techniques, a new challenge arises: how to effectively collect, organize and disseminate the generated data for subsequent data analysis. Here, we describe toxoMine, which provides a powerful interface to support sophisticated integrative exploration of high-throughput experimental data and metadata, providing researchers with a more tractable means toward understanding how genetic and/or epigenetic factors play a coordinated role in determining pathogenicity of T. gondii. As a data warehouse, toxoMine allows integration of high-throughput data sets with public T. gondii data. toxoMine is also able to execute complex queries involving multiple data sets with straightforward user interaction. Furthermore, toxoMine allows users to define their own parameters during the search process that gives users near-limitless search and query capabilities. The interoperability feature also allows users to query and examine data available in other InterMine systems, which would effectively augment the search scope beyond what is available to toxoMine. toxoMine complements the major community database ToxoDB by providing a data warehouse that enables more extensive integrative studies for T. gondii. Given all these factors, we believe it will become an indispensable resource to the greater infectious disease research community. Database URL: http://toxomine.org PMID:26130662

  4. The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface.

    PubMed

    Chen, Josephine; Zhao, Po; Massaro, Donald; Clerch, Linda B; Almon, Richard R; DuBois, Debra C; Jusko, William J; Hoffman, Eric P

    2004-01-01

    Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform, have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provide access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis and for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp).
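
    A minimal sketch of the kind of dynamic time series query SGQT supports, under stated assumptions: the table layout, transcript names and time points below are hypothetical, not PEPR's actual schema.

    ```python
    import pandas as pd

    # Long-format expression profiles (hypothetical values).
    profiles = pd.DataFrame({
        "transcript": ["MyoD1"] * 3 + ["Pax7"] * 3,
        "time_point": [0, 12, 27, 0, 12, 27],   # e.g. days in a regeneration series
        "signal":     [1.0, 4.2, 1.5, 2.0, 3.1, 2.2],
    })

    def time_series_query(transcript: str) -> pd.DataFrame:
        """Return one transcript's trajectory, ordered by time point."""
        hits = profiles[profiles["transcript"] == transcript]
        return hits.sort_values("time_point")

    # Emit a spreadsheet-style file for the selected transcript.
    time_series_query("MyoD1").to_csv("myod1_time_course.csv", index=False)
    ```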

  5. The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface

    PubMed Central

    Chen, Josephine; Zhao, Po; Massaro, Donald; Clerch, Linda B.; Almon, Richard R.; DuBois, Debra C.; Jusko, William J.; Hoffman, Eric P.

    2004-01-01

    Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform, have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provide access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis and for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp). PMID:14681485

  6. Advanced biological and chemical discovery (ABCD): centralizing discovery knowledge in an inherently decentralized world.

    PubMed

    Agrafiotis, Dimitris K; Alex, Simson; Dai, Heng; Derkinderen, An; Farnum, Michael; Gates, Peter; Izrailev, Sergei; Jaeger, Edward P; Konstant, Paul; Leung, Albert; Lobanov, Victor S; Marichal, Patrick; Martin, Douglas; Rassokhin, Dmitrii N; Shemanarev, Maxim; Skalkin, Andrew; Stong, John; Tabruyn, Tom; Vermeiren, Marleen; Wan, Jackson; Xu, Xiang Yang; Yao, Xiang

    2007-01-01

    We present ABCD, an integrated drug discovery informatics platform developed at Johnson & Johnson Pharmaceutical Research & Development, L.L.C. ABCD is an attempt to bridge multiple continents, data systems, and cultures using modern information technology and to provide scientists with tools that allow them to analyze multifactorial SAR and make informed, data-driven decisions. The system consists of three major components: (1) a data warehouse, which combines data from multiple chemical and pharmacological transactional databases and is designed for supreme query performance; (2) a state-of-the-art application suite, which facilitates data upload, retrieval, mining, and reporting; and (3) a workspace, which facilitates collaboration and data sharing by allowing users to share queries, templates, results, and reports across project teams, campuses, and other organizational units. Chemical intelligence, performance, and analytical sophistication lie at the heart of the new system, which was developed entirely in-house. ABCD is used routinely by more than 1000 scientists around the world and is rapidly expanding into other functional areas within the J&J organization.

  7. Data warehouse implementation with clinical pharmacokinetic/pharmacodynamic data.

    PubMed

    Koprowski, S P; Barrett, J S

    2002-03-01

    We have created a data warehouse for human pharmacokinetic (PK) and pharmacodynamic (PD) data generated primarily within the Clinical PK Group of the Drug Metabolism and Pharmacokinetics (DM&PK) Department of DuPont Pharmaceuticals. Data that enter an Oracle-based LIMS directly from chromatography systems or through files from contract research organizations are accessed via SAS/PH.Kinetics, GLP-compliant data analysis software residing on individual users' workstations. Upon completion of the final PK or PD analysis, data are pushed to a predefined location. Data analyzed or created with other software (e.g., WinNonlin, NONMEM, Adapt) are added to this file repository as well. The warehouse creates views to these data and accumulates metadata on all data sources defined in the warehouse. The warehouse is managed via the SAS/Warehouse Administrator product, which defines the environment, creates summarized data structures, and schedules data refresh. The clinical PK/PD warehouse encompasses laboratory, biometric, PK and PD data streams. Detailed logical tables for each compound are created and updated as the clinical PK/PD data warehouse is populated. The data model defined for the warehouse is based on a star schema. Summarized data structures such as multidimensional databases (MDDB), infomarts, and datamarts are created from detail tables. Data mining and querying of highly summarized data, as well as drill-down to detail data, are possible via the creation of exploitation tools which front-end the warehouse data. Based on periodic refreshing of the warehouse data, these applications are able to access the most current data available and do not require a manual interface to update or populate the data store. Prototype applications have been web-enabled to facilitate their use by varied data customers across platforms and locations. The warehouse also contains automated mechanisms for the construction of study data listings and SAS transport files for eventual incorporation into an electronic submission. This environment permits the management of online analytical processing via a single administrator once the data model and warehouse configuration have been designed. The expansion of the current environment will eventually connect data from all phases of research and development, ensuring a return on investment and, hopefully, efficiencies in data processing unforeseen with earlier legacy systems.

  8. Developing Access Control Model of Web OLAP over Trusted and Collaborative Data Warehouses

    NASA Astrophysics Data System (ADS)

    Fugkeaw, Somchart; Mitrpanont, Jarernsri L.; Manpanpanich, Piyawit; Juntapremjitt, Sekpon

    This paper proposes the design and development of a Role-based Access Control (RBAC) model for Single Sign-On (SSO) Web-OLAP queries spanning multiple data warehouses (DWs). The model is based on PKI authentication and Privilege Management Infrastructure (PMI); it presents a binding model of RBAC authorization based on dimension privileges specified in an attribute certificate (AC) and user identification. In particular, the mapping of attributes between DW user authentication and dimensional access privileges is illustrated. In our approach, we apply a multi-agent system to automate flexible and effective management of user authentication, role delegation and system accountability. Finally, the paper culminates in the prototype system A-COLD (Access Control of web-OLAP over multiple DWs), which incorporates OLAP features with authentication and authorization enforcement in a multi-user, multi-data warehouse environment.
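
    A minimal sketch of the dimension-privilege check at the heart of such a model; here the role-to-dimension grants are hard-coded rather than read from an attribute certificate, and all names are illustrative.

    ```python
    # Role-to-dimension grants (in the paper these come from an AC, not code).
    ROLE_DIMENSIONS = {
        "analyst":  {"time", "product"},
        "director": {"time", "product", "region", "salary_band"},
    }

    def authorize(role: str, query_dimensions: set[str]) -> bool:
        """Grant the OLAP query only if every requested dimension is permitted."""
        granted = ROLE_DIMENSIONS.get(role, set())
        return query_dimensions <= granted

    assert authorize("director", {"time", "region"})
    assert not authorize("analyst", {"time", "salary_band"})  # denied
    ```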

  9. Designing a Clinical Data Warehouse Architecture to Support Quality Improvement Initiatives.

    PubMed

    Chelico, John D; Wilcox, Adam B; Vawdrey, David K; Kuperman, Gilad J

    2016-01-01

    Clinical data warehouses, initially directed towards clinical research or financial analyses, are evolving to support quality improvement efforts and must now address the quality improvement life cycle. In addition, data that are needed for quality improvement often do not reside in a single database, requiring easier methods to query data across multiple disparate sources. We created a virtual data warehouse at NewYork Presbyterian Hospital that allowed us to bring together data from several source systems throughout the organization. We also created a framework to match the maturity of a data request in the quality improvement life cycle to the proper tools needed for each request. As projects progress through the Define, Measure, Analyze, Improve, Control stages of quality improvement, resources are matched to the data needs at each step. We describe the analysis and design used to create a robust model for applying clinical data warehousing to quality improvement.

  10. Designing a Clinical Data Warehouse Architecture to Support Quality Improvement Initiatives

    PubMed Central

    Chelico, John D.; Wilcox, Adam B.; Vawdrey, David K.; Kuperman, Gilad J.

    2016-01-01

    Clinical data warehouses, initially directed towards clinical research or financial analyses, are evolving to support quality improvement efforts and must now address the quality improvement life cycle. In addition, data that are needed for quality improvement often do not reside in a single database, requiring easier methods to query data across multiple disparate sources. We created a virtual data warehouse at NewYork Presbyterian Hospital that allowed us to bring together data from several source systems throughout the organization. We also created a framework to match the maturity of a data request in the quality improvement life cycle to the proper tools needed for each request. As projects progress through the Define, Measure, Analyze, Improve, Control stages of quality improvement, resources are matched to the data needs at each step. We describe the analysis and design used to create a robust model for applying clinical data warehousing to quality improvement. PMID:28269833

  11. An online analytical processing multi-dimensional data warehouse for malaria data

    PubMed Central

    Madey, Gregory R; Vyushkov, Alexander; Raybaud, Benoit; Burkot, Thomas R; Collins, Frank H

    2017-01-01

    Abstract Malaria is a vector-borne disease that contributes substantially to the global burden of morbidity and mortality. The management of malaria-related data from heterogeneous, autonomous, and distributed data sources poses unique challenges and requirements. Although online data storage systems exist that address specific malaria-related issues, a globally integrated online resource to address different aspects of the disease does not exist. In this article, we describe the design, implementation, and applications of a multi-dimensional, online analytical processing data warehouse, named the VecNet Data Warehouse (VecNet-DW). It is the first online, globally-integrated platform that provides efficient search, retrieval and visualization of historical, predictive, and static malaria-related data, organized in data marts. Historical and static data are modelled using star schemas, while predictive data are modelled using a snowflake schema. The major goals, characteristics, and components of the DW are described along with its data taxonomy and ontology, the external data storage systems and the logical modelling and physical design phases. Results are presented as screenshots of a Dimensional Data browser, a Lookup Tables browser, and a Results Viewer interface. The power of the DW emerges from integrated querying of the different data marts and structuring those queries to the desired dimensions, enabling users to search, view, analyse, and store large volumes of aggregated data, and responding better to the increasing demands of users. Database URL https://dw.vecnet.org/datawarehouse/ PMID:29220463

  12. A Dimensional Bus model for integrating clinical and research data.

    PubMed

    Wade, Ted D; Hum, Richard C; Murphy, James R

    2011-12-01

    Many clinical research data integration platforms rely on the Entity-Attribute-Value model because of its flexibility, even though it presents problems in query formulation and execution time. The authors sought more balance in these traits. Borrowing concepts from Entity-Attribute-Value and from enterprise data warehousing, the authors designed an alternative called the Dimensional Bus model and used it to integrate electronic medical record, sponsored study, and biorepository data. Each type of observational collection has its own table, and the structure of these tables varies to suit the source data. The observational tables are linked to the Bus, which holds provenance information and links to various classificatory dimensions that amplify the meaning of the data or facilitate its query and exposure management. The authors implemented a Bus-based clinical research data repository with a query system that flexibly manages data access and confidentiality, facilitates catalog search, and readily formulates and compiles complex queries. The design provides a workable way to manage and query mixed schemas in a data warehouse.
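
    A minimal sketch of the Bus idea under stated assumptions (the schema names are illustrative, not the paper's): differently shaped observational tables share a bus row carrying provenance and dimension links, so one query can span them.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE bus (bus_id INTEGER PRIMARY KEY, source TEXT, patient_id TEXT,
                      loaded_on TEXT);
    CREATE TABLE labs (bus_id INTEGER, test TEXT, value REAL);
    CREATE TABLE biorepository (bus_id INTEGER, specimen TEXT, freezer TEXT);
    INSERT INTO bus VALUES (1, 'EMR', 'p01', '2011-01-05'),
                           (2, 'biobank', 'p01', '2011-02-01');
    INSERT INTO labs VALUES (1, 'glucose', 5.4);
    INSERT INTO biorepository VALUES (2, 'serum', 'F-17');
    """)

    # One query spanning differently shaped tables via the shared bus.
    for row in con.execute("""
        SELECT b.patient_id, b.source, l.test, r.specimen
        FROM bus b
        LEFT JOIN labs l ON l.bus_id = b.bus_id
        LEFT JOIN biorepository r ON r.bus_id = b.bus_id
        WHERE b.patient_id = 'p01'
    """):
        print(row)
    ```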

  13. A database de-identification framework to enable direct queries on medical data for secondary use.

    PubMed

    Erdal, B S; Liu, J; Ding, J; Chen, J; Marsh, C B; Kamal, J; Clymer, B D

    2012-01-01

    To qualify the use of patient clinical records as non-human-subject data for research purposes, electronic medical record data must be de-identified so that there is minimal risk of protected health information exposure. This study demonstrated a robust framework for structured data de-identification that can be applied to any relational data source that needs to be de-identified. Using a real-world clinical data warehouse, a pilot implementation covering limited subject areas was used to demonstrate and evaluate this new de-identification process. Query results and performance are compared between the source and target systems to validate data accuracy and usability. The combination of hashing, pseudonyms, and a session-dependent randomizer provides a rigorous de-identification framework that guards against 1) source identifier exposure; 2) internal data analysts manually linking to source identifiers; and 3) identifier cross-linking among different researchers or multiple query sessions by the same researcher. In addition, a query rejection option refuses queries that return fewer than preset numbers of subjects and total records, to prevent accidental subject identification due to low volumes of data. This framework does not prevent subject re-identification based on prior knowledge and sequences of events. Also, it does not deal with medical free-text de-identification, although text de-identification using natural language processing can be included thanks to its modular design. We demonstrated a framework resulting in HIPAA-compliant databases that can be directly queried by researchers. This technique can be augmented to facilitate inter-institutional research data sharing through existing middleware such as caGrid.
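
    A minimal sketch of the three ingredients named above (salted hashing for pseudonyms, a session-dependent randomizer, and low-volume query rejection), with hypothetical parameters throughout.

    ```python
    import hashlib
    import secrets

    SITE_SECRET = b"replace-with-protected-site-key"   # assumption: held off-system
    MIN_SUBJECTS = 5                                    # preset rejection threshold

    def pseudonym(mrn: str, session_salt: bytes) -> str:
        """Hash a source identifier with the site secret plus a per-session salt."""
        digest = hashlib.sha256(SITE_SECRET + session_salt + mrn.encode())
        return digest.hexdigest()[:16]

    def run_query(subject_mrns: list[str]) -> list[str]:
        """Return session-scoped pseudonyms, or refuse low-volume result sets."""
        if len(set(subject_mrns)) < MIN_SUBJECTS:
            raise PermissionError("result set too small; query rejected")
        session_salt = secrets.token_bytes(16)  # differs every session, so
        # pseudonyms cannot be cross-linked between query sessions.
        return [pseudonym(m, session_salt) for m in set(subject_mrns)]

    print(run_query([f"mrn{i}" for i in range(8)]))
    ```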

  14. A geodata warehouse: Using denormalisation techniques as a tool for delivering spatially enabled integrated geological information to geologists

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2016-11-01

    New requirements to understand geological properties in three dimensions have led to the development of PropBase, a data structure and a set of delivery tools built for this purpose. At the BGS, relational database management systems (RDBMS) have facilitated effective data management using normalised, subject-based database designs with business rules in a centralised, vocabulary-controlled architecture. These have delivered effective data storage in a secure environment. However, isolated subject-oriented designs prevented efficient cross-domain querying of datasets. Additionally, the tools provided often did not enable effective data discovery, as they struggled to resolve the complex underlying normalised structures, resulting in poor data access speeds. Users developed bespoke access tools against structures they did not fully understand, which sometimes delivered incorrect results. Therefore, BGS has developed PropBase, a generic denormalised data structure within an RDBMS to store property data and facilitate rapid, standardised data discovery and access, incorporating 2D and 3D physical and chemical property data with associated metadata. This includes scripts to populate and synchronise the layer with its data sources through structured input and transcription standards. A core component of the architecture is an optimised query object that delivers geoscience information from a structure equivalent to a data warehouse. This enables optimised query performance to deliver data in multiple standardised formats using a web discovery tool. Semantic interoperability is enforced through vocabularies combined from all data sources, facilitating searches for related terms. PropBase holds 28.1 million spatially enabled property data points from 10 source databases, incorporating over 50 property data types with a vocabulary set that includes 557 property terms. By enabling property data searches across multiple databases, PropBase has facilitated new scientific research previously considered impractical. PropBase is easily extended to incorporate 4D data (time series) and provides a baseline for new "big data" monitoring projects.

  15. A Dimensional Bus model for integrating clinical and research data

    PubMed Central

    Wade, Ted D; Hum, Richard C; Murphy, James R

    2011-01-01

    Objectives: Many clinical research data integration platforms rely on the Entity–Attribute–Value model because of its flexibility, even though it presents problems in query formulation and execution time. The authors sought more balance in these traits. Materials and Methods: Borrowing concepts from Entity–Attribute–Value and from enterprise data warehousing, the authors designed an alternative called the Dimensional Bus model and used it to integrate electronic medical record, sponsored study, and biorepository data. Each type of observational collection has its own table, and the structure of these tables varies to suit the source data. The observational tables are linked to the Bus, which holds provenance information and links to various classificatory dimensions that amplify the meaning of the data or facilitate its query and exposure management. Results: The authors implemented a Bus-based clinical research data repository with a query system that flexibly manages data access and confidentiality, facilitates catalog search, and readily formulates and compiles complex queries. Conclusion: The design provides a workable way to manage and query mixed schemas in a data warehouse. PMID:21856687

  16. Ad Hoc Information Extraction for Clinical Data Warehouses.

    PubMed

    Dietrich, Georg; Krebs, Jonathan; Fette, Georg; Ertl, Maximilian; Kaspar, Mathias; Störk, Stefan; Puppe, Frank

    2018-05-01

    Clinical Data Warehouses (CDW) reuse electronic health records (EHR) to make their data retrievable for research purposes or patient recruitment for clinical trials. However, much information is hidden in unstructured data such as discharge letters. These can be preprocessed and converted to structured data via information extraction (IE), which is unfortunately a laborious task and therefore usually not available for most of the text data in a CDW. The goal of our work is to provide an ad hoc IE service that allows users to query text data ad hoc, in a manner similar to querying structured data in a CDW. While search engines just return text snippets, our system also returns frequencies (e.g. how many patients exist with "heart failure", including textual synonyms, or how many patients have an LVEF < 45) based on the content of discharge letters or textual reports for special investigations like heart echo. Three subtasks are addressed: (1) recognizing and excluding negations and their scopes, (2) extracting concepts, i.e. Boolean values, and (3) extracting numerical values. We implemented an extended version of the NegEx algorithm for German texts that detects negations and determines their scope. Furthermore, our document-oriented CDW PaDaWaN was extended with query functions, e.g. context-sensitive queries and regex queries, and an extraction mode for computing the frequencies of Boolean and numerical values. Evaluations on chest X-ray reports and discharge letters showed high F1-scores for the three subtasks: detection of negated concepts in chest X-ray reports with an F1-score of 0.99 and in discharge letters with 0.97; of Boolean values in chest X-ray reports about 0.99; and of numerical values in chest X-ray reports and discharge letters also around 0.99, with the exception of the concept age. The advantages of ad hoc IE over standard IE are the low development effort (just entering the concept with its variants), the promptness of the results, and the adaptability by the user to his or her particular question. Disadvantages are usually lower accuracy and confidence. This ad hoc information extraction approach is novel and exceeds existing systems: Roogle [1] extracts predefined concepts from texts at preprocessing time and makes them retrievable at runtime. Dr. Warehouse [2] applies negation detection and indexes the produced subtexts that include affirmed findings. Our approach combines negation detection with concept extraction, but the extraction takes place not during preprocessing but at runtime. This provides ad hoc, dynamic, interactive and adjustable information extraction of arbitrary concepts, and even their values, on the fly at runtime. We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW, with high recall and precision, based on a pipeline that detects and removes negations and their scope in clinical texts.
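
    A minimal sketch of two of these subtasks, not the PaDaWaN code: a NegEx-style scope check with a few single-word German triggers, and a regex extraction of a numerical LVEF value; the trigger list and scope width are assumptions.

    ```python
    import re

    NEGATION_TRIGGERS = {"kein", "keine", "keinen", "ohne"}  # illustrative triggers
    SCOPE = 5  # tokens after a trigger considered negated (an assumed scope width)

    def is_negated(text: str, concept: str) -> bool:
        """True if the concept falls inside the scope of a negation trigger."""
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok in NEGATION_TRIGGERS and concept.lower() in tokens[i:i + SCOPE + 1]:
                return True
        return False

    def extract_lvef(text: str):
        """Pull a numerical LVEF value, e.g. from 'LVEF: 42 %'."""
        m = re.search(r"lvef\s*[:<>=]?\s*(\d{1,2})\s*%?", text, re.IGNORECASE)
        return int(m.group(1)) if m else None

    print(is_negated("Kein Hinweis auf Pneumonie", "pneumonie"))  # True
    print(extract_lvef("LVEF: 42 %"))                             # 42
    ```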

  17. Federated querying architecture with clinical & translational health IT application.

    PubMed

    Livne, Oren E; Schultz, N Dustin; Narus, Scott P

    2011-10-01

    We present a software architecture that federates data from multiple heterogeneous health informatics data sources owned by multiple organizations. The architecture builds upon state-of-the-art open-source Java and XML frameworks in innovative ways. It consists of (a) a federated query engine, which manages federated queries and result-set aggregation via a patient identification service; and (b) data source facades, which translate the physical data models into a common model on the fly and handle large result-set streaming. System modules are connected via reusable Apache Camel integration routes and deployed to an OSGi enterprise service bus. We present an application of our architecture that allows users to construct queries via the i2b2 web front-end and federates patient data from the University of Utah Enterprise Data Warehouse and the Utah Population Database. Our system can be easily adopted, extended and integrated with existing SOA healthcare and HL7 frameworks such as i2b2 and caGrid.
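
    A minimal sketch of the facade idea under stated assumptions: each source keeps its own physical shape, a facade maps records to a common model on the fly, and results are merged on a common patient identifier; all field names and the identity map are hypothetical.

    ```python
    # Stand-in for the patient identification service.
    COMMON_IDS = {"edw:123": "patient-A", "updb:xyz": "patient-A"}

    def edw_facade():
        # Physical model of source 1 (hypothetical shape).
        for rec in [{"pid": "edw:123", "dx_code": "I50.9"}]:
            yield {"patient": COMMON_IDS[rec["pid"]], "diagnosis": rec["dx_code"]}

    def updb_facade():
        # Source 2 has a different physical model; the facade hides that.
        for rec in [{"person": "updb:xyz", "vital": "deceased"}]:
            yield {"patient": COMMON_IDS[rec["person"]], "status": rec["vital"]}

    def federated_query():
        """Aggregate per-source result sets on the common patient identifier."""
        merged: dict[str, dict] = {}
        for source in (edw_facade, updb_facade):
            for row in source():
                merged.setdefault(row["patient"], {}).update(row)
        return merged

    print(federated_query())
    ```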

  18. Comparison of traditional trigger tool to data warehouse based screening for identifying hospital adverse events.

    PubMed

    O'Leary, Kevin J; Devisetty, Vikram K; Patel, Amitkumar R; Malkenson, David; Sama, Pradeep; Thompson, William K; Landler, Matthew P; Barnard, Cynthia; Williams, Mark V

    2013-02-01

    Research supports medical record review using screening triggers as the optimal method to detect hospital adverse events (AE), yet the method is labour-intensive. This study compared a traditional trigger tool with an enterprise data warehouse (EDW) based screening method to detect AEs. We created 51 automated queries based on 33 traditional triggers from prior research, and then applied them to 250 randomly selected medical patients hospitalised between 1 September 2009 and 31 August 2010. Two physicians each abstracted records from half the patients using a traditional trigger tool and then performed targeted abstractions for patients with positive EDW queries in the complementary half of the sample. A third physician confirmed presence of AEs and assessed preventability and severity. Traditional trigger tool and EDW based screening identified 54 (22%) and 53 (21%) patients with one or more AE. Overall, 140 (56%) patients had one or more positive EDW screens (total 366 positive screens). Of the 137 AEs detected by at least one method, 86 (63%) were detected by a traditional trigger tool, 97 (71%) by EDW based screening and 46 (34%) by both methods. Of the 11 total preventable AEs, 6 (55%) were detected by traditional trigger tool, 7 (64%) by EDW based screening and 2 (18%) by both methods. Of the 43 total serious AEs, 28 (65%) were detected by traditional trigger tool, 29 (67%) by EDW based screening and 14 (33%) by both. We found relatively poor agreement between traditional trigger tool and EDW based screening with only approximately a third of all AEs detected by both methods. A combination of complementary methods is the optimal approach to detecting AEs among hospitalised patients.

  19. Electronic medical record: research tool for pancreatic cancer?

    PubMed

    Arous, Edward J; McDade, Theodore P; Smith, Jillian K; Ng, Sing Chau; Sullivan, Mary E; Zottola, Ralph J; Ranauro, Paul J; Shah, Shimul A; Whalen, Giles F; Tseng, Jennifer F

    2014-04-01

    A novel data warehouse based on automated retrieval from an institutional health care information system (HIS) was made available to be compared with a traditional prospectively maintained surgical database. A newly established institutional data warehouse at a single-institution academic medical center, autopopulated by the HIS, was queried for International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) diagnosis codes for pancreatic neoplasm. Patients with ICD-9-CM diagnosis codes for pancreatic neoplasm were captured. A parallel query was performed using a prospective database populated by manual entry. Duplicated patients and those unique to either data set were identified. All patients were manually reviewed to determine the accuracy of diagnosis. A total of 1107 patients were identified from the HIS-linked data set with pancreatic neoplasm from 1999-2009. Of these, 254 (22.9%) patients were also captured by the surgical database, whereas 853 (77.1%) patients were only in the HIS-linked data set. Manual review of the HIS-only group demonstrated that 45.0% of patients were without identifiable pancreatic pathology, suggesting erroneous capture, whereas 36.3% of patients were consistent with pancreatic neoplasm and 18.7% with other pancreatic pathology. Of the 394 patients identified by the surgical database, 254 (64.5%) patients were captured by the HIS, whereas 140 (35.5%) patients were not. Manual review of patients only captured by the surgical database demonstrated 85.9% with pancreatic neoplasm and 14.1% with other pancreatic pathology. Finally, review of the 254-patient overlap demonstrated that 80.3% of patients had pancreatic neoplasm and 19.7% had other pancreatic pathology. These results suggest that administrative data relying only on ICD-9-CM diagnosis codes require cautious interpretation and clinical correlation through previously validated mechanisms.

  20. A Note on Interfacing Object Warehouses and Mass Storage Systems for Data Mining Applications

    NASA Technical Reports Server (NTRS)

    Grossman, Robert L.; Northcutt, Dave

    1996-01-01

    Data mining is the automatic discovery of patterns, associations, and anomalies in data sets. Data mining requires numerically and statistically intensive queries. Our assumption is that data mining requires a specialized data management infrastructure to support these intensive queries, but because of the sizes of the data involved, this infrastructure is layered over a hierarchical storage system. In this paper, we discuss the architecture of a system which is layered for modularity but exploits specialized lightweight services to maintain efficiency. Rather than use a full-functioned database, for example, we use lightweight object services specialized for data mining. We propose using information repositories between layers so that components on either side of a layer can access information in the repositories to assist in making decisions about data layout, the caching and migration of data, the scheduling of queries, and related matters.

  1. Unstructured medical image query using big data - An epilepsy case study.

    PubMed

    Istephan, Sarmad; Siadat, Mohammad-Reza

    2016-02-01

    Big data technologies are critical to the medical field, which requires new frameworks to leverage them. Such frameworks would help medical experts test hypotheses by querying huge volumes of unstructured medical data to provide better patient care. The objective of this work is to implement and examine the feasibility of having such a framework to provide efficient querying of unstructured data in unlimited ways. The feasibility study was conducted specifically in the epilepsy field. The proposed framework evaluates a query in two phases. In phase 1, structured data are used to filter the clinical data warehouse. In phase 2, feature extraction modules are executed on the unstructured data in a distributed manner via Hadoop to complete the query. Three modules have been created: volume comparer, surface-to-volume conversion and average intensity. The framework allows user-defined modules to be imported, providing unlimited ways to process the unstructured data and hence potentially extending the application of this framework beyond the epilepsy field. Two types of criteria were used to validate the feasibility of the proposed framework: the ability/accuracy of fulfilling an advanced medical query and the efficiency that Hadoop provides. For the first criterion, the framework executed an advanced medical query that spanned both structured and unstructured data, with accurate results. For the second criterion, different architectures were explored to evaluate the performance of various Hadoop configurations, which were compared to a traditional Single Server Architecture (SSA). The surface-to-volume conversion module performed up to 40 times faster than the SSA (using a 20-node Hadoop cluster) and the average intensity module performed up to 85 times faster than the SSA (using a 40-node Hadoop cluster). Furthermore, the 40-node Hadoop cluster executed the average intensity module on 10,000 models in 3 h, which was not even practical for the SSA. The current study is limited to the epilepsy field, and further research and more feature extraction modules are required to show its applicability in other medical domains. The proposed framework advances data-driven medicine by unleashing the content of unstructured medical data in an efficient and unlimited way, to be harnessed by medical experts.
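
    A minimal sketch of the two-phase evaluation described above, with a local process pool standing in for the Hadoop stage; the module, table and field names are hypothetical.

    ```python
    import sqlite3
    from concurrent.futures import ProcessPoolExecutor

    def average_intensity(image_path: str) -> float:
        """Stand-in feature-extraction module run over unstructured data."""
        return float(len(image_path))   # placeholder computation only

    def run_query():
        # Phase 1: filter the structured warehouse down to candidate subjects.
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE patients (id TEXT, diagnosis TEXT, mri_path TEXT)")
        con.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                        [("p1", "epilepsy", "/scans/p1.nii"),
                         ("p2", "control", "/scans/p2.nii")])
        paths = [r[0] for r in con.execute(
            "SELECT mri_path FROM patients WHERE diagnosis = 'epilepsy'")]

        # Phase 2: fan the extraction module out over the unstructured files.
        with ProcessPoolExecutor() as pool:
            return dict(zip(paths, pool.map(average_intensity, paths)))

    if __name__ == "__main__":
        print(run_query())
    ```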

  2. Assessment of Electronic Government Information Products

    DTIC Science & Technology

    1999-03-30

    Center for Environmental Info. & Statistics; Consumer Handbook for Reducing Solid Waste; Envirofacts Warehouse; EPA Online Library System (OLS); ... Hazardous Waste Site Query (CERCLIS Data); Surf Your Watershed; Test Methods for Evaluating Solid Waste: Physical/Chemical Methods (SW-846); ... SGML, because they consider it "intelligent data" that can automatically generate other formats (e.g., web, BBS, Fax on Demand) through templates and ...

  3. A Data Warehouse to Support Condition Based Maintenance (CBM)

    DTIC Science & Technology

    2005-05-01

    ... Application (VBA) code sequence to import the original MAST-generated CSV and then create a single output table in DBASE IV format. The DBASE IV format ... database architecture (Oracle, Sybase, MS-SQL, etc.). This design includes table definitions, comments, specification of table attributes, primary and foreign ... built queries and applications. Needs the application developers to construct data views. No SQL programming experience. b. Power Database User - knows ...

  4. Hierarchical content-based image retrieval by dynamic indexing and guided search

    NASA Astrophysics Data System (ADS)

    You, Jane; Cheung, King H.; Liu, James; Guo, Linong

    2003-12-01

    This paper presents a new approach to content-based image retrieval by using dynamic indexing and guided search in a hierarchical structure, and extending data mining and data warehousing techniques. The proposed algorithms include: a wavelet-based scheme for multiple image feature extraction, the extension of a conventional data warehouse and an image database to an image data warehouse for dynamic image indexing, an image data schema for hierarchical image representation and dynamic image indexing, a statistically based feature selection scheme to achieve flexible similarity measures, and a feature component code to facilitate query processing and guide the search for the best matching. A series of case studies are reported, which include a wavelet-based image color hierarchy, classification of satellite images, tropical cyclone pattern recognition, and personal identification using multi-level palmprint and face features.
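
    A minimal sketch of wavelet-based feature extraction with best-match search, assuming the PyWavelets package; the subband-energy signature and Euclidean distance are illustrative choices, not the paper's exact scheme.

    ```python
    import numpy as np
    import pywt

    def wavelet_signature(image: np.ndarray) -> np.ndarray:
        """Summarize an image by the energy of each wavelet subband."""
        coeffs = pywt.wavedec2(image, "haar", level=2)
        bands = [coeffs[0]] + [b for level in coeffs[1:] for b in level]
        return np.array([float(np.sum(b * b)) for b in bands])

    rng = np.random.default_rng(1)
    database = {name: rng.random((64, 64)) for name in ("img_a", "img_b", "img_c")}
    signatures = {name: wavelet_signature(img) for name, img in database.items()}

    query = database["img_b"] + rng.normal(0, 0.01, (64, 64))  # noisy probe image
    q_sig = wavelet_signature(query)
    best = min(signatures, key=lambda n: np.linalg.norm(signatures[n] - q_sig))
    print("best match:", best)   # expected: img_b
    ```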

  5. Characteristics Desired in Clinical Data Warehouse for Biomedical Research

    PubMed Central

    Shin, Soo-Yong; Kim, Woo Sung

    2014-01-01

    Objectives: Due to the unique characteristics of clinical data, clinical data warehouses (CDWs) have not been very successful so far, and their use for biomedical research in particular has been limited. The characteristics necessary for the successful implementation and operation of a CDW for biomedical research have not been clearly defined yet. Methods: Three examples of CDWs were reviewed: a multipurpose CDW in a hospital, a CDW for independent multi-institutional research, and a CDW for research use within an institution. After reviewing the three CDW examples, we propose some key characteristics needed in a CDW for biomedical research. Results: A CDW for research should include an honest broker system and an Institutional Review Board approval interface to comply with governmental regulations. It should also include a simple query interface, an anonymized data review tool, and a data extraction tool. It should also serve as a biomedical research platform for both data repository use and data analysis. Conclusions: The proposed characteristics may have limited transfer value to organizations in other countries. However, these analysis results are still valid in Korea, and we have developed a clinical research data warehouse based on these desiderata. PMID:24872909

  6. Bio-image warehouse system: concept and implementation of a diagnosis-based data warehouse for advanced imaging modalities in neuroradiology.

    PubMed

    Minati, L; Ghielmetti, F; Ciobanu, V; D'Incerti, L; Maccagnano, C; Bizzi, A; Bruzzone, M G

    2007-03-01

    Advanced neuroimaging techniques, such as functional magnetic resonance imaging (fMRI), chemical shift spectroscopy imaging (CSI), diffusion tensor imaging (DTI), and perfusion-weighted imaging (PWI), create novel challenges in terms of data storage and management: huge amounts of raw data are generated, the results of analysis may depend on the software and settings that have been used, and intermediate files are most often inherently not compliant with the current DICOM (Digital Imaging and Communications in Medicine) standard, as they contain multidimensional complex and tensor arrays and various other types of data structures. A software architecture, referred to as the Bio-Image Warehouse System (BIWS), which can be used alongside a radiology information system/picture archiving and communication system (RIS/PACS) to store neuroimaging data for research purposes, is presented. The system architecture is conceived to enable querying by diagnosis according to a predefined two-layered classification taxonomy. The operational impact of the system and the time needed to get acquainted with the web-based interface and with the taxonomy are found to be limited. The development of modules enabling automated creation of statistical templates is proposed.

  7. An ICT infrastructure to integrate clinical and molecular data in oncology research

    PubMed Central

    2012-01-01

    Background: The ONCO-i2b2 platform is a bioinformatics tool designed to integrate clinical and research data and support translational research in oncology. It is implemented by the University of Pavia and the IRCCS Fondazione Maugeri hospital (FSM), and is grounded on the software developed by the Informatics for Integrating Biology and the Bedside (i2b2) research center. i2b2 has delivered an open-source suite based on a data warehouse, which is efficiently interrogated to find sets of interesting patients through a query tool interface. Methods: Onco-i2b2 integrates data coming from multiple sources and allows users to jointly query them. i2b2 data are then stored in a data warehouse, where facts are hierarchically structured as ontologies. Onco-i2b2 gathers data from the FSM pathology unit (PU) database and from the hospital biobank and merges them with clinical information from the hospital information system. Our main effort was to provide a robust integrated research environment, with particular emphasis on the integration process, facing the following challenges: privacy and anonymization of biospecimen samples; synchronization of the biobank database with the i2b2 data warehouse through a series of Extract, Transform, Load (ETL) operations; and development and integration of a Natural Language Processing (NLP) module to retrieve coded information, such as SNOMED terms and malignant tumor (TNM) classifications, and clinical test results from unstructured medical records. Furthermore, we have developed an internal SNOMED ontology resting on the NCBO BioPortal web services. Results: Onco-i2b2 manages data of more than 6,500 patients with breast cancer diagnoses collected between 2001 and 2011 (over 390 of them have at least one biological sample in the cancer biobank), more than 47,000 visits and 96,000 observations over 960 medical concepts. Conclusions: Onco-i2b2 is a concrete example of how an integrated Information and Communication Technology architecture can be implemented to support translational research. The next steps of our project will extend its capabilities by implementing new plug-ins devoted to bioinformatics data analysis as well as a temporal query module. PMID:22536972

  8. A study of data representation in Hadoop to optimize data storage and search performance for the ATLAS EventIndex

    NASA Astrophysics Data System (ADS)

    Baranowski, Z.; Canali, L.; Toebbicke, R.; Hrivnac, J.; Barberis, D.

    2017-10-01

    This paper reports on the activities aimed at improving the architecture and performance of the ATLAS EventIndex implementation in Hadoop. The EventIndex contains tens of billions of event records, each of which consists of ∼100 bytes, all having the same probability of being searched or counted. Data formats represent one important area for optimizing the performance and storage footprint of applications based on Hadoop. This work reports on the production usage and on tests using several data formats, including Map Files, Apache Parquet, Avro, and various compression algorithms. The query engine also plays a critical role in the architecture. We also report on the use of HBase for the EventIndex, focussing on the optimizations performed in production and on the scalability tests. Additional engines that have been tested include Cloudera Impala, in particular for its SQL interface and its optimizations for data warehouse workloads and reports.
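
    A minimal sketch of the kind of format comparison described, assuming the pyarrow package: the same toy event records are written as Parquet under two compression codecs, then read back with column pruning as a count-style query would do.

    ```python
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Toy event records (fields are illustrative, not the EventIndex schema).
    events = pa.table({
        "run_number":   [284500] * 4,
        "event_number": [1, 2, 3, 4],
        "lumi_block":   [10, 10, 11, 11],
    })

    for codec in ("snappy", "zstd"):                 # candidate compression codecs
        pq.write_table(events, f"events_{codec}.parquet", compression=codec)

    # Columnar read: only the column a count-style query needs is scanned.
    subset = pq.read_table("events_snappy.parquet", columns=["event_number"])
    print(subset.num_rows, "records")
    ```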

  9. CHOmine: an integrated data warehouse for CHO systems biology and modeling

    PubMed Central

    Hanscho, Michael; Ruckerbauer, David E.; Zanghellini, Jürgen; Borth, Nicole

    2017-01-01

    Abstract The last decade has seen a surge in published genome-scale information for Chinese hamster ovary (CHO) cells, which are the main production vehicles for therapeutic proteins. While a single access point is available at www.CHOgenome.org, the primary data are distributed over several databases at different institutions. Currently, research is frequently hampered by a plethora of gene names and IDs that vary between published draft genomes and databases, making systems biology analyses cumbersome and elaborate. Here we present CHOmine, an integrative data warehouse connecting data from various databases, with links to others. Furthermore, we introduce CHOmodel, a web-based resource that provides access to recently published CHO cell line specific metabolic reconstructions. Both resources allow users to query CHO-relevant data, find interconnections between different types of data, and thus provide a simple, standardized entry point to the world of CHO systems biology. Database URL: http://www.chogenome.org PMID:28605771

  10. FlyMine: an integrated database for Drosophila and Anopheles genomics

    PubMed Central

    Lyne, Rachel; Smith, Richard; Rutherford, Kim; Wakeling, Matthew; Varley, Andrew; Guillier, Francois; Janssens, Hilde; Ji, Wenyan; Mclaren, Peter; North, Philip; Rana, Debashis; Riley, Tom; Sullivan, Julie; Watkins, Xavier; Woodbridge, Mark; Lilley, Kathryn; Russell, Steve; Ashburner, Michael; Mizuguchi, Kenji; Micklem, Gos

    2007-01-01

    FlyMine is a data warehouse that addresses one of the important challenges of modern biology: how to integrate and make use of the diversity and volume of current biological data. Its main focus is genomic and proteomics data for Drosophila and other insects. It provides web access to integrated data at a number of different levels, from simple browsing to construction of complex queries, which can be executed on either single items or lists. PMID:17615057

  11. Ligand Depot: a data warehouse for ligands bound to macromolecules.

    PubMed

    Feng, Zukang; Chen, Li; Maddula, Himabindu; Akcan, Ozgur; Oughtred, Rose; Berman, Helen M; Westbrook, John

    2004-09-01

    Ligand Depot is an integrated data resource for finding information about small molecules bound to proteins and nucleic acids. The initial release (version 1.0, November, 2003) focuses on providing chemical and structural information for small molecules found as part of the structures deposited in the Protein Data Bank. Ligand Depot accepts keyword-based queries and also provides a graphical interface for performing chemical substructure searches. A wide variety of web resources that contain information on small molecules may also be accessed through Ligand Depot. Ligand Depot is available at http://ligand-depot.rutgers.edu/. Version 1.0 supports multiple operating systems including Windows, Unix, Linux and the Macintosh operating system. The current drawing tool works in Internet Explorer, Netscape and Mozilla on Windows, Unix and Linux.

  12. CGDM: collaborative genomic data model for molecular profiling data using NoSQL.

    PubMed

    Wang, Shicai; Mares, Mihaela A; Guo, Yi-Ke

    2016-12-01

    High-throughput molecular profiling has greatly improved patient stratification and mechanistic understanding of diseases. With the increasing amount of data used in translational medicine studies in recent years, there is a need to improve the performance of data warehouses in terms of data retrieval and statistical processing. Both relational and Key Value models have been used for managing molecular profiling data. Key Value models such as SeqWare have been shown to be particularly advantageous in terms of query processing speed for large datasets. However, more improvement can be achieved, particularly through better indexing techniques in the Key Value models, taking advantage of the types of queries that are specific to high-throughput molecular profiling data. In this article, we introduce a Collaborative Genomic Data Model (CGDM), aimed at significantly increasing the query processing speed for the main classes of queries on genomic databases. CGDM creates three Collaborative Global Clustering Index Tables (CGCITs) to solve the velocity and variety issues at the cost of limited extra volume. Several benchmarking experiments were carried out, comparing CGDM implemented on HBase to the traditional SQL data model (TDM) implemented on both HBase and MySQL Cluster, using large publicly available molecular profiling datasets taken from NCBI and HapMap. In the microarray case, CGDM on HBase performed up to 246 times faster than TDM on HBase and 7 times faster than TDM on MySQL Cluster. In the single nucleotide polymorphism case, CGDM on HBase outperformed TDM on HBase by up to 351 times and TDM on MySQL Cluster by up to 9 times. The CGDM source code is available at https://github.com/evanswang/CGDM.
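
    A minimal sketch of the row-key clustering idea, with a sorted list standing in for a Key Value store such as HBase; the gene-centric composite key layout is illustrative, not CGDM's exact design.

    ```python
    from bisect import bisect_left, bisect_right

    # Composite keys cluster all observations for a gene contiguously,
    # so a gene-centric query becomes a cheap prefix (range) scan.
    rows = sorted({
        "BRCA1|sample01|snp_rs1": "A/G",
        "BRCA1|sample02|snp_rs1": "A/A",
        "TP53|sample01|snp_rs7":  "C/T",
    }.items())
    keys = [k for k, _ in rows]

    def prefix_scan(prefix: str):
        """Return every row whose key starts with the prefix (range scan)."""
        lo = bisect_left(keys, prefix)
        hi = bisect_right(keys, prefix + "\xff")
        return rows[lo:hi]

    print(prefix_scan("BRCA1|"))   # both BRCA1 rows, without a full-table scan
    ```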

  13. A robotic system for automation of logistics functions on the Space Station

    NASA Technical Reports Server (NTRS)

    Martin, J. C.; Purves, R. B.; Hosier, R. N.; Krein, B. A.

    1988-01-01

    Spacecraft inventory management is currently performed by the crew, and as systems become more complex, increased crew time will be required to perform routine logistics activities. If future spacecraft are to function effectively as research labs and production facilities, crew time, a limited resource, must be used efficiently for mission functions. The use of automation and robotics technology, such as automated warehouse and materials handling functions, can free the crew from many logistics tasks and make more efficient use of crew time. Design criteria for a Space Station Automated Logistics Inventory Management System are addressed through the design and demonstration of a mobile, two-armed terrestrial robot. The system functionally represents a zero-gravity automated inventory management system and the problems associated with operating in such an environment. Features of the system include automated storage and retrieval, item recognition, two-armed robotic manipulation, and software control of all inventory item transitions and queries.

  14. Description of a Method to Obtain Complete One-Year Follow-Up in the Society of Thoracic Surgeons/American College of Cardiology Transcatheter Valve Therapy Registry.

    PubMed

    Wooley, Jordan; Neatherlin, Holly; Mahoney, Cecile; Squiers, John J; Tabachnick, Deborah; Suresh, Mitta; Huff, Eleanor; Basra, Sukhdeep S; DiMaio, J Michael; Brown, David L; Mack, Michael J; Holper, Elizabeth M

    2018-03-15

    The Centers for Medicare and Medicaid Services National Coverage Determination requires centers performing transcatheter aortic valve implantation (TAVI) to report clinical outcomes up to 1 year. Many sites encounter challenges in obtaining complete 1-year follow-up. We report our process to address this challenge. A multidisciplinary process involving clinical personnel, data and quality managers, and research coordinators was initiated to collect TAVI data at baseline, 30 days, and 1 year. This process included (1) planned clinical follow-up of all patients at 30 days and 1 year; (2) query of the health-care system-wide integrated data warehouse (IDW) to ascertain the last date of clinical contact within the system for all patients; (3) an online obituary search, cross-referencing unique patient identifiers, to determine whether mortality occurred in remaining unknown patients; and (4) phone calls to remaining unknown patients or patients' families. Between January 2012 and December 2016, 744 patients underwent TAVI. All 744 patients were eligible for 30-day follow-up and 546 were eligible for 1-year follow-up. After routine clinical follow-up, 22 of 744 (3%) patients at 30 days and 180 of 546 (33%) patients at 1 year had unknown survival status. The integrated data warehouse query confirmed alive status for an additional 12 of 22 patients at 30 days (55%) and 91 of 180 patients at 1 year (51%). Obituaries were identified for an additional 23 of 180 patients at 1 year (13%). Phone contact identified the remaining unknown patients at 30 days and 1 year, resulting in 100% known survival status at 30 days (744 of 744) and at 1 year (546 of 546). In conclusion, using a comprehensive approach, we were able to determine survival status in 100% of patients who underwent TAVI. Copyright © 2017 Elsevier Inc. All rights reserved.
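    As a toy illustration of step (2), the sketch below runs the kind of last-contact query an integrated data warehouse supports. It uses an in-memory SQLite table with hypothetical table and column names, not the health system's actual schema.

```python
import sqlite3

# Illustrative sketch only: find each patient's last date of clinical contact.
# Table and column names are hypothetical stand-ins for the IDW schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounters (patient_id TEXT, contact_date TEXT)")
conn.executemany(
    "INSERT INTO encounters VALUES (?, ?)",
    [("P1", "2016-03-01"), ("P1", "2016-11-20"), ("P2", "2016-05-05")],
)

# A contact on or after the 1-year anniversary of TAVI confirms alive status
# without requiring a phone call.
rows = conn.execute(
    "SELECT patient_id, MAX(contact_date) AS last_contact "
    "FROM encounters GROUP BY patient_id"
).fetchall()
for patient_id, last_contact in rows:
    print(patient_id, last_contact)
```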

  15. CHOmine: an integrated data warehouse for CHO systems biology and modeling.

    PubMed

    Gerstl, Matthias P; Hanscho, Michael; Ruckerbauer, David E; Zanghellini, Jürgen; Borth, Nicole

    2017-01-01

    The last decade has seen a surge in published genome-scale information for Chinese hamster ovary (CHO) cells, which are the main production vehicles for therapeutic proteins. While a single access point is available at www.CHOgenome.org, the primary data are distributed over several databases at different institutions. Research is currently hampered by a plethora of gene names and IDs that vary between published draft genomes and databases, making systems biology analyses cumbersome and elaborate. Here we present CHOmine, an integrative data warehouse connecting data from various databases and linking out to others. Furthermore, we introduce CHOmodel, a web-based resource that provides access to recently published CHO cell line-specific metabolic reconstructions. Both resources allow users to query CHO-relevant data and find interconnections between different types of data, thus providing a simple, standardized entry point to the world of CHO systems biology. http://www.chogenome.org. © The Author(s) 2017. Published by Oxford University Press.

  16. A Simulation Modeling Approach Method Focused on the Refrigerated Warehouses Using Design of Experiment

    NASA Astrophysics Data System (ADS)

    Cho, G. S.

    2017-09-01

    For performance optimization of refrigerated warehouses, design parameters are selected from physical parameters, such as the number of equipment units and aisles and forklift speeds, for ease of modification. This paper provides a comprehensive framework for the system design of refrigerated warehouses. We propose a modeling approach that aims at simulation optimization, so as to meet required design specifications, using Design of Experiments (DOE), and we analyze the simulation model using an integrated aspect-oriented modeling approach (i-AOMA). As a result, the suggested method can evaluate the performance of a variety of refrigerated warehouse operations.
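    A DOE over such design parameters can be enumerated mechanically. The sketch below builds a full-factorial design over hypothetical warehouse factors; the factor names and levels are assumptions for illustration, and each combination would be fed to one simulation run.

```python
from itertools import product

# Hypothetical design factors for a refrigerated-warehouse simulation.
factors = {
    "num_forklifts": [2, 4, 6],
    "num_aisles": [8, 12],
    "forklift_speed_m_s": [1.5, 2.0],
}

# Full-factorial design: every combination of factor levels becomes one
# simulation scenario (3 x 2 x 2 = 12 runs).
names = list(factors)
for levels in product(*factors.values()):
    scenario = dict(zip(names, levels))
    # run_simulation(scenario) would return throughput, utilisation, etc.
    print(scenario)
```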

  17. Ontology based heterogeneous materials database integration and semantic query

    NASA Astrophysics Data System (ADS)

    Zhao, Shuai; Qian, Quan

    2017-10-01

    Materials digital data, high-throughput experiments and high-throughput computations are regarded as the three key pillars of materials genome initiatives. With the fast growth of materials data, data integration and sharing have become urgent needs and a central topic of materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply at the semantic level when adopting conventional heterogeneous database integration approaches such as federated databases or data warehouses. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be run using SPARQL. In our experiments, two well-known first-principles computation databases, OQMD and the Materials Project, are used as the integration targets, demonstrating the availability and effectiveness of our method.
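    The querying step can be illustrated with rdflib: load an integrated ontology and run a SPARQL SELECT over it. The namespace, class, and property names below are invented placeholders, not the paper's schema.

```python
from rdflib import Graph

# Hypothetical integrated materials ontology serialized as Turtle.
g = Graph()
g.parse("materials_ontology.ttl", format="turtle")

# SPARQL over the integrated ontology, e.g. compounds with a band gap above
# 2 eV; mat:Compound and mat:bandGap are illustrative, not the paper's terms.
results = g.query("""
    PREFIX mat: <http://example.org/materials#>
    SELECT ?compound ?gap WHERE {
        ?compound a mat:Compound ;
                  mat:bandGap ?gap .
        FILTER(?gap > 2.0)
    }
""")
for compound, gap in results:
    print(compound, gap)
```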

  18. INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles

    PubMed Central

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B.

    2013-01-01

    Background Next-generation sequencing technologies have substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources, starting from the genomic sequence up to protein domain, gene ontology and pathway levels. This data warehouse is intended to be populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects of the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is intended to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. PMID:24324765

  19. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    PubMed

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba Alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B

    2013-01-01

    Next-generation sequencing technologies have substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources, starting from the genomic sequence up to protein domain, gene ontology and pathway levels. This data warehouse is intended to be populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects of the unique lifestyle and adaptations of these organisms to extreme environments. We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is intended to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.

  20. Moby and Moby 2: creatures of the deep (web).

    PubMed

    Vandervalk, Ben P; McCarthy, E Luke; Wilkinson, Mark D

    2009-03-01

    Facile and meaningful integration of data from disparate resources is the 'holy grail' of bioinformatics. Some resources have begun to address this problem by providing their data using Semantic Web standards, specifically the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Unfortunately, adoption of Semantic Web standards has been slow overall, and even in cases where the standards are being utilized, interconnectivity between resources is rare. In response, we have seen the emergence of centralized 'semantic warehouses' that collect public data from third parties, integrate it, translate it into OWL/RDF and provide it to the community as a unified and queryable resource. One limitation of the warehouse approach is that queries are confined to the resources that have been selected for inclusion. A related problem, perhaps of greater concern, is that the majority of bioinformatics data exists in the 'Deep Web'; that is, the data does not exist until an application or analytical tool is invoked, and therefore does not have a predictable Web address. The inability to utilize Uniform Resource Identifiers (URIs) to address this data is a barrier to its accessibility via URI-centric Semantic Web technologies. Here we examine 'The State of the Union' for the adoption of Semantic Web standards in the health care and life sciences domain by key bioinformatics resources, explore the nature and connectivity of several community-driven semantic warehousing projects, and report on our own progress with the CardioSHARE/Moby-2 project, which aims to make the resources of the Deep Web transparently accessible through SPARQL queries.

  1. XWeB: The XML Warehouse Benchmark

    NASA Astrophysics Data System (ADS)

    Mahboubi, Hadj; Darmont, Jérôme

    With the emergence of XML as a standard for representing business data, new decision support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure the feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision support benchmark TPC-H. It is mainly composed of a test data warehouse, based on a unified reference model for XML warehouses and featuring XML-specific structures, together with its associated XQuery decision support workload. XWeB's usage is illustrated by experiments on several XML database management systems.

  2. BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data

    PubMed Central

    2014-01-01

    Background Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data of limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce RDF triples on a single node. However, real-world biological data differ substantially from such simple synthetic data, and it is difficult to determine whether synthetic e-commerce data can adequately represent biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. Results We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Conclusions Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements for loading and querying biological data sets of up to roughly 8 billion triples on a single node, with 64 clients accessing the store simultaneously. OWLIM-SE performs best for databases with approximately 11 million triples; for data sets containing 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, neither showing an overwhelming advantage over the other; for data sets over 4 billion triples, Virtuoso works best. 4store performs well on small data sets with limited features when the number of triples is below 100 million, but our test shows its scalability is poor; Bigdata demonstrates average performance and is a good open-source triple store for middle-sized (roughly 500 million triple) data sets; Mulgara shows some fragility. PMID:25089180
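    The core of such a benchmark is simply timing SPARQL requests against each endpoint. Below is a minimal hedged sketch with SPARQLWrapper; the endpoint URL and query are placeholders, not the actual BioBenchmark workload.

```python
import time
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint (e.g. a local Virtuoso instance) and a trivial query;
# the real benchmark runs a suite of domain queries per data set.
endpoint = SPARQLWrapper("http://localhost:8890/sparql")
endpoint.setReturnFormat(JSON)
endpoint.setQuery("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }")

# Record wall-clock execution time and inspect the response for accuracy.
start = time.perf_counter()
results = endpoint.query().convert()
elapsed = time.perf_counter() - start
print(f"{elapsed:.3f}s", results["results"]["bindings"][0]["n"]["value"])
```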

  3. BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data.

    PubMed

    Wu, Hongyan; Fujiwara, Toyofumi; Yamamoto, Yasunori; Bolleman, Jerven; Yamaguchi, Atsuko

    2014-01-01

    Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data of limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce RDF triples on a single node. However, real-world biological data differ substantially from such simple synthetic data, and it is difficult to determine whether synthetic e-commerce data can adequately represent biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements for loading and querying biological data sets of up to roughly 8 billion triples on a single node, with 64 clients accessing the store simultaneously. OWLIM-SE performs best for databases with approximately 11 million triples; for data sets containing 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, neither showing an overwhelming advantage over the other; for data sets over 4 billion triples, Virtuoso works best. 4store performs well on small data sets with limited features when the number of triples is below 100 million, but our test shows its scalability is poor; Bigdata demonstrates average performance and is a good open-source triple store for middle-sized (roughly 500 million triple) data sets; Mulgara shows some fragility.

  4. Extract useful knowledge from agro-hydrological simulations data for decision making

    NASA Astrophysics Data System (ADS)

    Gascuel-odoux, C.; Bouadi, T.; Cordier, M.; Quiniou, R.

    2013-12-01

    In recent years, models have been developed and used to test the effect of scenarios and help stakeholders in decision making. Agro-hydrological models have guided agricultural water management by testing the effect of landscape structure and farming system changes on water quantity and quality. Such models generate a large amount of data, but few of these data are stored, and they are often not customized for stakeholders, so a great amount of information from the simulation process is lost or never transformed into a usable format. A first approach, already published (Trepos et al., 2012), was developed to identify object-oriented tree patterns, representing surface flow and pollutant pathways from plot to plot, involved in water pollution by herbicides. A simulation model (Gascuel-Odoux et al., 2009) predicted the herbicide transfer rate, defined as the proportion of applied herbicide that reaches water courses. The predictions were used as a set of learning examples for symbolic learning techniques to induce rules, based on qualitative and quantitative attributes, explaining two extreme classes of transfer rate. Two automatic symbolic learning techniques were used: an inductive logic programming approach to induce spatial tree patterns, and an attribute-value method to induce aggregated attributes of the trees. A visualization interface allows users to identify rules explaining contamination and mitigation measures improving the current situation. A second approach was recently developed to analyse the simulated data directly (Bouadi et al., submitted). A data warehouse called N-Catch was built to store and manage simulation data from the agro-hydrological model TNT2 (Beaujouan et al., 2002). Forty-four key simulated output variables are stored per plot at a daily time step over a 50 km² area, i.e., 8 GB of storage. After identifying the set of multilevel dimensions, integrating hierarchical structures and relationships among related dimension levels, N-Catch was designed using the open-source business intelligence platform Pentaho. We show how to use online analytical processing (OLAP) to access and exploit, intuitively and quickly, the multidimensional and aggregated data in the N-Catch data warehouse. We illustrate how the data warehouse can be used to explore spatio-temporal dimensions efficiently and to discover new knowledge at multiple levels of simulation. OLAP tools can be used to synthesize environmental information and understand nitrogen emissions into water bodies by generating comparative and personalized views of historical data. The warehouse is currently being extended with data mining and information retrieval methods, such as Skyline queries, to perform advanced analyses (Bouadi et al., 2012). References: Bouadi et al. N-Catch: a data warehouse for multilevel analysis of simulated nitrogen data from an agro-hydrological model (submitted). Bouadi, T., Cordier, M., and Quiniou, R. (2012). Incremental computation of skyline queries with dynamic preferences. In DEXA (1), pages 219-233. Trepos et al. (2012). Mining simulation data by rule induction to determine critical source areas of stream water pollution by herbicides. Computers and Electronics in Agriculture 86, 75-88.
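    An OLAP-style roll-up of this kind can be prototyped in a few lines. The sketch below aggregates hypothetical per-plot nitrogen outputs along the time dimension with pandas, standing in for the cube operations a warehouse like N-Catch exposes; the column names and values are invented.

```python
import pandas as pd

# Invented stand-in for per-plot, per-year simulated outputs.
sim = pd.DataFrame({
    "plot_id": [1, 1, 2, 2],
    "year": [2010, 2011, 2010, 2011],
    "nitrogen_kg_ha": [42.0, 38.5, 51.2, 47.9],
})

# Roll-up from (plot, year) to year: the kind of aggregated view an OLAP
# cube serves interactively at multiple levels of the dimension hierarchy.
by_year = sim.groupby("year")["nitrogen_kg_ha"].agg(["mean", "max"])
print(by_year)
```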

  5. Automated Reporting of Trainee Metrics Using Electronic Clinical Systems.

    PubMed

    Levin, Jonathan C; Hron, Jonathan

    2017-06-01

    The Accreditation Council for Graduate Medical Education has called for increased emphasis on reporting objective performance measures to trainees and programs. However, reporting of objective measures, including clinical volume, is largely omitted from training programs. To use automated electronic medical systems at a tertiary pediatric care hospital to create a dashboard that reports objective trainee and program metrics, including clinical volume and diagnoses, in a pediatrics residency. We queried an enterprise data warehouse that aggregates data daily from multiple hospital systems to identify patient encounters during which senior pediatrics residents at Boston Children's Hospital had entered documentation over a 9-month period. From this query, we created a filterable dashboard to display clinical volume and diagnosis data by individual resident and in aggregate. A total of 44 of 45 senior residents (98%) in the program were included in the analysis. We identified 12,198 patient encounters during which a senior pediatrics resident had entered documentation; these included a median of 332 inpatient encounters per resident, 122 emergency department encounters, and 84 outpatient encounters. The most common diagnoses stratified by clinical site were: inpatient - dehydration (median = 61); emergency department - long-term/current drug therapy (median = 16); and outpatient - encounter for immunization (median = 48). We used electronic health record systems to generate performance dashboards for trainees in a pediatrics residency across different sites of care, with volume reported by diagnosis. Our dashboards provide feedback to program leadership regarding individual and aggregate trainee experience and allow individual trainees to compare their clinical exposure to that of peers.

  6. A Data Warehouse Architecture for DoD Healthcare Performance Measurements.

    DTIC Science & Technology

    1999-09-01

    With the DoD healthcare ... framework, this thesis defines a methodology to design, develop, implement, and apply statistical analysis and data mining tools to a Data Warehouse of healthcare metrics. ...

  7. Concept of operations for knowledge discovery from Big Data across enterprise data warehouses

    NASA Astrophysics Data System (ADS)

    Sukumar, Sreenivas R.; Olama, Mohammed M.; McNair, Allen W.; Nutaro, James J.

    2013-05-01

    The success of data-driven business in government, science, and private industry is driving the need for seamless integration of intra and inter-enterprise data sources to extract knowledge nuggets in the form of correlations, trends, patterns and behaviors previously not discovered due to physical and logical separation of datasets. Today, as volume, velocity, variety and complexity of enterprise data keeps increasing, the next generation analysts are facing several challenges in the knowledge extraction process. Towards addressing these challenges, data-driven organizations that rely on the success of their analysts have to make investment decisions for sustainable data/information systems and knowledge discovery. Options that organizations are considering are newer storage/analysis architectures, better analysis machines, redesigned analysis algorithms, collaborative knowledge management tools, and query builders amongst many others. In this paper, we present a concept of operations for enabling knowledge discovery that data-driven organizations can leverage towards making their investment decisions. We base our recommendations on the experience gained from integrating multi-agency enterprise data warehouses at the Oak Ridge National Laboratory to design the foundation of future knowledge nurturing data-system architectures.

  8. DICOM Data Warehouse: Part 2.

    PubMed

    Langer, Steve G

    2016-06-01

    In 2010, the DICOM Data Warehouse (DDW) was launched as a data warehouse for DICOM meta-data. Its chief design goals were to have a flexible database schema that could index standard patient and study information as well as modality-specific tags (public and private), and to create a framework for deriving computable information (derived tags) from those items. Furthermore, it was to map the above information to an internally standard lexicon that enables a non-DICOM-savvy programmer to write standard SQL queries and retrieve the equivalent data from a cohort of scanners, regardless of which tag a data element was found in over the changing epochs of DICOM and the ensuing migration of elements from private to public tags. After 5 years, the original design has scaled astonishingly well. Very little has changed in the database schema. The knowledge base is now fluent in over 90 device types. Additional stored procedures have been written to compute data that is derivable from standard or mapped tags. Finally, an early concern, that the system would not be able to address the variability of DICOM-SR objects, has been addressed. As of this writing, the system is indexing 300 MR, 600 CT, and 2000 other (XA, DR, CR, MG) imaging studies per day. The only remaining issue to be solved is the case of tags that were not prospectively indexed; indeed, this final challenge may lead to a NoSQL, big data approach in a subsequent version.
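    The mapping idea, many source tags resolving to one standard name, can be sketched with pydicom. The tag numbers and lexicon below are illustrative assumptions, not DDW's actual knowledge base.

```python
from pydicom import dcmread

# Hedged sketch: map whichever tag a vendor used for a concept onto one
# internally standard name, so downstream SQL sees a single column.
# Tag numbers here are illustrative, not DDW's real mappings.
LEXICON = {
    "ExposureTime": [
        (0x0018, 0x1150),  # standard public tag
        (0x0019, 0x1015),  # hypothetical private tag used by some vendor
    ],
}

ds = dcmread("study.dcm")  # placeholder file
for standard_name, candidate_tags in LEXICON.items():
    for tag in candidate_tags:
        element = ds.get(tag)
        if element is not None:
            print(standard_name, element.value)
            break  # first matching tag wins
```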

  9. Leveraging hospital big data to monitor flu epidemics.

    PubMed

    Bouzillé, Guillaume; Poirier, Canelle; Campillo-Gimenez, Boris; Aubert, Marie-Laure; Chabot, Mélanie; Chazard, Emmanuel; Lavenu, Audrey; Cuggia, Marc

    2018-02-01

    Influenza epidemics are a major public health concern and require a costly and time-consuming surveillance system at different geographical scales. The main challenge is being able to predict epidemics. Besides traditional surveillance systems, such as the French Sentinel network, several studies have proposed prediction models based on internet-user activity. Here, we assessed the potential of hospital big data to monitor influenza epidemics. We used the clinical data warehouse of the Academic Hospital of Rennes (France) and built different queries to retrieve relevant information from electronic health records in order to gather weekly influenza-like illness activity. We found that the query most highly correlated with Sentinel network estimates was based on emergency reports concerning discharged patients with a final diagnosis of influenza (Pearson's correlation coefficient (PCC) of 0.931). The other tested queries were based on structured data (ICD-10 codes for influenza in diagnosis-related groups, and influenza PCR tests) and performed best (PCC of 0.981 and 0.953, respectively) during the 2014-15 flu season. This suggests that both ICD-10 codes and PCR results are associated with severe epidemics. Finally, our approach allowed us to obtain additional patient characteristics, such as the sex ratio or age groups, comparable with those from the Sentinel network. Conclusions: Hospital big data seem to have great potential for monitoring influenza epidemics in near real-time. Such a method could constitute a complementary tool to standard surveillance systems by providing additional characteristics of the concerned population or by providing information earlier. This system could also easily be extended to other diseases with possible activity changes. Additional work is needed to assess the real efficacy of predictive models based on hospital big data to predict flu epidemics. Copyright © 2017 Elsevier B.V. All rights reserved.
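    Comparing a candidate query against Sentinel estimates reduces to correlating two weekly time series. A minimal sketch with SciPy, using invented counts:

```python
from scipy.stats import pearsonr

# Toy weekly counts: influenza-like-illness activity from one hospital query
# vs. Sentinel network estimates (all numbers are made up).
hospital_query = [12, 30, 85, 140, 90, 40, 15]
sentinel = [10, 35, 80, 150, 95, 38, 12]

pcc, p_value = pearsonr(hospital_query, sentinel)
print(f"PCC = {pcc:.3f} (p = {p_value:.4f})")
```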

  10. Underestimated prevalence of heart failure in hospital inpatients: a comparison of ICD codes and discharge letter information.

    PubMed

    Kaspar, Mathias; Fette, Georg; Güder, Gülmisal; Seidlmayer, Lea; Ertl, Maximilian; Dietrich, Georg; Greger, Helmut; Puppe, Frank; Störk, Stefan

    2018-04-17

    Heart failure is the predominant cause of hospitalization and amongst the leading causes of death in Germany. However, accurate estimates of prevalence and incidence are lacking. Reported figures originating from different information sources are compromised by factors such as economic considerations and documentation quality. We implemented a clinical data warehouse that integrates various information sources (structured parameters, plain text, and data extracted by natural language processing) and enables reliable approximations to the real number of heart failure patients. The performance of ICD-based diagnosis in detecting heart failure was compared across the years 2000-2015 with (a) advanced definitions based on algorithms that integrate various sources of the hospital information system, and (b) a physician-based reference standard. Applying these methods to detect heart failure in inpatients revealed that relying on ICD codes resulted in a marked underestimation of the true prevalence of heart failure, ranging from 44% in the validation dataset to 55% (single year) and 31% (all years) in the overall analysis. Percentages changed over the years, indicating secular changes in coding practice and efficiency. Performance improved markedly when moving, via search and permutation algorithms, from the initial expert-specified query (F1 score of 81%) to the computer-optimized query (F1 score of 86%), or, alternatively, when optimizing precision or sensitivity depending on the search objective. Estimating the prevalence of heart failure using ICD codes as the sole data source yielded unreliable results. Diagnostic accuracy was markedly improved using dedicated search algorithms. Our approach may be transferred to other hospital information systems.
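    For reference, the F1 score being optimized here is the harmonic mean of precision and sensitivity (recall). A minimal worked example with invented confusion counts:

```python
# Invented counts: true positives, false positives, false negatives from
# comparing a search query against the physician-based reference standard.
tp, fp, fn = 860, 140, 130

precision = tp / (tp + fp)
recall = tp / (tp + fn)  # a.k.a. sensitivity
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```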

  11. Scale out databases for CERN use cases

    NASA Astrophysics Data System (ADS)

    Baranowski, Zbigniew; Grzybek, Maciej; Canali, Luca; Lanza Garcia, Daniel; Surdy, Kacper

    2015-12-01

    Data generation rates are expected to grow very fast for some database workloads going into LHC run 2 and beyond. In particular this is expected for data coming from controls, logging and monitoring systems. Storing, administering and accessing big data sets in a relational database system can quickly become a very hard technical challenge, as the size of the active data set and the number of concurrent users increase. Scale-out database technologies are a rapidly developing set of solutions for deploying and managing very large data warehouses on commodity hardware and with open source software. In this paper we will describe the architecture and tests on database systems based on Hadoop and the Cloudera Impala engine. We will discuss the results of our tests, including tests of data loading and integration with existing data sources and in particular with relational databases. We will report on query performance tests done with various data sets of interest at CERN, notably data from the accelerator log database.
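    Querying such a Hadoop-backed warehouse from a client looks much like ordinary SQL access. Below is a hedged sketch using the impyla driver; the host, port, table, and column names are placeholders, not CERN's actual systems.

```python
from impala.dbapi import connect

# Placeholder connection to an Impala daemon fronting Hadoop-resident data.
conn = connect(host="impala-host.example.org", port=21050)
cur = conn.cursor()

# Hypothetical aggregation over accelerator log data; table and columns are
# illustrative assumptions.
cur.execute("""
    SELECT variable_name, AVG(value)
    FROM accelerator_log
    WHERE ts BETWEEN '2015-01-01' AND '2015-01-31'
    GROUP BY variable_name
""")
for variable_name, avg_value in cur.fetchall():
    print(variable_name, avg_value)
```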

  12. A journey to Semantic Web query federation in the life sciences.

    PubMed

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-10-01

    As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. In particular, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the Vocabulary of Interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario, and we have identified both the strengths and weaknesses of these technologies. While the Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URIs), the proliferation of semantically-equivalent URIs hinders large-scale data integration. Our work helps direct research and tool development, which will be of benefit to this community.

  13. A journey to Semantic Web query federation in the life sciences

    PubMed Central

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-01-01

    Background As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. Methods and results We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. In particular, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the Vocabulary of Interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. Conclusion We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario, and we have identified both the strengths and weaknesses of these technologies. While the Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URIs), the proliferation of semantically-equivalent URIs hinders large-scale data integration. Our work helps direct research and tool development, which will be of benefit to this community. PMID:19796394

  14. Concept of Operations for Collaboration and Discovery from Big Data Across Enterprise Data Warehouses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Olama, Mohammed M; Nutaro, James J; Sukumar, Sreenivas R

    2013-01-01

    The success of data-driven business in government, science, and private industry is driving the need for seamless integration of intra and inter-enterprise data sources to extract knowledge nuggets in the form of correlations, trends, patterns and behaviors previously not discovered due to physical and logical separation of datasets. Today, as volume, velocity, variety and complexity of enterprise data keeps increasing, the next generation analysts are facing several challenges in the knowledge extraction process. Towards addressing these challenges, data-driven organizations that rely on the success of their analysts have to make investment decisions for sustainable data/information systems and knowledge discovery. Options that organizations are considering are newer storage/analysis architectures, better analysis machines, redesigned analysis algorithms, collaborative knowledge management tools, and query builders amongst many others. In this paper, we present a concept of operations for enabling knowledge discovery that data-driven organizations can leverage towards making their investment decisions. We base our recommendations on the experience gained from integrating multi-agency enterprise data warehouses at the Oak Ridge National Laboratory to design the foundation of future knowledge nurturing data-system architectures.

  15. Promotion bureau warehouse system design. Case study in University of AA

    NASA Astrophysics Data System (ADS)

    Parwati, N.; Qibtiyah, M.

    2017-12-01

    The warehouse is an important part of any industry. With a good warehousing system, an industry can improve the effectiveness of its performance so that company profits can continue to increase; with a poorly organized warehouse system, the effectiveness of the operation is likely to decline. In this research, the object of study was the warehousing system of the promotion bureau of University AA. To improve the effectiveness of the warehousing system, a warehouse layout was designed by assigning categories of goods based on the flow of goods into and out of the warehouse, using the ABC analysis method. In addition, an information system was designed to assist in controlling the warehouse and to support demand from every bureau and department in the university.
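    ABC analysis itself is straightforward: rank items by flow and cut the cumulative share at conventional thresholds. A minimal sketch with invented promotional items; the 80%/95% cut-offs are the usual convention, not necessarily the paper's.

```python
# Invented annual outbound flows for promotional items.
items = {"brochure": 5000, "banner": 1200, "pen": 800, "mug": 300, "shirt": 100}

# Rank by flow, then classify by cumulative share of total flow:
# A = top ~80%, B = next ~15%, C = the long tail.
total = sum(items.values())
cumulative = 0.0
for name, flow in sorted(items.items(), key=lambda kv: kv[1], reverse=True):
    cumulative += flow / total
    cls = "A" if cumulative <= 0.80 else ("B" if cumulative <= 0.95 else "C")
    print(f"{name}: {flow} units/yr -> class {cls}")
```

Class A items would then be placed nearest the dispatch area, with B and C items progressively further away.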

  16. Truck Terminal and Warehouse Survey Results

    DOT National Transportation Integrated Search

    1996-03-01

    The survey of truck terminals and warehouses resulted in locating the highway bottlenecks for truck movements which are more localized in nature than the previous air, marine, and rail surveys performed by the NYMTC Central Staff. However, all of the...

  17. Can Leader–Member Exchange Contribute to Safety Performance in An Italian Warehouse?

    PubMed Central

    Mariani, Marco G.; Curcuruto, Matteo; Matic, Mirna; Sciacovelli, Paolo; Toderi, Stefano

    2017-01-01

    Introduction: This research considers safety climate in a warehouse and analyzes the role of Leader–Member Exchange (LMX) with respect to safety performance. Griffin and Neal's safety model was adopted, and Leader–Member Exchange was included as a moderator of the relationships between safety climate and the proximal antecedents (motivation and knowledge) of the safety performance constructs (compliance and participation). Materials and Methods: Survey data were collected from a sample of 133 full-time employees in an Italian warehouse. The statistical framework of Hayes (2013) was adopted for moderated mediation analysis. Results: Proximal antecedents partially mediated the relationship between safety climate and safety participation, but not safety compliance. Moreover, the results of the moderation analysis showed that Leader–Member Exchange moderated the influence of safety climate on the proximal antecedents, and the mediation existed only at higher levels of LMX. Conclusion: The study shows that different aspects of leadership processes interact in explaining individual proficiency in safety practices. Practical Implications: Organizations such as warehouses should improve the quality of the relationship between leaders and subordinates, based on the dimensions of respect, trust, and obligation, to achieve high levels of safety performance. PMID:28553244
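    The moderation component of such an analysis can be sketched with an interaction regression. The simulated data and effect sizes below are placeholders, and Hayes's PROCESS framework adds bootstrapped conditional indirect effects beyond this simple OLS fit.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (n = 133 to mirror the sample size); effect sizes
# are invented for illustration only.
rng = np.random.default_rng(0)
n = 133
df = pd.DataFrame({"climate": rng.normal(size=n), "lmx": rng.normal(size=n)})
df["motivation"] = (0.4 * df.climate + 0.3 * df.lmx
                    + 0.25 * df.climate * df.lmx + rng.normal(size=n))

# 'climate * lmx' expands to both main effects plus their product term; a
# significant product term indicates that LMX moderates the climate effect.
model = smf.ols("motivation ~ climate * lmx", data=df).fit()
print(model.params)
print(model.pvalues)
```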

  18. Can Leader-Member Exchange Contribute to Safety Performance in An Italian Warehouse?

    PubMed

    Mariani, Marco G; Curcuruto, Matteo; Matic, Mirna; Sciacovelli, Paolo; Toderi, Stefano

    2017-01-01

    Introduction: This research considers safety climate in a warehouse and analyzes the role of Leader-Member Exchange (LMX) with respect to safety performance. Griffin and Neal's safety model was adopted, and Leader-Member Exchange was included as a moderator of the relationships between safety climate and the proximal antecedents (motivation and knowledge) of the safety performance constructs (compliance and participation). Materials and Methods: Survey data were collected from a sample of 133 full-time employees in an Italian warehouse. The statistical framework of Hayes (2013) was adopted for moderated mediation analysis. Results: Proximal antecedents partially mediated the relationship between safety climate and safety participation, but not safety compliance. Moreover, the results of the moderation analysis showed that Leader-Member Exchange moderated the influence of safety climate on the proximal antecedents, and the mediation existed only at higher levels of LMX. Conclusion: The study shows that different aspects of leadership processes interact in explaining individual proficiency in safety practices. Practical Implications: Organizations such as warehouses should improve the quality of the relationship between leaders and subordinates, based on the dimensions of respect, trust, and obligation, to achieve high levels of safety performance.

  19. The DEDUCE Guided Query Tool: Providing Simplified Access to Clinical Data for Research and Quality Improvement

    PubMed Central

    Horvath, Monica M.; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey

    2011-01-01

    In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction—the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a guided query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. PMID:21130181

  20. The DEDUCE Guided Query tool: providing simplified access to clinical data for research and quality improvement.

    PubMed

    Horvath, Monica M; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey

    2011-04-01

    In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction--the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a Guided Query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. Copyright © 2010 Elsevier Inc. All rights reserved.

  1. Ibmdbpy-spatial : An Open-source implementation of in-database geospatial analytics in Python

    NASA Astrophysics Data System (ADS)

    Roy, Avipsa; Fouché, Edouard; Rodriguez Morales, Rafael; Moehler, Gregor

    2017-04-01

    As the amount of spatial data acquired from several geodetic sources has grown over the years and as data infrastructure has become more powerful, the need for adoption of in-database analytic technology within the geosciences has grown rapidly. In-database analytics on spatial data stored in a traditional enterprise data warehouse enables much faster retrieval and analysis, supporting better predictions about risks and opportunities, identification of trends, and the spotting of anomalies. Although a number of open-source spatial analysis libraries such as geopandas and shapely are available today, most of them are restricted to the manipulation and analysis of geometric objects, with a dependency on GEOS and similar libraries. We present an open-source software package, written in Python, that fills the gap between spatial analysis and in-database analytics. Ibmdbpy-spatial provides a geospatial extension to the ibmdbpy package, implemented in 2015. It provides an interface for spatial data manipulation and access to in-database algorithms in IBM dashDB, a data warehouse platform with a spatial extender that runs as a service on IBM's cloud platform, Bluemix. Working in-database reduces network overload, as the complete data set need not be replicated to the user's local system; only a subset of the entire dataset is fetched into memory at any one time. Ibmdbpy-spatial accelerates Python analytics by seamlessly pushing operations written in Python into the underlying database for execution using the dashDB spatial extender, thereby benefiting from in-database performance-enhancing features such as columnar storage and parallel processing. The package currently supports Python versions from 2.7 up to 3.4. The basic architecture of the package consists of three main components: 1) a connection to dashDB represented by the instance IdaDataBase, which uses a middleware API, namely pypyodbc or jaydebeapi, to establish the database connection via ODBC or JDBC, respectively; 2) an instance representing the spatial data stored in the database as a dataframe in Python, called the IdaGeoDataFrame, with a specific geometry attribute that recognises a planar geometry column in dashDB; and 3) Python wrappers for spatial functions such as within, distance, area, buffer and more, which dashDB currently supports, to make the querying process much simpler for users. The spatial functions translate well-known geopandas-like syntax into SQL queries, utilising the database connection to perform spatial operations in-database, and can operate on single geometries as well as on two geometries from different IdaGeoDataFrames. The in-database queries strictly follow the standards of the OpenGIS Implementation Specification for Geographic information - Simple feature access for SQL. The results of these operations can be accessed dynamically via interactive Jupyter notebooks from any system that supports Python, without additional dependencies, and can be combined with other open-source libraries available within Jupyter notebooks, such as matplotlib and folium, for visualization purposes. We built a use case analysing crime hotspots in New York City to validate our implementation and visualized the results as a choropleth map for each borough.
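    A usage sketch assembled from the component names mentioned above; the DSN, table, and geometry column are placeholders, and import paths and method names may differ between ibmdbpy versions.

```python
from ibmdbpy import IdaDataBase, IdaGeoDataFrame

# Hedged sketch: connect via an ODBC DSN (placeholder name) and open a
# hypothetical table holding borough polygons.
idadb = IdaDataBase(dsn="DASHDB")
boroughs = IdaGeoDataFrame(idadb, "SAMPLES.NYC_BOROUGHS", indexer="OBJECTID")
boroughs.set_geometry("SHAPE")  # mark the planar geometry column

# The property below is translated into an ST_Area(...) SQL query executed
# in-database; only the small result set travels back to the client.
print(boroughs.area.head())
idadb.close()
```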

  2. Get Set! E-Ready, ... E-Learn! The E-Readiness of Warehouse Workers

    ERIC Educational Resources Information Center

    Moolman, Hermanus B.; Blignaut, Seugnet

    2008-01-01

    Modern organizations use technology to expand across traditional business zones and boundaries to survive in the global commercial village. While IT systems allow organizations to maintain a competitive edge, South African unskilled labourers performing warehouse operations are frequently retrained to keep abreast of Information Technology.…

  3. Medical Big Data Warehouse: Architecture and System Design, a Case Study: Improving Healthcare Resources Distribution.

    PubMed

    Sebaa, Abderrazak; Chikh, Fatima; Nouicer, Amina; Tari, AbdelKamel

    2018-02-19

    The huge increase in medical devices and clinical applications generating enormous data has raised a big issue in managing, processing, and mining this massive amount of data. Indeed, traditional data warehousing frameworks cannot be effective when managing the volume, variety, and velocity of current medical applications. As a result, several data warehouses face many issues with medical data, and many challenges need to be addressed. New solutions have emerged, and Hadoop is one of the best examples; it can be used to process these streams of medical data. However, without an efficient system design and architecture, this performance will be neither significant nor valuable for medical managers. In this paper, we provide a short review of the literature on research issues of traditional data warehouses and present some important Hadoop-based data warehouses. In addition, a Hadoop-based architecture and a conceptual data model for designing a medical Big Data warehouse are given. In our case study, we provide implementation details of a big data warehouse based on the proposed architecture and data model on the Apache Hadoop platform, aimed at ensuring an optimal allocation of health resources.

  4. The Georges Pompidou University Hospital Clinical Data Warehouse: A 8-years follow-up experience.

    PubMed

    Jannot, Anne-Sophie; Zapletal, Eric; Avillach, Paul; Mamzer, Marie-France; Burgun, Anita; Degoulet, Patrice

    2017-06-01

    When developed jointly with clinical information systems, clinical data warehouses (CDWs) facilitate the reuse of healthcare data and leverage clinical research. To describe both data access and use for clinical research, epidemiology and health service research of the "Hôpital Européen Georges Pompidou" (HEGP) CDW. The CDW has been developed since 2008 using an i2b2 platform. It was made available to health professionals and researchers in October 2010. Procedures to access data have been implemented and different access levels have been distinguished according to the nature of queries. As of July 2016, the CDW contained the consolidated data of over 860,000 patients followed since the opening of the HEGP hospital in July 2000. These data correspond to more than 122 million clinical item values, 124 million biological item values, and 3.7 million free-text reports. The ethics committee of the hospital evaluates all CDW projects that generate secondary data marts. The characteristics of the 74 research projects validated between January 2011 and December 2015 are described. The HEGP CDW is a key facilitator for clinical research studies. However, it has required substantial methodological and organizational support from the biomedical informatics department. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Modular design, application architecture, and usage of a self-service model for enterprise data delivery: The Duke Enterprise Data Unified Content Explorer (DEDUCE)

    PubMed Central

    Horvath, Monica M.; Rusincovitch, Shelley A.; Brinson, Stephanie; Shang, Howard C.; Evans, Steve; Ferranti, Jeffrey M.

    2015-01-01

    Purpose Data generated in the care of patients are widely used to support clinical research and quality improvement, which has hastened the development of self-service query tools. User interface design for such tools, execution of query activity, and underlying application architecture have not been widely reported, and existing tools reflect a wide heterogeneity of methods and technical frameworks. We describe the design, application architecture, and use of a self-service model for enterprise data delivery within Duke Medicine. Methods Our query platform, the Duke Enterprise Data Unified Content Explorer (DEDUCE), supports enhanced data exploration, cohort identification, and data extraction from our enterprise data warehouse (EDW) using a series of modular environments that interact with a central keystone module, Cohort Manager (CM). A data-driven application architecture is implemented through three components: an application data dictionary, the concept of “smart dimensions”, and dynamically-generated user interfaces. Results DEDUCE CM allows flexible hierarchies of EDW queries within a grid-like workspace. A cohort “join” functionality allows switching between filters based on criteria occurring within or across patient encounters. To date, 674 users have been trained and activated in DEDUCE, and logon activity shows a steady increase, with variability between months. A comparison of filter conditions and export criteria shows that these activities have different patterns of usage across subject areas. Conclusions Organizations with sophisticated EDWs may find that users benefit from the development of advanced query functionality, complementary to the user interfaces and infrastructure used in other well-published models. Driven by its EDW context, the DEDUCE application architecture was also designed to be responsive to source data and to allow modification through alterations in metadata rather than programming, allowing an agile response to source system changes. PMID:25051403

  6. Modular design, application architecture, and usage of a self-service model for enterprise data delivery: the Duke Enterprise Data Unified Content Explorer (DEDUCE).

    PubMed

    Horvath, Monica M; Rusincovitch, Shelley A; Brinson, Stephanie; Shang, Howard C; Evans, Steve; Ferranti, Jeffrey M

    2014-12-01

    Data generated in the care of patients are widely used to support clinical research and quality improvement, which has hastened the development of self-service query tools. User interface design for such tools, execution of query activity, and underlying application architecture have not been widely reported, and existing tools reflect a wide heterogeneity of methods and technical frameworks. We describe the design, application architecture, and use of a self-service model for enterprise data delivery within Duke Medicine. Our query platform, the Duke Enterprise Data Unified Content Explorer (DEDUCE), supports enhanced data exploration, cohort identification, and data extraction from our enterprise data warehouse (EDW) using a series of modular environments that interact with a central keystone module, Cohort Manager (CM). A data-driven application architecture is implemented through three components: an application data dictionary, the concept of "smart dimensions", and dynamically-generated user interfaces. DEDUCE CM allows flexible hierarchies of EDW queries within a grid-like workspace. A cohort "join" functionality allows switching between filters based on criteria occurring within or across patient encounters. To date, 674 users have been trained and activated in DEDUCE, and logon activity shows a steady increase, with variability between months. A comparison of filter conditions and export criteria shows that these activities have different patterns of usage across subject areas. Organizations with sophisticated EDWs may find that users benefit from development of advanced query functionality, complementary to the user interfaces and infrastructure used in other well-published models. Driven by its EDW context, the DEDUCE application architecture was also designed to be responsive to source data and to allow modification through alterations in metadata rather than programming, allowing an agile response to source system changes. Copyright © 2014 Elsevier Inc. All rights reserved.

  7. Medical data mining: knowledge discovery in a clinical data warehouse.

    PubMed Central

    Prather, J. C.; Lobach, D. F.; Goodwin, L. K.; Hales, J. W.; Hage, M. L.; Hammond, W. E.

    1997-01-01

    Clinical databases have accumulated large quantities of information about patients and their medical conditions. Relationships and patterns within this data could provide new medical knowledge. Unfortunately, few methodologies have been developed and applied to discover this hidden knowledge. In this study, the techniques of data mining (also known as Knowledge Discovery in Databases) were used to search for relationships in a large clinical database. Specifically, data accumulated on 3,902 obstetrical patients were evaluated for factors potentially contributing to preterm birth using exploratory factor analysis. Three factors were identified by the investigators for further exploration. This paper describes the processes involved in mining a clinical database including data warehousing, data query and cleaning, and data analysis. PMID:9357597

  8. biochem4j: Integrated and extensible biochemical knowledge through graph databases.

    PubMed

    Swainston, Neil; Batista-Navarro, Riza; Carbonell, Pablo; Dobson, Paul D; Dunstan, Mark; Jervis, Adrian J; Vinaixa, Maria; Williams, Alan R; Ananiadou, Sophia; Faulon, Jean-Loup; Mendes, Pedro; Kell, Douglas B; Scrutton, Nigel S; Breitling, Rainer

    2017-01-01

    Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and, crucially, the relationships between them. Such a resource should be extensible, such that newly discovered relationships (for example, those between novel, synthetic enzymes and non-natural products) can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists.

  9. biochem4j: Integrated and extensible biochemical knowledge through graph databases

    PubMed Central

    Batista-Navarro, Riza; Dunstan, Mark; Jervis, Adrian J.; Vinaixa, Maria; Ananiadou, Sophia; Faulon, Jean-Loup; Kell, Douglas B.

    2017-01-01

    Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and, crucially, the relationships between them. Such a resource should be extensible, such that newly discovered relationships (for example, those between novel, synthetic enzymes and non-natural products) can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists. PMID:28708831

  10. Protecting count queries in study design

    PubMed Central

    Sarwate, Anand D; Boxwala, Aziz A

    2012-01-01

    Objective Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this sacrifices some utility for increased privacy. The goal of this study is to extend current query answering systems to guarantee a quantifiable level of privacy and to allow users to tailor perturbations to maximize usefulness for their needs. Methods A perturbation mechanism was designed in which users are given options with respect to the scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. Results Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing for a unified privacy accounting system that can support role-based trust levels. This study provides an open source web-enabled tool to investigate visually and numerically the interaction between system parameters, including the required privacy level and user preference settings. Conclusions Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems. PMID:22511018
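
    The paper's mechanism maps the true count, user preferences, and a privacy level to a perturbation distribution. As a rough illustration of the underlying idea only (not the authors' actual mechanism), the sketch below applies the standard Laplace mechanism for differentially private counts; the function name and parameter choices are hypothetical.

```python
import numpy as np

def perturbed_count(true_count: int, epsilon: float, rng=None) -> int:
    """Differentially private count via the Laplace mechanism.

    A patient count has sensitivity 1 (adding or removing one patient
    changes it by at most 1), so Laplace noise with scale 1/epsilon
    gives epsilon-differential privacy.
    """
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    # Round and floor at zero so the reported value looks like a count.
    return max(0, round(true_count + noise))

# Smaller epsilon -> stronger privacy -> noisier counts.
for eps in (1.0, 0.1):
    print(eps, [perturbed_count(1234, eps) for _ in range(5)])
```

    The paper's extension, roughly, is to let users bias the scale and direction of this noise within administrator-set bounds while the privacy budget accounting stays intact.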

  11. Protecting count queries in study design.

    PubMed

    Vinterbo, Staal A; Sarwate, Anand D; Boxwala, Aziz A

    2012-01-01

    Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this sacrifices some utility for increased privacy. The goal of this study is to extend current query answering systems to guarantee a quantifiable level of privacy and to allow users to tailor perturbations to maximize usefulness for their needs. A perturbation mechanism was designed in which users are given options with respect to the scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing for a unified privacy accounting system that can support role-based trust levels. This study provides an open source web-enabled tool to investigate visually and numerically the interaction between system parameters, including the required privacy level and user preference settings. Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems.

  12. CFGP: a web-based, comparative fungal genomics platform.

    PubMed

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers: the basal layer, the middleware, and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  13. SuperTarget and Matador: resources for exploring drug-target relationships.

    PubMed

    Günther, Stefan; Kuhn, Michael; Dunkel, Mathias; Campillos, Monica; Senger, Christian; Petsalaki, Evangelia; Ahmed, Jessica; Urdiales, Eduardo Garcia; Gewiess, Andreas; Jensen, Lars Juhl; Schneider, Reinhard; Skoblo, Roman; Russell, Robert B; Bourne, Philip E; Bork, Peer; Preissner, Robert

    2008-01-01

    The molecular basis of drug action is often not well understood. This is partly because the very abundant and diverse information generated in the past decades on drugs is hidden in millions of medical articles or textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget, which integrates drug-related information about medical indication areas, adverse drug effects, drug metabolism, pathways and Gene Ontology terms of the target proteins. An easy-to-use query interface enables the user to pose complex queries, for example to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target the same protein but are metabolized by different enzymes. Furthermore, we provide tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of these drugs has been annotated with additional binding information and indirect interactions and is available as a separate resource called Matador. SuperTarget and Matador are available at http://insilico.charite.de/supertarget and http://matador.embl.de.

  14. Comparison of Dijkstra's algorithm and dynamic programming method in finding shortest path for order picker in a warehouse

    NASA Astrophysics Data System (ADS)

    Nordin, Noraimi Azlin Mohd; Omar, Mohd; Sharif, S. Sarifah Radiah

    2017-04-01

    Companies are looking to improve productivity within their warehouse operations and distribution centres. In a typical warehouse operation, order picking accounts for more than half of the operating costs, and it is a benchmark for measuring the performance and productivity improvement of any warehouse management. Solving the order picking problem is crucial in reducing the response time and the waiting time of a customer in receiving his demands; to reduce the response time, proper routing for picking orders is vital. Moreover, on a production line it is vital to ensure that supplies always arrive on time. Hence, a sample routing network from EP Manufacturing Berhad (EPMB) is used as a case study. Dijkstra's algorithm and the dynamic programming method are applied to find the shortest distance for an order picker during order picking. The results show that the dynamic programming method is a simple yet competent approach for finding, within a short time period, the shortest distance to pick an order in a warehouse.
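
    As a concrete illustration of the first of the two compared methods, here is a textbook Dijkstra implementation over a toy picking network; the aisle graph and distances are invented for illustration and are not EPMB's actual layout.

```python
import heapq

def dijkstra(graph: dict, source: str) -> dict:
    """Shortest distances from source on a weighted graph given as
    {node: [(neighbor, weight), ...]} adjacency lists."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already relaxed via a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical picking-aisle network, distances in metres.
aisles = {
    "depot": [("A1", 4), ("A2", 7)],
    "A1": [("A2", 2), ("pick", 8)],
    "A2": [("pick", 3)],
    "pick": [],
}
print(dijkstra(aisles, "depot"))  # {'depot': 0.0, 'A1': 4.0, 'A2': 6.0, 'pick': 9.0}
```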

  15. Energy Finance Data Warehouse Manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Sangkeun; Chinthavali, Supriya; Shankar, Mallikarjun

    The Office of Energy Policy and Systems Analysis's finance team (EPSA-50) requires a suite of automated applications that can extract specific data from a flexible data warehouse (where datasets characterizing energy-related finance, economics, and markets are maintained and integrated), perform relevant operations, and creatively visualize them to provide a better understanding of what policy options affect various operators/sectors of the electricity system. In addition, the underlying data warehouse should be structured in the most effective and efficient way so that it can become increasingly valuable over time. This report describes the Energy Finance Data Warehouse (EFDW) framework that has been developed to accomplish the requirement defined above. We also dive specifically into the Sankey generator use-case scenario to explain the components of the EFDW framework and their roles. An Excel-based data warehouse was used in the creation of the energy finance Sankey diagram and other detailed finance visualizations to support energy policy analysis. The framework also captures the methodology, calculations, and estimates analysts used, as well as relevant sources, so newer analysts can build on work done previously.

  16. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 7 2010-01-01 2010-01-01 false Electronic warehouse receipts. 735.303 Section 735.303... AGRICULTURE REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT Warehouse Receipts § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  17. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 7 2011-01-01 2011-01-01 false Electronic warehouse receipts. 735.303 Section 735.303... AGRICULTURE REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT Warehouse Receipts § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  18. Data Model Performance in Data Warehousing

    NASA Astrophysics Data System (ADS)

    Rorimpandey, G. C.; Sangkop, F. I.; Rantung, V. P.; Zwart, J. P.; Liando, O. E. S.; Mewengkang, A.

    2018-02-01

    Data warehouses have become increasingly important in organizations that hold large amounts of data. A data warehouse is not a product but part of a solution for the decision support system of such organizations. The data model is the starting point for designing and developing data warehouse architectures; thus, the data model needs stable and consistent interfaces over a long period of time. The aim of this research is to determine which data model in data warehousing has the best performance. The research method is descriptive analysis with three main tasks: data collection and organization, data analysis, and data interpretation. The results, examined with statistical analysis, show that there is no statistically significant difference among the data models used in data warehousing. Organizations can therefore utilize any of the four proposed data models when designing and developing a data warehouse.

  19. Databases Are Not Toasters: A Framework for Comparing Data Warehouse Appliances

    NASA Astrophysics Data System (ADS)

    Trajman, Omer; Crolotte, Alain; Steinhoff, David; Nambiar, Raghunath Othayoth; Poess, Meikel

    The success of Business Intelligence (BI) applications depends on two factors: the ability to analyze data ever more quickly and the ability to handle ever-increasing volumes of data. Data Warehouse (DW) and Data Mart (DM) installations that support BI applications have historically been built using traditional architectures either designed from the ground up or based on customized reference system designs. The advent of Data Warehouse Appliances (DA) brings packaged software and hardware solutions that address performance and scalability requirements for certain market segments. The differences between DAs and custom installations make direct comparisons between them impractical and suggest the need for a targeted DA benchmark. In this paper we review data warehouse appliances by surveying thirteen products offered today. We assess the common characteristics among them and propose a classification for DA offerings. We hope our results will help define a useful benchmark for DAs.

  20. Development of a Clinical Data Warehouse for Hospital Infection Control

    PubMed Central

    Wisniewski, Mary F.; Kieszkowski, Piotr; Zagorski, Brandon M.; Trick, William E.; Sommers, Michael; Weinstein, Robert A.

    2003-01-01

    Existing data stored in a hospital's transactional servers have enormous potential to improve performance measurement and health care quality. Accessing, organizing, and using these data to support research and quality improvement projects are evolving challenges for hospital systems. The authors report development of a clinical data warehouse that they created by importing data from the information systems of three affiliated public hospitals. They describe their methodology; difficulties encountered; responses from administrators, computer specialists, and clinicians; and the steps taken to capture and store patient-level data. The authors provide examples of their use of the clinical data warehouse to monitor antimicrobial resistance, to measure antimicrobial use, to detect hospital-acquired bloodstream infections, to measure the cost of infections, and to detect antimicrobial prescribing errors. In addition, they estimate the amount of time and money saved and the increased precision achieved through the practical application of the data warehouse. PMID:12807807

  1. [Technical improvement of cohort constitution in administrative health databases: Providing a tool for integration and standardization of data applicable in the French National Health Insurance Database (SNIIRAM)].

    PubMed

    Ferdynus, C; Huiart, L

    2016-09-01

    Administrative health databases such as the French National Health Insurance Database (SNIIRAM) are a major tool for answering numerous public health research questions. However, the use of such data requires complex and time-consuming data management. Our objective was to develop and make available a tool to optimize cohort constitution within administrative health databases. We developed a process to extract, transform, and load (ETL) data from various heterogeneous sources into a standardized data warehouse. This data warehouse is architected as a star schema corresponding to the i2b2 star schema model. We then evaluated the performance of this ETL using data from a pharmacoepidemiology research project conducted in the SNIIRAM database. The ETL we developed comprises a set of functionalities for creating SAS scripts. Data can be integrated into a standardized data warehouse. As part of the performance assessment of this ETL, we integrated a dataset from the SNIIRAM comprising more than 900 million lines in less than three hours using a desktop computer. This enables patient selection from the standardized data warehouse within seconds of the request. The ETL described in this paper provides a tool which is effective and compatible with all administrative health databases, without requiring complex database servers. This tool should simplify cohort constitution in health databases; the standardization of warehouse data facilitates collaborative work between research teams. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
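
    To make the star-schema target concrete, here is a minimal, hypothetical sketch of loading source rows into an i2b2-style schema. Table and column names follow i2b2 conventions but are simplified, and the actual ETL described above generates SAS scripts rather than Python.

```python
import sqlite3

# i2b2-style star schema: a central fact table keyed by patient and
# concept, surrounded by dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY,
                                birth_date TEXT, sex_cd TEXT);
CREATE TABLE concept_dimension (concept_cd TEXT PRIMARY KEY,
                                name_char TEXT);
CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT,
                               start_date TEXT, nval_num REAL,
                               PRIMARY KEY (patient_num, concept_cd, start_date));
""")

# A toy "extract" from a source claims file: (patient, code, date, value).
source_rows = [(1, "ATC:C10AA05", "2015-03-02", 1.0),
               (1, "LOINC:2093-3", "2015-03-02", 5.4)]

# Transform/load: dimensions first, then facts.
con.executemany("INSERT OR IGNORE INTO concept_dimension VALUES (?, ?)",
                [(code, code) for _, code, _, _ in source_rows])
con.executemany("INSERT OR IGNORE INTO patient_dimension VALUES (?, NULL, NULL)",
                [(p,) for p, _, _, _ in source_rows])
con.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?)", source_rows)

# Cohort selection becomes a single lookup on the fact table.
print(con.execute("SELECT DISTINCT patient_num FROM observation_fact "
                  "WHERE concept_cd = 'ATC:C10AA05'").fetchall())
```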

  2. Data warehouse model design technology analysis and research

    NASA Astrophysics Data System (ADS)

    Jiang, Wenhua; Li, Qingshui

    2012-01-01

    Existing data storage formats cannot meet the needs of information analysis, which has brought the data warehouse onto the historical stage. A data warehouse is a data collection specially designed to support business decision making. With a data warehouse, a company stores all of the information it collects in one place, organized so that the information is easy to access and has value. This paper focuses on the establishment and analysis of data warehouses, examines two data warehouse design models, and compares them.

  3. Data warehouse model for monitoring key performance indicators (KPIs) using goal oriented approach

    NASA Astrophysics Data System (ADS)

    Abdullah, Mohammed Thajeel; Ta'a, Azman; Bakar, Muhamad Shahbani Abu

    2016-08-01

    The growth and development of universities, just as of other organizations, depend on their abilities to strategically plan and implement development blueprints in line with their vision and mission statements. The actualization of these statements, which are often broken down into goals and sub-goals and linked to their respective actors, is better measured by defining key performance indicators (KPIs) of the university. This paper proposes ReGADaK, an extension of the GRAnD approach, which highlights the facts, dimensions, attributes, measures, and KPIs of the organization. The measures from the goal analysis of this unit serve as the basis for developing the related university KPIs. The proposed data warehouse schema is evaluated through expert review, prototyping, and usability evaluation. The findings from the evaluation processes suggest that the proposed data warehouse schema is suitable for monitoring the university's KPIs.

  4. Evaluation of Sub Query Performance in SQL Server

    NASA Astrophysics Data System (ADS)

    Oktavia, Tanty; Sujarwo, Surya

    2014-03-01

    The paper explores several sub-query methods used in a query and their impact on query performance. The study uses an experimental approach to evaluate the performance of each sub-query method combined with an indexing strategy. The sub-query methods consist of IN, EXISTS, a relational operator, and a relational operator combined with the TOP operator. The experiments show that a relational operator combined with an indexing strategy in the sub-query performs better than the same method without indexing and better than the other methods. In summary, for applications that emphasize the performance of retrieving data from the database, it is better to use a relational operator combined with an indexing strategy. This study was done on Microsoft SQL Server 2012.
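
    To illustrate the sub-query forms being compared, the toy sketch below runs semantically equivalent IN, EXISTS, and join queries over an indexed table. It uses SQLite purely for self-containment; the study itself ran on Microsoft SQL Server 2012, whose optimizer and timings differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "EU"), (2, "US"), (3, "EU")])
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(10, 1), (11, 2), (12, 3), (13, 1)])

# Three ways to ask the same question: orders placed by EU customers.
queries = {
    "IN":     "SELECT order_id FROM orders WHERE customer_id IN "
              "(SELECT customer_id FROM customers WHERE region = 'EU')",
    "EXISTS": "SELECT order_id FROM orders o WHERE EXISTS "
              "(SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id "
              "AND c.region = 'EU')",
    "JOIN":   "SELECT o.order_id FROM orders o JOIN customers c "
              "ON c.customer_id = o.customer_id WHERE c.region = 'EU'",
}
for name, sql in queries.items():
    print(name, sorted(r[0] for r in con.execute(sql)))  # all: [10, 12, 13]
```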

  5. Atlas - a data warehouse for integrative bioinformatics.

    PubMed

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

    2005-02-21

    We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/

  6. Atlas – a data warehouse for integrative bioinformatics

    PubMed Central

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis

    2005-01-01

    Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/ PMID:15723693

  7. GeneLab Analysis Working Group Kick-Off Meeting

    NASA Technical Reports Server (NTRS)

    Costes, Sylvain V.

    2018-01-01

    Kick-off agenda: goals for the GeneLab AWG; the GL vision; review of the GeneLab AWG charter; timeline and milestones for 2018; logistics (monthly meetings, workshop, internship, ASGSR); introduction of team leads and the goals of each group; introduction of all members; and Q&A. The three-tier client strategy to democratize data covers physiological changes, pathway enrichment, differential expression, normalization, processing metadata, reproducibility, and data federation/integration with heterogeneous external bioinformatics databases. The GLDS currently serves over 100 omics investigations to the biomedical community via open access. In order to expand the scope of metadata record searches via the GLDS, we designed a metadata warehouse that collects and updates metadata records from external systems housing similar data. To demonstrate the capabilities of federated search and retrieval of these data, we imported metadata records from three open-access data systems into the GLDS metadata warehouse: NCBI's Gene Expression Omnibus (GEO), EBI's PRoteomics IDEntifications (PRIDE) repository, and the Metagenomics Analysis server (MG-RAST). Each of these systems defines metadata for omics data sets differently. One solution to bridge such differences is to employ a common object model (COM) to which each system's representation of metadata can be mapped. Warehoused metadata records are then transformed at ETL time to this single, common representation. Queries generated via the GLDS are then executed against the warehouse, and matching records are shown in the COM representation (Fig. 1). While this approach is relatively straightforward to implement, the volume of data in the omics domain presents challenges in dealing with latency and currency of records. Furthermore, there has so far been no coordinated, federated search for and retrieval of these kinds of data across other open-access systems that would let users conduct biological meta-investigations using data from a variety of sources. Such meta-investigations are key to corroborating findings from many kinds of assays and translating them into systems biology knowledge and, eventually, therapeutics.
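
    A rough sketch of the common-object-model idea described above: each source system's metadata shape is mapped into one shared representation at load time, so warehouse queries run against a single shape. All class and field names below are illustrative assumptions, not GeneLab's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CommonStudy:
    """Hypothetical common object model for a study-level metadata record."""
    accession: str
    title: str
    assay_type: str
    source_system: str

def from_geo(rec: dict) -> CommonStudy:
    # Field names here stand in for whatever GEO's export actually provides.
    return CommonStudy(rec["geo_accession"], rec["title"],
                       rec.get("type", "expression"), "GEO")

def from_pride(rec: dict) -> CommonStudy:
    # Likewise, illustrative stand-ins for PRIDE's record shape.
    return CommonStudy(rec["projectAccession"], rec["projectTitle"],
                       "proteomics", "PRIDE")

records = [from_geo({"geo_accession": "GSE123", "title": "Spaceflight mouse liver"}),
           from_pride({"projectAccession": "PXD00045",
                       "projectTitle": "Microgravity proteome"})]

# A federated query now runs against one shape regardless of source.
print([r.accession for r in records if "micrograv" in r.title.lower()])
```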

  8. Providing Web Interfaces to the NSF EarthScope USArray Transportable Array

    NASA Astrophysics Data System (ADS)

    Vernon, Frank; Newman, Robert; Lindquist, Kent

    2010-05-01

    Since April 2004 the EarthScope USArray seismic network has grown to over 850 broadband stations that stream multi-channel data in near real-time to the Array Network Facility in San Diego. Providing secure, yet open, access to real-time and archived data for a broad range of audiences is best served by a series of platform agnostic low-latency web-based applications. We present a framework of tools that mediate between the world wide web and Boulder Real Time Technologies Antelope Environmental Monitoring System data acquisition and archival software. These tools provide comprehensive information to audiences ranging from network operators and geoscience researchers, to funding agencies and the general public. This ranges from network-wide to station-specific metadata, state-of-health metrics, event detection rates, archival data and dynamic report generation over a station's two year life span. Leveraging open source web-site development frameworks for both the server side (Perl, Python and PHP) and client-side (Flickr, Google Maps/Earth and jQuery) facilitates the development of a robust extensible architecture that can be tailored on a per-user basis, with rapid prototyping and development that adheres to web-standards. Typical seismic data warehouses allow online users to query and download data collected from regional networks, without the scientist directly visually assessing data coverage and/or quality. Using a suite of web-based protocols, we have recently developed an online seismic waveform interface that directly queries and displays data from a relational database through a web-browser. Using the Python interface to Datascope and the Python-based Twisted network package on the server side, and the jQuery Javascript framework on the client side to send and receive asynchronous waveform queries, we display broadband seismic data using the HTML Canvas element that is globally accessible by anyone using a modern web-browser. We are currently creating additional interface tools to create a rich-client interface for accessing and displaying seismic data that can be deployed to any system running the Antelope Real Time System. The software is freely available from the Antelope contributed code Git repository (http://www.antelopeusersgroup.org).

  9. Next generation phenotyping using narrative reports in a rare disease clinical data warehouse.

    PubMed

    Garcelon, Nicolas; Neuraz, Antoine; Salomon, Rémi; Bahi-Buisson, Nadia; Amiel, Jeanne; Picard, Capucine; Mahlaoui, Nizar; Benoit, Vincent; Burgun, Anita; Rance, Bastien

    2018-05-31

    Secondary use of data collected in Electronic Health Records opens perspectives for increasing our knowledge of rare diseases. The clinical data warehouse (named Dr. Warehouse) at the Necker-Enfants Malades Children's Hospital contains data collected during normal care for thousands of patients. Dr. Warehouse is oriented toward the exploration of clinical narratives. In this study, we present our method to find phenotypes associated with diseases of interest. We leveraged the frequency and TF-IDF to explore the association between clinical phenotypes and rare diseases. We applied our method in six use cases: phenotypes associated with the Rett, Lowe, Silver Russell, Bardet-Biedl syndromes, DOCK8 deficiency and Activated PI3-kinase Delta Syndrome (APDS). We asked domain experts to evaluate the relevance of the top-50 (for frequency and TF-IDF) phenotypes identified by Dr. Warehouse and computed the average precision and mean average precision. Experts concluded that between 16 and 39 phenotypes could be considered as relevant in the top-50 phenotypes ranked by descending frequency discovered by Dr. Warehouse (resp. between 11 and 41 for TF-IDF). Average precision ranges from 0.55 to 0.91 for frequency and 0.52 to 0.95 for TF-IDF. Mean average precision was 0.79. Our study suggests that phenotypes identified in clinical narratives stored in Electronic Health Record can provide rare disease specialists with candidate phenotypes that can be used in addition to the literature. Clinical Data Warehouses can be used to perform Next Generation Phenotyping, especially in the context of rare diseases. We have developed a method to detect phenotypes associated with a group of patients using medical concepts extracted from free-text clinical narratives.
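
    Below is a minimal sketch of the kind of TF-IDF ranking described above, assuming phenotype terms have already been extracted from the narratives; the exact weighting is illustrative and not necessarily Dr. Warehouse's formula.

```python
import math
from collections import Counter

def tfidf_rank(case_docs, control_docs, top_k=5):
    """Rank phenotype terms for a patient group of interest: term
    frequency within the group, weighted by inverse document frequency
    across all reports, so ubiquitous terms sink."""
    all_docs = case_docs + control_docs
    df = Counter(term for doc in all_docs for term in set(doc))
    tf = Counter(term for doc in case_docs for term in doc)
    n = len(all_docs)
    scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# Toy extracted-term lists; one inner list per clinical report.
cases = [["seizure", "microcephaly", "fever"], ["seizure", "scoliosis"]]
controls = [["fever"], ["fever", "cough"], ["cough"]]
print(tfidf_rank(cases, controls))
# "seizure" outranks "fever" because fever is frequent everywhere.
```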

  10. Summary of Research 2000, Department of Systems Management

    DTIC Science & Technology

    2001-12-01

    Postgraduate School, June 2000. Fryzlewicz, J., "Analysis of Measures of Performance and Continuous Improvement at the Naval Dental Center Pearl Harbor," Masters...mart driven relational system. Fourth, using the prototype relational data mart as a source system, a contemporary OLAP application is used to prove the...warehouse solution to integrating legacy systems are discussed. DoD KEY TECHNOLOGY AREA: Computing and Software KEYWORDS: OLAP, Data Warehouse

  11. Army Stock Positioning: How Can Distribution Performance Be Improved

    DTIC Science & Technology

    2017-01-01

    Preface The...the source-preference logic that selects the warehouse for issuing an item to a customer. Below, we describe the LMP source-preference logic and...the AMC pilot to use it to tailor how warehouses are selected to issue items to customers. AMC Pilot Implementation of LMP Source-Preference

  12. Effectiveness of the United States Marine Corps Tiered Evaluation System

    DTIC Science & Technology

    2015-03-01

    RA Manpower and Reserve Affairs MCMAP Marine Corps Martial Arts Program MMPR Marine Corps Promotions Branch MOS military occupational specialty...STAP subsequent term alignment program TFDW Total Force Data Warehouse USMC United States Marine Corps...Furthermore, I would like to thank Mr. Tim Johnson at the Total Force Data Warehouse and Ms. Doreen Marucci at the Performance Evaluation Section for

  13. SuperTarget goes quantitative: update on drug–target interactions

    PubMed Central

    Hecker, Nikolai; Ahmed, Jessica; von Eichborn, Joachim; Dunkel, Mathias; Macha, Karel; Eckert, Andreas; Gilson, Michael K.; Bourne, Philip E.; Preissner, Robert

    2012-01-01

    There are at least two good reasons for the on-going interest in drug–target interactions: first, drug-effects can only be fully understood by considering a complex network of interactions to multiple targets (so-called off-target effects) including metabolic and signaling pathways; second, it is crucial to consider drug-target-pathway relations for the identification of novel targets for drug development. To address this on-going need, we have developed a web-based data warehouse named SuperTarget, which integrates drug-related information associated with medical indications, adverse drug effects, drug metabolism, pathways and Gene Ontology (GO) terms for target proteins. At present, the updated database contains >6000 target proteins, which are annotated with >330 000 relations to 196 000 compounds (including approved drugs); the vast majority of interactions include binding affinities and pointers to the respective literature sources. The user interface provides tools for drug screening and target similarity inclusion. A query interface enables the user to pose complex queries, for example, to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target proteins within a certain affinity range. SuperTarget is available at http://bioinformatics.charite.de/supertarget. PMID:22067455

  14. CFGP: a web-based, comparative fungal genomics platform

    PubMed Central

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F.; Blair, Jaime E.; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers: the basal layer, the middleware, and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the ‘fill-in-the-form-and-press-SUBMIT’ user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI. PMID:17947331

  15. Decision method for optimal selection of warehouse material handling strategies by production companies

    NASA Astrophysics Data System (ADS)

    Dobos, P.; Tamás, P.; Illés, B.

    2016-11-01

    The adequate establishment and operation of warehouse logistics significantly determines a company's competitiveness, because it greatly affects the quality and the selling price of the goods that production companies produce. In order to implement and manage an adequate warehouse system, a suitable warehouse position, stock management model, warehouse technology, a motivated workforce committed to process improvement, and a material handling strategy are all necessary. In practice, companies have paid little attention to selecting the warehouse strategy properly, even though it has a major influence on production in the case of a materials warehouse, and on smooth customer service in the case of a finished-goods warehouse, because a poor choice can entail large losses in material handling. Due to dynamically changing production structures, frequent reorganization of warehouse activities is needed, to which the majority of companies essentially do not react. This work presents a simulation test framework for selecting a suitable warehouse material handling strategy, together with the corresponding decision method.

  16. Warehouse Sanitation Workshop Handbook.

    ERIC Educational Resources Information Center

    Food and Drug Administration (DHHS/PHS), Washington, DC.

    This workshop handbook contains information and reference materials on proper food warehouse sanitation. The materials have been used at Food and Drug Administration (FDA) food warehouse sanitation workshops, and are selected by the FDA for use by food warehouse operators and for training warehouse sanitation employees. The handbook is divided…

  17. Solar system installation at Louisville, Kentucky

    NASA Technical Reports Server (NTRS)

    1978-01-01

    The installation of a solar space heating and domestic hot water system is described. The overall philosophy was to install both a liquid and a hot air system, retrofitted to an existing combined office and warehouse building. The 1080 sq. ft. office space is heated first and excess heat is dumped into the warehouse. The two systems offer a unique opportunity to measure the performance of, and compare results from, both air and liquid systems at one site.

  18. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 10 2014-01-01 2014-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  19. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 10 2013-01-01 2013-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  20. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 10 2011-01-01 2011-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  1. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 10 2012-01-01 2012-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  2. 7 CFR 1423.11 - Delivery and shipping standards for cotton warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Delivery and shipping standards for cotton warehouses... CORPORATION APPROVED WAREHOUSES § 1423.11 Delivery and shipping standards for cotton warehouses. (a) Unless... warehouse operator will: (1) Deliver stored cotton without unnecessary delay. (2) Be considered to have...

  3. 19 CFR 19.1 - Classes of customs warehouses.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... customs warehouses. (a) Classifications. Customs warehouses shall be designated according to the following classifications: (1) Class 1. Premises that may be owned or leased by the Government, when the exigencies of the... class 11 warehouse, following an application under § 19.2. So far as such warehouses are used for the...

  4. 19 CFR 19.1 - Classes of customs warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... customs warehouses. (a) Classifications. Customs warehouses shall be designated according to the following classifications: (1) Class 1. Premises that may be owned or leased by the Government, when the exigencies of the... class 11 warehouse, following an application under § 19.2. So far as such warehouses are used for the...

  5. Metadata to Support Data Warehouse Evolution

    NASA Astrophysics Data System (ADS)

    Solodovnikova, Darja

    The focus of this chapter is the metadata necessary to support data warehouse evolution. We present a data warehouse framework that is able to track the evolution process and adapt data warehouse schemata and data extraction, transformation, and loading (ETL) processes. We discuss the significant part of the framework, the metadata repository, which stores information about the data warehouse, its logical and physical schemata, and their versions. We propose a physical implementation of a multiversion data warehouse in a relational DBMS. For each modification of a data warehouse schema, we outline the changes that need to be made to the repository metadata and in the database.
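
    Here is a minimal sketch of the kind of metadata repository the chapter describes, with schema versions recorded as data; the table names and columns are invented for illustration and are not the chapter's actual repository design.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE schema_version (version_id INTEGER PRIMARY KEY,
                             valid_from TEXT, valid_to TEXT);
CREATE TABLE table_version (version_id INTEGER REFERENCES schema_version,
                            table_name TEXT, column_name TEXT, type TEXT);
""")
# Version 1 has a sales fact table; version 2 adds a discount column.
con.executemany("INSERT INTO schema_version VALUES (?, ?, ?)",
                [(1, "2023-01-01", "2024-01-01"), (2, "2024-01-01", None)])
con.executemany("INSERT INTO table_version VALUES (?, ?, ?, ?)",
                [(1, "sales_fact", "amount", "REAL"),
                 (2, "sales_fact", "amount", "REAL"),
                 (2, "sales_fact", "discount", "REAL")])

# An ETL process can ask which columns a given schema version expects.
print(con.execute("SELECT column_name FROM table_version "
                  "WHERE version_id = 2 AND table_name = 'sales_fact'").fetchall())
```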

  6. Design of a Multi Dimensional Database for the Archimed DataWarehouse.

    PubMed

    Bréant, Claudine; Thurler, Gérald; Borst, François; Geissbuhler, Antoine

    2005-01-01

    The Archimed data warehouse project started in 1993 at the Geneva University Hospital. It has progressively integrated seven data marts (or domains of activity) archiving medical data such as Admission/Discharge/Transfer (ADT) data, laboratory results, radiology exams, diagnoses, and procedure codes. The objective of the Archimed data warehouse is to facilitate access to an integrated and coherent view of patient medical data in order to support analytical activities such as medical statistics, clinical studies, retrieval of similar cases, and data mining processes. This paper discusses three principal design aspects relative to the conception of the database of the data warehouse: 1) the granularity of the database, which refers to the level of detail or summarization of data, 2) the database model and architecture, describing how data will be presented to end users and how new data is integrated, 3) the life cycle of the database, in order to ensure long term scalability of the environment. Both the organization of patient medical data using a standardized elementary fact representation and the use of the multidimensional model have proved to be powerful design tools to integrate data coming from the multiple heterogeneous database systems that are part of the transactional Hospital Information System (HIS). Concurrently, building the data warehouse in an incremental way has helped to control the evolution of the data content. These three design aspects bring clarity and performance regarding data access. They also provide long term scalability to the system and resilience to further changes that may occur in source systems feeding the data warehouse.

  7. Complete Decoding and Reporting of Aviation Routine Weather Reports (METARs)

    NASA Technical Reports Server (NTRS)

    Lui, Man-Cheung Max

    2014-01-01

    Aviation Routine Weather Reports (METARs) provide surface weather information at and around observation stations, including airport terminals. These weather observations are used by pilots for flight planning and by air traffic service providers for managing departure and arrival flights. METARs are also an important source of weather data for Air Traffic Management (ATM) analysts and researchers at NASA and elsewhere. These researchers use METARs, as one example, to correlate severe weather events with local or national air traffic actions that restrict air traffic. A METAR is made up of multiple groups of coded text, each with a specific standard coding format. These groups of coded text are located in two sections of a report: Body and Remarks. The coded text groups in a U.S. METAR are intended to follow the coding standards set by the National Oceanic and Atmospheric Administration (NOAA). However, manual data entry and edits made by a human report observer may result in coded text elements that do not follow the standards, especially in the Remarks section. And contrary to the standards, some significant weather observations are noted only in the Remarks section and not in the Body section of the reports. While human readers can infer the intended meaning of non-standard coding of weather conditions, doing so with a computer program is far more challenging. However, such programmatic pre-processing is necessary to enable efficient and faster database queries when researchers need to perform any significant historical weather analysis. Therefore, to support such analysis, a computer algorithm was developed to identify groups of coded text anywhere in a report and to perform subsequent decoding in software. The algorithm considers common deviations from the standards and data entry mistakes made by observers. The implemented software code was tested on 12 million reports, and the decoding process was able to completely interpret 99.93% of them. This document presents the deviations from the standards and the decoding algorithm. Storing all decoded data in a database allows users to quickly query a large amount of data and to perform data mining on it. Users can specify complex query criteria not only on date or airport but also on weather condition. This document also describes the design of a database schema for storing the decoded data, and a Data Warehouse web application that allows users to perform reporting and analysis on the decoded data. Finally, this document presents a case study correlating dust storms reported in METARs from the Phoenix International airport with Ground Stops issued by the Air Traffic Control System Command Center (ATCSCC). Blowing widespread dust is one of the weather conditions reported when a dust storm occurs. By querying the database, 294 METARs were found to report blowing widespread dust at the Phoenix airport, and 41 of them reported such a condition only in the Remarks section. When METARs are a data source for ATM research, it is important to include weather conditions not only from the Body section but also from the Remarks section.
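
    As a small illustration of tolerant group decoding (a sketch, not NASA's actual implementation), the snippet below matches a wind group while accepting a common deviation from the three-digit direction standard; patterns for the remaining group types would follow the same shape. Accepting deviations can introduce parsing ambiguity, which is part of why full decoding is hard.

```python
import re

# Wind group, e.g. "08015G25KT": direction, speed, optional gust.
# \d{2,3} (rather than the standard \d{3}) tolerates a missing leading zero.
WIND = re.compile(r"^(?P<dir>\d{2,3}|VRB)(?P<spd>\d{1,3})"
                  r"(G(?P<gust>\d{1,3}))?KT$")

def decode_group(token: str) -> dict | None:
    m = WIND.match(token)
    if m:
        return {"group": "wind",
                "direction": m["dir"],
                "speed_kt": int(m["spd"]),
                "gust_kt": int(m["gust"]) if m["gust"] else None}
    return None  # not a wind group; other patterns would be tried here

for tok in "KPHX 112351Z 08015G25KT 1/4SM BLDU".split():
    decoded = decode_group(tok)
    if decoded:
        print(tok, "->", decoded)
```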

  8. Clinical exposures during internal medicine acting internship: profiling student and team experiences.

    PubMed

    Smith, Todd I; LoPresti, Charles M

    2014-07-01

    The clinical learning model in medical education is driven by knowledge acquisition through direct patient-care experiences. Despite the emphasis on experiential learning, the ability of educators to quantify the clinical exposures of learners is limited. To utilize Veterans Affairs (VA) electronic medical record information through a data warehouse to quantify clinical exposures during an inpatient internal medicine rotation. We queried the VA clinical data warehouse for the patients encountered by each learner completing an acting internship rotation at the Cleveland VA Medical Center from July 2008 to November 2011. We then used discharge summary information to identify team exposures: patients seen by the learner's inpatient team who were not primarily assigned to the learner. Based on the learner and team exposures, we compiled lists of past medical problems, medications prescribed, laboratory tests that resulted, radiology evaluated, and primary discharge diagnoses. Primary learner and team-based clinical exposures were evaluated for a total of 128 acting internship students. The percentage of learners who had a primary exposure to a medication/lab value/imaging result/diagnosis was calculated. The percentage of learners with at least 1 primary or team-based exposure to an item was also calculated. The most common exposures in each category are presented. Analysis of the clinical exposures during an inpatient rotation can augment the ability of educators to understand learners' experiences. These types of analyses could provide information to improve learner experience, implement novel curricula, and address educational gaps in clinical rotations. © 2014 Society of Hospital Medicine.

  9. The Structure of Reclaiming Warehouse of Minerals at Open-Cut Mines with the Use Combined Transport

    NASA Astrophysics Data System (ADS)

    Ikonnikov, D. A.; Kovshov, S. V.

    2017-07-01

    This article presents an analysis of the characteristics of ore reclaiming and overloading points at modern opencast mines. Ore reclaiming is the most effective way to stabilize the power-intensive and expensive ore-dressing process and, consequently, to maintain optimal production and set-up parameters of extraction and the quality of the finished product. The paper proposes a warehouse design and describes the technology of its creation; the equipment used for the warehouse is described in detail, and all stages of development and operation are shown. The advantages and disadvantages of using a mechanical shovel excavator versus a hydraulic "backdigger" (backhoe) excavator as reloading and reclaiming equipment are compared. A design for an ore reclaiming and overloading point using a hydraulic backhoe excavator under cyclical and continuous mining methods is proposed.

  10. Coupling detrended fluctuation analysis for multiple warehouse-out behavioral sequences

    NASA Astrophysics Data System (ADS)

    Yao, Can-Zhong; Lin, Ji-Nan; Zheng, Xu-Zhou

    2017-01-01

    Interaction patterns among different warehouses can make warehouse-out behavioral sequences less predictable. We first apply coupling detrended fluctuation analysis to the warehouse-out quantities and find that the multivariate sequences exhibit significant coupling multifractal characteristics regardless of the type of steel product. Second, we track the sources of multifractality in the warehouse-out sequences by shuffling and surrogating the original ones, and find that the fat-tailed distribution contributes more to the multifractal features than long-term memory does, again regardless of product type. From the perspective of warehouse contribution, some warehouses steadily contribute more to the multifractality than others. Finally, based on multiscale multifractal analysis, we propose a Hurst surface structure to investigate the coupling multifractality, and show that the multiple behavioral sequences exhibit significant coupling multifractal features that emerge and are usually restricted to relatively large time-scale intervals.
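
    The following is a minimal numpy sketch of two-series detrended cross-correlation analysis, a common building block of coupling DFA; the paper's multivariate, multifractal variant is more elaborate, and the series and scales here are synthetic stand-ins for warehouse-out quantities.

        import numpy as np

        def dcca(x, y, scales, order=1):
            """Detrended cross-correlation fluctuation F(n) for two series;
            the log-log slope of F(n) vs n estimates the coupling exponent."""
            x, y = np.asarray(x, float), np.asarray(y, float)
            px, py = np.cumsum(x - x.mean()), np.cumsum(y - y.mean())
            F = []
            for n in scales:
                covs = []
                for s in range(0, len(px) - n + 1, n):   # non-overlapping boxes
                    t = np.arange(n)
                    rx = px[s:s+n] - np.polyval(np.polyfit(t, px[s:s+n], order), t)
                    ry = py[s:s+n] - np.polyval(np.polyfit(t, py[s:s+n], order), t)
                    covs.append(np.mean(rx * ry))
                F.append(np.sqrt(np.abs(np.mean(covs))))
            return np.array(F)

        rng = np.random.default_rng(0)
        x = rng.standard_normal(4096)
        y = 0.6 * x + 0.8 * rng.standard_normal(4096)    # coupled pair
        scales = np.unique(np.logspace(2, 9, 8, base=2).astype(int))
        F = dcca(x, y, scales)
        lam = np.polyfit(np.log(scales), np.log(F), 1)[0]
        print(round(lam, 2))   # ~0.5 for these uncorrelated-increment inputs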

  11. Anchor Modeling

    NASA Astrophysics Data System (ADS)

    Regardt, Olle; Rönnbäck, Lars; Bergholtz, Maria; Johannesson, Paul; Wohed, Petia

    Maintaining and evolving data warehouses is a complex, error-prone, and time-consuming activity. The main reason for this state of affairs is that the environment of a data warehouse is in constant change, while the warehouse itself needs to provide a stable and consistent interface to information spanning extended periods of time. In this paper, we propose a modeling technique for data warehousing, called anchor modeling, that offers non-destructive extensibility mechanisms, thereby enabling robust and flexible management of changes in source systems. A key benefit of anchor modeling is that changes in a data warehouse environment only require extensions, not modifications, to the data warehouse. This ensures that existing data warehouse applications will remain unaffected by the evolution of the data warehouse, i.e., existing views and functions will not have to be modified as a result of changes in the warehouse model.
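
    The additive-only evolution can be illustrated with a toy relational schema (a hypothetical sketch, not the authors' notation): the anchor table holds only identities, each attribute lives in its own table, and adding an attribute later touches nothing that already exists.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
        CREATE TABLE AC_Actor (AC_ID INTEGER PRIMARY KEY);        -- anchor
        CREATE TABLE AC_NAM_Actor_Name (                          -- attribute
            AC_ID INTEGER REFERENCES AC_Actor,
            Name  TEXT,
            ChangedAt TEXT,                                       -- historized
            PRIMARY KEY (AC_ID, ChangedAt)
        );
        """)
        # Evolving the model is purely additive: a new attribute is a new
        # table, so existing tables, views, and queries remain untouched.
        con.execute("""
        CREATE TABLE AC_GEN_Actor_Gender (
            AC_ID INTEGER REFERENCES AC_Actor,
            Gender TEXT,
            PRIMARY KEY (AC_ID)
        );
        """)
        con.execute("INSERT INTO AC_Actor DEFAULT VALUES")
        con.execute("INSERT INTO AC_NAM_Actor_Name VALUES (1, 'Garbo', '1924-01-01')")
        row = con.execute("""
            SELECT a.AC_ID, n.Name
            FROM AC_Actor a LEFT JOIN AC_NAM_Actor_Name n USING (AC_ID)
        """).fetchone()
        print(row)   # (1, 'Garbo')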

  12. Exploring performance issues for a clinical database organized using an entity-attribute-value representation.

    PubMed

    Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L

    2000-01-01

    The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally designed database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. For attribute-centered queries, the EAV/CR model was approximately three to five times less efficient than its conventional counterpart. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated as multiple simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in, and compare query performance for, EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware, more memory, or both.
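
    The "multiple simple statements" finding can be illustrated on a toy EAV table (hypothetical data and column names, not the study's schema): the batch strategy stages one predicate and intersects the results, instead of self-joining in a single large statement.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE eav (entity INT, attribute TEXT, value TEXT)")
        con.executemany("INSERT INTO eav VALUES (?,?,?)", [
            (1, "organism", "E. coli"), (1, "site", "urine"),
            (2, "organism", "E. coli"), (2, "site", "blood"),
        ])
        # One large statement: attribute-centered AND via a self-join.
        big = con.execute("""
            SELECT a.entity FROM eav a JOIN eav b ON a.entity = b.entity
            WHERE a.attribute='organism' AND a.value='E. coli'
              AND b.attribute='site'     AND b.value='urine'
        """).fetchall()
        # Batch of simple statements: stage one predicate, then intersect --
        # the strategy the study found faster in its EAV/CR setting.
        con.execute("""CREATE TEMP TABLE hits AS
                       SELECT entity FROM eav
                       WHERE attribute='organism' AND value='E. coli'""")
        small = con.execute("""
            SELECT entity FROM hits
            INTERSECT
            SELECT entity FROM eav WHERE attribute='site' AND value='urine'
        """).fetchall()
        print(big, small)   # both [(1,)]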

  13. The Challenges of Data Quality Evaluation in a Joint Data Warehouse

    PubMed Central

    Bae, Charles J.; Griffith, Sandra; Fan, Youran; Dunphy, Cheryl; Thompson, Nicolas; Urchek, John; Parchman, Alandra; Katzan, Irene L.

    2015-01-01

    Introduction: The use of clinically derived data from electronic health records (EHRs) and other electronic clinical systems can greatly facilitate clinical research as well as operational and quality initiatives. One approach for making these data available is to incorporate data from different sources into a joint data warehouse. When using such a data warehouse, it is important to understand the quality of the data. The primary objective of this study was to determine the completeness and concordance of common types of clinical data available in the Knowledge Program (KP) joint data warehouse, which contains feeds from several electronic systems including the EHR. Methods: A manual review was performed of specific data elements for 250 patients from an EHR, and these were compared with corresponding elements in the KP data warehouse. Completeness and concordance were calculated for five categories of data: demographics, vital signs, laboratory results, diagnoses, and medications. Results: In general, data elements for demographics, vital signs, diagnoses, and laboratory results were present in more cases in the source EHR than in the KP. When data elements were available in both sources, concordance was high. In contrast, the KP data warehouse documented a higher prevalence of deaths and medications than the EHR. Discussion: Several factors contributed to the discrepancies between data in the KP and the EHR, including the start date and frequency of data feed updates into the KP, the inability to transfer data stored in non-structured formats (e.g., free text or scanned documents), and incomplete or missing data variables in the source EHR. Conclusion: When evaluating the quality of a data warehouse with multiple data sources, assessing completeness and concordance between the data set and the source data may be better than designating one to be a gold standard. This allows the user to optimize the method and timing of data transfer in order to capture data with better accuracy. PMID:26290882
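
    As a rough illustration of the two metrics (one plausible formulation; the study's exact definitions may differ), completeness counts how often an element is present in a source, while concordance measures agreement only where both sources have a value.

        # Hypothetical values: None marks a missing data element.
        def completeness(values):
            return sum(v is not None for v in values) / len(values)

        def concordance(a, b):
            # Restrict to cases present in both sources, then measure agreement.
            pairs = [(x, y) for x, y in zip(a, b)
                     if x is not None and y is not None]
            return sum(x == y for x, y in pairs) / len(pairs) if pairs else None

        ehr = ["A+", None, "O-", "B+"]   # chart-review (source EHR) values
        kp  = ["A+", "O+", "O-", None]   # data-warehouse values
        print(completeness(ehr), completeness(kp), concordance(ehr, kp))
        # 0.75 0.75 1.0 -> both 75% complete, fully concordant where both present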

  14. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-08-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through the handling of boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMSs and outperforms them for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.
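
    The partitioning-plus-boundary-handling idea can be sketched outside Hadoop (plain Python with an illustrative tile size and data; not Hadoop-GIS code): objects are mapped to grid tiles, objects spanning several tiles are replicated to each, every tile is joined independently, and duplicates introduced by the replication are removed afterwards.

        from collections import defaultdict

        def tiles_for(rect, size=10):            # rect = (xmin, ymin, xmax, ymax)
            xmin, ymin, xmax, ymax = rect
            return [(tx, ty)
                    for tx in range(int(xmin // size), int(xmax // size) + 1)
                    for ty in range(int(ymin // size), int(ymax // size) + 1)]

        points = [(1, (5, 5)), (2, (15, 5))]
        rects  = [("r1", (0, 0, 12, 8))]         # spans tiles (0,0) and (1,0)

        part = defaultdict(lambda: {"pts": [], "rects": []})
        for pid, (x, y) in points:
            part[(x // 10, y // 10)]["pts"].append((pid, x, y))
        for rid, rect in rects:
            for tile in tiles_for(rect):
                part[tile]["rects"].append((rid, rect))   # boundary replication

        matches = set()                          # a set de-duplicates the pairs
        for tile in part.values():               # produced by replicated objects
            for pid, x, y in tile["pts"]:
                for rid, (xmin, ymin, xmax, ymax) in tile["rects"]:
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        matches.add((pid, rid))
        print(sorted(matches))                   # [(1, 'r1')]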

  15. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-01-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through the handling of boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMSs and outperforms them for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive. PMID:24187650

  16. Managing dual warehouses with an incentive policy for deteriorating items

    NASA Astrophysics Data System (ADS)

    Yu, Jonas C. P.; Wang, Kung-Jeng; Lin, Yu-Siang

    2016-02-01

    Distributors in a supply chain usually limit their own warehouse to a finite capacity for cost reduction, and excess stock is held in a rented warehouse. In this study, we examine inventory control for deteriorating items in a two-warehouse setting. Assuming that the rented warehouse offers an incentive that allows the rental fee to decrease over time, the objective of this study is to maximise the joint profit of the manufacturer and the distributor. An optimisation procedure is developed to derive the optimal joint economic lot size policy. Several criteria are identified to select the most appropriate warehouse configuration and inventory policy on the basis of the storage duration of materials in the rented warehouse. Sensitivity analysis is performed to examine model robustness. The proposed model enables a manufacturer with a channel distributor to coordinate the use of alternative warehouses and to maximise their joint profit.

  17. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    NASA Astrophysics Data System (ADS)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. When querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of a SPARQL query, which can result in long and complex queries. We present a novel approach that helps users write SPARQL queries for spatial data intuitively, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules that exploit part-whole relations between geographical entities, to address mismatches between the query constraints and the knowledge base. Our experiments were performed on entirely third-party datasets and queries: evaluations used the Geonames dataset with questions from the National Geographic Bee serialized into SPARQL, and the British Administrative Geography Ontology with questions from a popular trivia website. These experiments demonstrate high precision in the retrieval of results and ease in writing queries.
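
    A toy rewrite in this spirit (hypothetical rule table and predicates, not the paper's rule language): a direct containment constraint is relaxed into a SPARQL 1.1 property path that also traverses part-whole links, so a query phrased at a coarse granularity still matches data recorded at a finer one.

        # One illustrative transformation rule: relax direct containment into
        # a property path that also follows part-of chains.
        RULES = {
            "geo:locatedIn": "geo:locatedIn/geo:partOf*",
        }

        def rewrite(sparql: str) -> str:
            for predicate, path in RULES.items():
                sparql = sparql.replace(f" {predicate} ", f" {path} ")
            return sparql

        user_query = """
        SELECT ?city WHERE { ?city geo:locatedIn dbpedia:Scotland . }
        """
        print(rewrite(user_query))
        # ?city geo:locatedIn/geo:partOf* dbpedia:Scotland -- now also matches
        # cities recorded only within, e.g., a council area that is part of
        # Scotland, without the user spelling out the partonomy.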

  18. Design and implementation of fishery rescue data mart system

    NASA Astrophysics Data System (ADS)

    Pan, Jun; Huang, Haiguang; Liu, Yousong

    A novel data mart based system for the fishery rescue field was designed and implemented. The system runs an ETL process to consolidate original data from various databases and data warehouses, and reorganizes the data into the fishery rescue data mart. Next, online analytical processing (OLAP) is carried out and statistical reports are generated automatically. In particular, quick configuration schemes are designed to configure query dimensions and OLAP data sets; the configuration file is transformed into statistics interfaces automatically through a wizard-style process. The system provides various forms of reports, including Crystal Reports, Flash graphical reports, and two-dimensional data grids. In addition, a wizard-style interface guides users in customizing inquiry processes, making it possible for non-technical staff to access customized reports. Characterized by quick configuration, safety, and flexibility, the system has been successfully applied in a city fishery rescue department.

  19. MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses.

    PubMed

    Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio

    2017-06-01

    The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries, and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for the bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which reads are screened with the HMM profiles before assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes, and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of other proteome and metagenome databases have been generated with MEGGASENSE.

  20. Towards a Heterogeneous, Polystore-like Data Architecture for the US Department of Veterans Affairs (VA) Enterprise Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Begoli, Edmon; Bates, Jack; Kistler, Derek E

    The polystore architecture revisits the federated approach to accessing and querying standalone, independent databases in a uniform and optimized fashion, but this time in the context of heterogeneous data and specialized analyses. In light of this architectural philosophy, and of the major data-architecture development efforts at the US Department of Veterans Affairs (VA), we discuss the need for a heterogeneous data store consisting of a large relational data warehouse, an image and text datastore, and a peta-scale genomic repository. The VA's heterogeneous datastore would, to a larger or smaller degree, follow the architectural blueprint proposed by the polystore architecture. To this end, we discuss the current state of the data architecture at the VA, architectural alternatives for development of the heterogeneous datastore, the anticipated challenges, and the drawbacks and benefits of adopting the polystore architecture.

  1. The Architecture Design of Detection and Calibration System for High-voltage Electrical Equipment

    NASA Astrophysics Data System (ADS)

    Ma, Y.; Lin, Y.; Yang, Y.; Gu, Ch; Yang, F.; Zou, L. D.

    2018-01-01

    With the construction of the Material Quality Inspection Center of the Shandong electric power company, the Electric Power Research Institute has taken on more work in quality analysis and laboratory calibration for high-voltage electrical equipment, making information-system construction urgent. In this paper we design a consolidated system that implements electronic management and an online automated process for material sampling, test apparatus detection, and field testing. For these three jobs we use QR code scanning, online Word editing, and electronic signatures. These techniques simplify the complex processes of warehouse management and test-report transfer, and largely eliminate manual procedures. The construction of this standardized detection information platform realizes integrated management of high-voltage electrical equipment from networking and operation through periodic detection. According to a system operation evaluation, report transfer is twice as fast, and querying data is also easier and faster.

  2. A high performance, ad-hoc, fuzzy query processing system for relational databases

    NASA Technical Reports Server (NTRS)

    Mansfield, William H., Jr.; Fleischman, Robert M.

    1992-01-01

    Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.
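
    A minimal sketch of what an ad-hoc fuzzy predicate can look like during an exhaustive scan (illustrative membership functions and data, not the Datacycle filter code): each record receives a membership score rather than a boolean, and a fuzzy AND takes the minimum of the component scores.

        def near(target, spread):
            """Triangular membership: 1.0 at target, 0.0 beyond +/- spread."""
            return lambda v: max(0.0, 1.0 - abs(v - target) / spread)

        cheap = near(target=100, spread=50)   # "price near 100"
        fast  = near(target=3, spread=2)      # "delivery around 3 days"

        records = [("A", 90, 3), ("B", 160, 2), ("C", 120, 6)]
        scored = [(name, min(cheap(price), fast(days)))   # fuzzy AND = min
                  for name, price, days in records]
        for name, score in sorted(scored, key=lambda r: -r[1]):
            print(name, round(score, 2))
        # A 0.8   B 0.0 (price too high)   C 0.0 (too slow)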

  3. Heuristic query optimization for query multiple table and multiple clausa on mobile finance application

    NASA Astrophysics Data System (ADS)

    Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG

    2018-01-01

    Mobile applications allow many users to access data without being limited by place and time. Over time, the data population of such an application grows, and data access times become problematic once a table reaches tens of thousands to millions of records. The objective of this research is to maintain query-execution performance for large record counts. One way to maintain data access time performance is to apply query optimization; the optimization used in this research is the heuristic query optimization method. The application studied is a mobile-based financial application using a MySQL database with stored procedures. The application is used by more than one business entity in a single database, enabling rapid data growth. Within these stored procedures, queries are optimized using the heuristic method; optimization is performed on SELECT queries that involve more than one table and multiple clauses. Evaluation is carried out by calculating the average access time of optimized and unoptimized queries, including as the data population in the database grows. The evaluation results show that execution times with heuristic query optimization are consistently faster than without it.
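
    The paper does not reproduce its rewritten queries; the sketch below illustrates one classic heuristic of this kind, selection push-down, on hypothetical tables (note that many modern optimizers apply this rewrite automatically).

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, entity TEXT);
        CREATE TABLE trx (account_id INT, amount REAL, entity TEXT);
        INSERT INTO account VALUES (1,'Ani','coop-01'),(2,'Budi','coop-02');
        INSERT INTO trx VALUES (1,10,'coop-01'),(1,5,'coop-01'),(2,7,'coop-02');
        """)
        # Unoptimized: join everything first, filter afterwards.
        slow = """SELECT a.owner, SUM(t.amount)
                  FROM trx t JOIN account a ON t.account_id = a.id
                  WHERE t.entity='coop-01' AND a.entity='coop-01'
                  GROUP BY a.owner"""
        # Heuristic rewrite (selection push-down): restrict each table
        # before the join so far fewer rows take part in it.
        fast = """SELECT a.owner, SUM(t.amount)
                  FROM (SELECT account_id, amount FROM trx
                        WHERE entity='coop-01') t
                  JOIN (SELECT id, owner FROM account
                        WHERE entity='coop-01') a
                    ON t.account_id = a.id
                  GROUP BY a.owner"""
        print(con.execute(slow).fetchall(), con.execute(fast).fetchall())
        # [('Ani', 15.0)] [('Ani', 15.0)] -- same answer, smaller join input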

  4. 7 CFR 735.3 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT General Provisions § 735.3..., change, and transfer warehouse receipts or other applicable document information retained in a central... provider, as a disinterested third party, authorized by DACO where information relating to warehouse...

  5. 7 CFR 735.3 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT General Provisions § 735.3..., change, and transfer warehouse receipts or other applicable document information retained in a central... provider, as a disinterested third party, authorized by DACO where information relating to warehouse...

  6. Data warehouse governance programs in healthcare settings: a literature review and a call to action.

    PubMed

    Elliott, Thomas E; Holmes, John H; Davidson, Arthur J; La Chance, Pierre-Andre; Nelson, Andrew F; Steiner, John F

    2013-01-01

    Given the extensive data stored in healthcare data warehouses, data warehouse governance policies are needed to ensure data integrity and privacy. This review examines the current state of the data warehouse governance literature as it applies to healthcare data warehouses, identifies knowledge gaps, provides recommendations, and suggests approaches for further research. A comprehensive literature search using five databases, journal article title searches, and citation searches was conducted covering 1997 to 2012. Data warehouse governance documents from two healthcare systems in the USA were also reviewed. A modified version of nine components from the Data Governance Institute Framework for data warehouse governance guided the qualitative analysis. Fifteen articles were retrieved. Only three were related to healthcare settings, each of which addressed only one of the nine framework components. Of the remaining 12 articles, 10 addressed between one and seven framework components and the remainder addressed none. Each of the two data warehouse governance plans obtained from healthcare systems in the USA addressed a subset of the framework components, and between them they covered all nine. While published data warehouse governance policies are rare, the 15 articles and two healthcare organizational documents reviewed in this study may provide guidance for creating such policies. Additional research is needed in this area to ensure that data warehouse governance policies are feasible and effective. The gap between the development of data warehouses in healthcare settings and formal governance policies is substantial, as evidenced by the sparse literature in this domain.

  7. Efficient Execution Methods of Pivoting for Bulk Extraction of Entity-Attribute-Value-Modeled Data

    PubMed Central

    Luo, Gang; Frey, Lewis J.

    2017-01-01

    Entity-attribute-value (EAV) tables are widely used to store data in electronic medical records and clinical study data management systems. Before they can be used by various analytical (e.g., data mining and machine learning) programs, EAV-modeled data usually must be transformed into conventional relational table format through pivot operations. This time-consuming and resource-intensive process is often performed repeatedly on a regular basis, e.g., to provide a daily refresh of the content in a clinical data warehouse. Thus, it would be beneficial to make pivot operations as efficient as possible. In this paper, we present three techniques for improving the efficiency of pivot operations: 1) filtering out EAV tuples related to unneeded clinical parameters early on; 2) supporting pivoting across multiple EAV tables; and 3) conducting multi-query optimization. We demonstrate the effectiveness of our techniques through implementation. We show that our optimized execution method of pivoting using these techniques significantly outperforms the current basic execution method of pivoting. Our techniques can be used to build a data extraction tool to simplify the specification of and improve the efficiency of extracting data from the EAV tables in electronic medical records and clinical study data management systems. PMID:25608318
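
    A compact illustration of a pivot with early filtering (the paper's first technique) on a toy EAV table; the table, parameter names, and values are hypothetical.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE eav (patient INT, param TEXT, value REAL)")
        con.executemany("INSERT INTO eav VALUES (?,?,?)", [
            (1, "glucose", 5.4), (1, "sodium", 140), (1, "chloride", 101),
            (2, "glucose", 7.8), (2, "sodium", 137),
        ])
        rows = con.execute("""
            SELECT patient,
                   MAX(CASE WHEN param='glucose' THEN value END) AS glucose,
                   MAX(CASE WHEN param='sodium'  THEN value END) AS sodium
            FROM eav
            WHERE param IN ('glucose', 'sodium')  -- drop unneeded parameters
            GROUP BY patient                      -- before pivoting
        """).fetchall()
        print(rows)   # [(1, 5.4, 140.0), (2, 7.8, 137.0)]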

  8. 27 CFR 24.108 - Bonded wine warehouse application.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Bonded wine warehouse... BUREAU, DEPARTMENT OF THE TREASURY LIQUORS WINE Establishment and Operations Application § 24.108 Bonded wine warehouse application. A warehouse company or other person desiring to establish a bonded wine...

  9. 78 FR 77662 - Notice of Availability (NOA) for General Purpose Warehouse and Information Technology Center...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-24

    ... (NOA) for General Purpose Warehouse and Information Technology Center Construction (GPW/IT)--Tracy Site.... ACTION: Notice of Availability (NOA) for General Purpose Warehouse and Information Technology Center... FR 65300) announcing the publication of the General Purpose Warehouse and Information Technology...

  10. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 7 2013-01-01 2013-01-01 false Electronic warehouse receipts. 735.303 Section 735.303 Agriculture Regulations of the Department of Agriculture (Continued) FARM SERVICE AGENCY, DEPARTMENT OF... § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  11. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 7 2014-01-01 2014-01-01 false Electronic warehouse receipts. 735.303 Section 735.303 Agriculture Regulations of the Department of Agriculture (Continued) FARM SERVICE AGENCY, DEPARTMENT OF... § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  12. 7 CFR 735.303 - Electronic warehouse receipts.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 7 2012-01-01 2012-01-01 false Electronic warehouse receipts. 735.303 Section 735.303 Agriculture Regulations of the Department of Agriculture (Continued) FARM SERVICE AGENCY, DEPARTMENT OF... § 735.303 Electronic warehouse receipts. (a) Warehouse operators issuing EWR under the Act may issue EWR...

  13. 7 CFR 735.6 - Suspension, revocation and liquidation.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... control and begin an orderly liquidation of such warehouse inventory or provider system data as provided..., DEPARTMENT OF AGRICULTURE REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT General... licensing or provider agreement. (4) Failure to maintain control of the warehouse or provider system. (5...

  14. 7 CFR 735.6 - Suspension, revocation and liquidation.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... control and begin an orderly liquidation of such warehouse inventory or provider system data as provided..., DEPARTMENT OF AGRICULTURE REGULATIONS FOR WAREHOUSES REGULATIONS FOR THE UNITED STATES WAREHOUSE ACT General... licensing or provider agreement. (4) Failure to maintain control of the warehouse or provider system. (5...

  15. 27 CFR 41.11 - Meaning of terms.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... Reserve Bank of New York. Export warehouse. A bonded internal revenue warehouse for the storage of tobacco... revenue laws of the United States. Export warehouse proprietor. Any person who operates an export warehouse. Factory. The premises of a manufacturer of tobacco products, cigarette papers or tubes, or...

  16. 7 CFR 1427.11 - Warehouse receipts.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... (Signature or initials), Date. (3) Alterations in other inserted data on a machine card type warehouse... 7 Agriculture 10 2010-01-01 2010-01-01 false Warehouse receipts. 1427.11 Section 1427.11... Deficiency Payments § 1427.11 Warehouse receipts. (a) Producers may obtain loans on eligible cotton...

  17. 7 CFR 1427.11 - Warehouse receipts.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... (Signature or initials), Date. (3) Alterations in other inserted data on a machine card type warehouse... 7 Agriculture 10 2011-01-01 2011-01-01 false Warehouse receipts. 1427.11 Section 1427.11... Deficiency Payments § 1427.11 Warehouse receipts. (a) Producers may obtain loans on eligible cotton...

  18. Warehouse multipoint temperature and humidity monitoring system design based on Kingview

    NASA Astrophysics Data System (ADS)

    Ou, Yanghui; Wang, Xifu; Liu, Jingyun

    2017-04-01

    Storage is a key link in modern logistics, and warehouse environment monitoring is an important part of storage safety management: meeting the storage requirements of different materials and preserving their quality to the greatest extent is of great significance. In warehouse environment monitoring, the most important parameters are air temperature and relative humidity. This paper presents a design for a warehouse multipoint temperature and humidity monitoring system based on Kingview, which uses temperature and humidity sensors to realize real-time multipoint acquisition, monitoring, and storage of temperature and humidity data in the warehouse. Taking a bulk grain warehouse as an example, the system combines the data collected in real-time monitoring with a corresponding algorithm to give expert advice, providing theoretical guidance for controlling temperature and humidity in grain warehouses.

  19. Data Warehouse Governance Programs in Healthcare Settings: A Literature Review and a Call to Action

    PubMed Central

    Elliott, Thomas E.; Holmes, John H.; Davidson, Arthur J.; La Chance, Pierre-Andre; Nelson, Andrew F.; Steiner, John F.

    2013-01-01

    Purpose: Given the extensive data stored in healthcare data warehouses, data warehouse governance policies are needed to ensure data integrity and privacy. This review examines the current state of the data warehouse governance literature as it applies to healthcare data warehouses, identifies knowledge gaps, provides recommendations, and suggests approaches for further research. Methods: A comprehensive literature search using five databases, journal article title searches, and citation searches was conducted covering 1997 to 2012. Data warehouse governance documents from two healthcare systems in the USA were also reviewed. A modified version of nine components from the Data Governance Institute Framework for data warehouse governance guided the qualitative analysis. Results: Fifteen articles were retrieved. Only three were related to healthcare settings, each of which addressed only one of the nine framework components. Of the remaining 12 articles, 10 addressed between one and seven framework components and the remainder addressed none. Each of the two data warehouse governance plans obtained from healthcare systems in the USA addressed a subset of the framework components, and between them they covered all nine. Conclusions: While published data warehouse governance policies are rare, the 15 articles and two healthcare organizational documents reviewed in this study may provide guidance for creating such policies. Additional research is needed in this area to ensure that data warehouse governance policies are feasible and effective. The gap between the development of data warehouses in healthcare settings and formal governance policies is substantial, as evidenced by the sparse literature in this domain. PMID:25848561

  20. 27 CFR 24.141 - Bonded wine warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Bonded wine warehouse. 24..., DEPARTMENT OF THE TREASURY LIQUORS WINE Establishment and Operations Permanent Discontinuance of Operations § 24.141 Bonded wine warehouse. Where all operations at a bonded wine warehouse are to be permanently...

  1. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 7 2013-01-01 2013-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  2. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 7 2014-01-01 2014-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  3. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 7 2010-01-01 2010-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  4. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 7 2012-01-01 2012-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  5. 7 CFR 735.401 - Electronic warehouse receipt and USWA electronic document providers.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 7 2011-01-01 2011-01-01 false Electronic warehouse receipt and USWA electronic... UNITED STATES WAREHOUSE ACT Electronic Providers § 735.401 Electronic warehouse receipt and USWA electronic document providers. (a) To establish a USWA-authorized system to issue and transfer EWR's and USWA...

  6. 7 CFR 735.302 - Paper warehouse receipts.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 7 2010-01-01 2010-01-01 false Paper warehouse receipts. 735.302 Section 735.302... § 735.302 Paper warehouse receipts. Paper warehouse receipts must be issued as follows: (a) On distinctive paper specified by DACO; (b) Printed by a printer authorized by DACO; and (c) Issued, identified...

  7. 7 CFR 735.302 - Paper warehouse receipts.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 7 2011-01-01 2011-01-01 false Paper warehouse receipts. 735.302 Section 735.302... § 735.302 Paper warehouse receipts. Paper warehouse receipts must be issued as follows: (a) On distinctive paper specified by DACO; (b) Printed by a printer authorized by DACO; and (c) Issued, identified...

  8. 19 CFR 144.27 - Withdrawal from warehouse by transferee.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ...; DEPARTMENT OF THE TREASURY (CONTINUED) WAREHOUSE AND REWAREHOUSE ENTRIES AND WITHDRAWALS Transfer of Right To Withdraw Merchandise from Warehouse § 144.27 Withdrawal from warehouse by transferee. At any time within... withdraw all or part of the merchandise covered by the transfer by filing any authorized kind of withdrawal...

  9. 19 CFR 144.27 - Withdrawal from warehouse by transferee.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ...; DEPARTMENT OF THE TREASURY (CONTINUED) WAREHOUSE AND REWAREHOUSE ENTRIES AND WITHDRAWALS Transfer of Right To Withdraw Merchandise from Warehouse § 144.27 Withdrawal from warehouse by transferee. At any time within... withdraw all or part of the merchandise covered by the transfer by filing any authorized kind of withdrawal...

  10. What Academia Can Gain from Building a Data Warehouse.

    ERIC Educational Resources Information Center

    Wierschem, David; McMillen, Jeremy; McBroom, Randy

    2003-01-01

    Describes how, when used effectively, data warehouses can be a significant component of strategic decision making on campus. Discusses what a data warehouse is and what its informational contents may include, environmental drivers and obstacles, and strategies to justify developing a data warehouse for an academic institution. (EV)

  11. The development of health care data warehouses to support data mining.

    PubMed

    Lyman, Jason A; Scully, Kenneth; Harrison, James H

    2008-03-01

    Clinical data warehouses offer tremendous benefits as a foundation for data mining. By serving as a source for comprehensive clinical and demographic information on large patient populations, they streamline knowledge discovery efforts by providing standard and efficient mechanisms to replace time-consuming and expensive original data collection, organization, and processing. Building effective data warehouses requires knowledge of and attention to key issues in database design, data acquisition and processing, and data access and security. In this article, the authors provide an operational and technical definition of data warehouses, present examples of data mining projects enabled by existing data warehouses, and describe key issues and challenges related to warehouse development and implementation.

  12. Three Tier-Level Architecture Data Warehouse Design of Civil Servant Data in Minahasa Regency

    NASA Astrophysics Data System (ADS)

    Tangkawarow, I. R. H. T.; Runtuwene, J. P. A.; Sangkop, F. I.; Ngantung, L. V. F.

    2018-02-01

    Minahasa Regency is one of the regencies in North Sulawesi Province. In running the government of Minahasa Regency, the Regent is assisted by more than 6000 civil servants (PNS) spread across 60 regional work units (SKPD). The Badan Kepegawaian Diklat Daerah (BKDD) of Minahasa Regency is the SKPD that processes the data of all civil servants and is responsible for arranging and determining civil-servant formations. In this process, BKDD faces many obstacles; one of them is the unavailability of accurate data on the educational backgrounds of civil servants by rank/class, age, length of service, department, and so forth. One way to make such data available quickly and accurately is business analytics, which begins with designing the data warehouse. The data warehouse design presented here divides the architecture into three tier levels.

  13. A Fast Healthcare Interoperability Resources (FHIR) layer implemented over i2b2.

    PubMed

    Boussadi, Abdelali; Zapletal, Eric

    2017-08-14

    Standards and technical specifications have been developed to define how the information contained in Electronic Health Records (EHRs) should be structured, semantically described, and communicated. Current trends rely on differentiating the representation of data instances from the definition of clinical information models. The dual model approach, which combines a reference model (RM) and a clinical information model (CIM), puts this software design pattern into practice. The most recent initiative, proposed by HL7, is called Fast Healthcare Interoperability Resources (FHIR). The aim of our study was to investigate the feasibility of applying the FHIR standard to modeling and exposing EHR data of the Georges Pompidou European Hospital (HEGP) Informatics for Integrating Biology and the Bedside (i2b2) clinical data warehouse (CDW). We implemented a FHIR server over i2b2 to expose EHR data through five FHIR resources: DiagnosticReport, MedicationOrder, Patient, Encounter, and Medication. The architecture of the server combines a Data Access Object design pattern and FHIR resource providers, implemented using the Java HAPI FHIR API. Two types of queries were tested: query type #1 requests the server to display DiagnosticReport resources for which the diagnosis code is equal to a given ICD-10 code. A total of 80 DiagnosticReport resources, corresponding to 36 patients, were displayed. Query type #2 requests the server to display MedicationOrder resources for which the FHIR Medication identification code is equal to a given code expressed in a French coding system. A total of 503 MedicationOrder resources, corresponding to 290 patients, were displayed. Results were validated by manually comparing the results of each request to the results displayed by an ad-hoc SQL query. We showed the feasibility of implementing a Java layer over the i2b2 database model to expose data from the CDW as a set of FHIR resources. An important part of this work was the structural and semantic mapping between the i2b2 model and the FHIR RM; to accomplish this, developers must manually browse the specifications of the FHIR standard. Our source code is freely available and can be adapted for use at other i2b2 sites.
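
    Such a server can be queried with standard FHIR REST search semantics; the sketch below (hypothetical base URL and ICD-10 code, not the HEGP deployment) shows a client-side query analogous to query type #1.

        import requests

        BASE = "http://localhost:8080/fhir"   # hypothetical FHIR base URL
        resp = requests.get(
            f"{BASE}/DiagnosticReport",
            # Token search: code system|value, here an illustrative ICD-10 code.
            params={"code": "http://hl7.org/fhir/sid/icd-10|I50.9"},
            headers={"Accept": "application/fhir+json"},
        )
        bundle = resp.json()                  # FHIR searchset Bundle
        for entry in bundle.get("entry", []):
            report = entry["resource"]
            print(report["id"], report.get("status"))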

  14. 27 CFR 28.28 - Withdrawal of wine and distilled spirits from customs bonded warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Withdrawal of wine and... Miscellaneous Provisions Customs Bonded Warehouses § 28.28 Withdrawal of wine and distilled spirits from customs bonded warehouses. Wine and bottled distilled spirits entered into customs bonded warehouses as provided...

  15. ADM. Warehouse (TAN604) Floor plan. General warehouse and chemical storage. ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    ADM. Warehouse (TAN-604) Floor plan. General warehouse and chemical storage. Ralph M. Parsons 902-2-ANP-604-A 55. Date: December 1952. Approved by INEEL Classification Office for public release. INEEL index code no. 035-0604-00-693-106727 - Idaho National Engineering Laboratory, Test Area North, Scoville, Butte County, ID

  16. 7 CFR 735.106 - Excess storage and transferring of agricultural products.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 7 2011-01-01 2011-01-01 false Excess storage and transferring of agricultural... WAREHOUSE ACT Warehouse Licensing § 735.106 Excess storage and transferring of agricultural products. (a) If at any time a warehouse operator stores an agricultural product in a warehouse subject to a license...

  17. 7 CFR 735.106 - Excess storage and transferring of agricultural products.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 7 2010-01-01 2010-01-01 false Excess storage and transferring of agricultural... WAREHOUSE ACT Warehouse Licensing § 735.106 Excess storage and transferring of agricultural products. (a) If at any time a warehouse operator stores an agricultural product in a warehouse subject to a license...

  18. Ontological Approach to Military Knowledge Modeling and Management

    DTIC Science & Technology

    2004-03-01

    A federated search mechanism has to reformulate user queries (expressed using the ontology) into the query languages of the different sources (e.g., SQL). Ontologies serve as a common terminology for unified queries in federated search; query processing then uses ontology-to-source mappings to reformulate the queries.

  19. Taking advantage of continuity of care documents to populate a research repository.

    PubMed

    Klann, Jeffrey G; Mendis, Michael; Phillips, Lori C; Goodson, Alyssa P; Rocha, Beatriz H; Goldberg, Howard S; Wattanasin, Nich; Murphy, Shawn N

    2015-03-01

    Clinical data warehouses have accelerated clinical research, but even with available open source tools, there is a high barrier to entry due to the complexity of normalizing and importing data. The Office of the National Coordinator for Health Information Technology's Meaningful Use Incentive Program now requires that electronic health record systems produce standardized consolidated clinical document architecture (C-CDA) documents. Here, we leverage this data source to create a low-volume, standards-based import pipeline for the Informatics for Integrating Biology and the Bedside (i2b2) clinical research platform. We validate this approach by creating a small repository at Partners Healthcare automatically from C-CDA documents. We designed an i2b2 extension to import C-CDAs into i2b2. It is extensible to other sites with variances in C-CDA format without requiring custom code. We also designed new ontology structures for querying the imported data. We implemented our methodology at Partners Healthcare, where we developed an adapter to retrieve C-CDAs from Enterprise Services. Our current implementation supports demographics, encounters, problems, and medications. We imported approximately 17 000 clinical observations on 145 patients into i2b2 in about 24 minutes. We were able to perform i2b2 cohort-finding queries and view patient information through SMART apps on the imported data. This low-volume import approach can serve small practices with local access to C-CDAs and will allow patient registries to import patient-supplied C-CDAs. These components will soon be available open source on the i2b2 wiki. Our approach will lower barriers to entry in implementing i2b2 where informatics expertise or data access are limited. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.

  20. Representation and alignment of sung queries for music information retrieval

    NASA Astrophysics Data System (ADS)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a ``smooth'' pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
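
    As a concrete anchor for the alignment cost discussed above, here is a minimal dynamic time warping sketch (illustrative contours; the paper's alignment scheme and representation details may differ): filling the table costs O(len(q) * len(t)), which is exactly the quadratic expense that motivates keeping the dimension small and using iterative deepening.

        import numpy as np

        def dtw(q, t):
            """Alignment cost between a sung-query contour q and a target t."""
            Q, T = len(q), len(t)
            D = np.full((Q + 1, T + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, Q + 1):
                for j in range(1, T + 1):
                    cost = abs(q[i - 1] - t[j - 1])   # pitch difference
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1],
                                         D[i - 1, j - 1])
            return D[Q, T]

        query  = np.array([60.2, 62.1, 63.9, 62.0])        # MIDI pitch contour
        target = np.array([60.0, 62.0, 64.0, 62.0, 60.0])
        print(round(dtw(query, target), 2))  # smaller = better melodic match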

  1. 21 CFR 203.3 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... acutely ill or injured persons; provision of minimal emergency supplies of drugs to nearby nursing homes...' and distributors' warehouses, chain drug warehouses, and wholesale drug warehouses; independent...

  2. 21 CFR 203.3 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... acutely ill or injured persons; provision of minimal emergency supplies of drugs to nearby nursing homes...' and distributors' warehouses, chain drug warehouses, and wholesale drug warehouses; independent...

  3. 21 CFR 203.3 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... acutely ill or injured persons; provision of minimal emergency supplies of drugs to nearby nursing homes...' and distributors' warehouses, chain drug warehouses, and wholesale drug warehouses; independent...

  4. 8. Freight Warehouse, looking east into the east section. Projecting ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    8. Freight Warehouse, looking east into the east section. Projecting into the warehouse space are: (a) A loading dock with slatted walls (b) An office space above it and (c) A toilet and utility area. (b) and (c) are accessed through the Ticket Office. - Curtis Wharf, Freight Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  5. The Data Warehouse: Keeping It Simple. MIT Shares Valuable Lessons Learned from a Successful Data Warehouse Implementation.

    ERIC Educational Resources Information Center

    Thorne, Scott

    2000-01-01

    Explains why the data warehouse is important to the Massachusetts Institute of Technology community, describing its basic functions and technical design points; sharing some non-technical aspects of the school's data warehouse implementation that have proved to be important; examining the importance of proper training in a successful warehouse…

  6. 27 CFR 19.134 - Bonded warehouses not on premises qualified for production of spirits.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... depositors of spirits; (iii) Approximate number of persons to be served from the warehouse; and (iv) Data or... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Bonded warehouses not on... Location and Use § 19.134 Bonded warehouses not on premises qualified for production of spirits. (a...

  7. Achieving continuous improvement in laboratory organization through performance measurements: a seven-year experience.

    PubMed

    Salinas, Maria; López-Garrigós, Maite; Gutiérrez, Mercedes; Lugo, Javier; Sirvent, Jose Vicente; Uris, Joaquin

    2010-01-01

    Laboratory performance can be measured using a set of model key performance indicators (KPIs). The design and implementation of KPIs are important issues. KPI results from 7 years are reported and their implementation, monitoring, objectives, interventions, result reporting and delivery are analyzed. The KPIs of the entire laboratory process were obtained using Laboratory Information System (LIS) registers. These were collected automatically using a data warehouse application, spreadsheets and external quality program reports. Customer satisfaction was assessed using surveys. Nine model laboratory KPIs were proposed and measured. The results of some examples of KPIs used in our laboratory are reported. Their corrective measurements or the implementation of objectives led to improvement in the associated KPIs results. Measurement of laboratory performance using KPIs and a data warehouse application that continuously collects registers and calculates KPIs confirmed the reliability of indicators, indicator acceptability and usability for users, and continuous process improvement.

  8. [Construction and realization of a real-world integrated data warehouse from HIS for re-evaluation of post-marketing traditional Chinese medicine].

    PubMed

    Zhuang, Yan; Xie, Bangtie; Weng, Shengxin; Xie, Yanming

    2011-10-01

    To construct a real-world integrated data warehouse for the re-evaluation of post-marketing traditional Chinese medicine, supporting research on key techniques of clinical re-evaluation, mainly including indications of traditional Chinese medicine, dosage and usage, course of treatment, unit medication, combined diseases, and adverse reactions. The warehouse provides data for retrospective research on safety, efficacy, and economy, and provides a foundation for prospective research. The integrated data warehouse extracts and integrates data from HIS using an information collection system and data warehouse techniques, forming standardized structures and data; further research proceeds on the basis of these data. A main data warehouse and several sub-warehouses were built, focusing on patients' main records, doctors' orders, disease diagnoses, laboratory results, and economic indicators in the hospital. These data warehouses can provide research data for the re-evaluation of post-marketing traditional Chinese medicine and have clinical value; they also point out directions for further research.

  9. Design and applications of a multimodality image data warehouse framework.

    PubMed

    Wong, Stephen T C; Hoo, Kent Soo; Knowlton, Robert C; Laxer, Kenneth D; Cao, Xinhau; Hawkins, Randall A; Dillon, William P; Arenson, Ronald L

    2002-01-01

    A comprehensive data warehouse framework is needed, which encompasses imaging and non-imaging information in supporting disease management and research. The authors propose such a framework, describe general design principles and system architecture, and illustrate a multimodality neuroimaging data warehouse system implemented for clinical epilepsy research. The data warehouse system is built on top of a picture archiving and communication system (PACS) environment and applies an iterative object-oriented analysis and design (OOAD) approach and recognized data interface and design standards. The implementation is based on a Java CORBA (Common Object Request Broker Architecture) and Web-based architecture that separates the graphical user interface presentation, data warehouse business services, data staging area, and backend source systems into distinct software layers. To illustrate the practicality of the data warehouse system, the authors describe two distinct biomedical applications--namely, clinical diagnostic workup of multimodality neuroimaging cases and research data analysis and decision threshold on seizure foci lateralization. The image data warehouse framework can be modified and generalized for new application domains.

  10. Design and Applications of a Multimodality Image Data Warehouse Framework

    PubMed Central

    Wong, Stephen T.C.; Hoo, Kent Soo; Knowlton, Robert C.; Laxer, Kenneth D.; Cao, Xinhau; Hawkins, Randall A.; Dillon, William P.; Arenson, Ronald L.

    2002-01-01

    A comprehensive data warehouse framework is needed, which encompasses imaging and non-imaging information in supporting disease management and research. The authors propose such a framework, describe general design principles and system architecture, and illustrate a multimodality neuroimaging data warehouse system implemented for clinical epilepsy research. The data warehouse system is built on top of a picture archiving and communication system (PACS) environment and applies an iterative object-oriented analysis and design (OOAD) approach and recognized data interface and design standards. The implementation is based on a Java CORBA (Common Object Request Broker Architecture) and Web-based architecture that separates the graphical user interface presentation, data warehouse business services, data staging area, and backend source systems into distinct software layers. To illustrate the practicality of the data warehouse system, the authors describe two distinct biomedical applications—namely, clinical diagnostic workup of multimodality neuroimaging cases and research data analysis and decision threshold on seizure foci lateralization. The image data warehouse framework can be modified and generalized for new application domains. PMID:11971885

  11. 15. FIRST FLOOR WAREHOUSE SPACE, SHOWING COLUMN / BEAM CONNECTION. ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    15. FIRST FLOOR WAREHOUSE SPACE, SHOWING COLUMN / BEAM CONNECTION. VIEW TO SOUTHWEST. - Commercial & Industrial Buildings, Dubuque Seed Company Warehouse, 169-171 Iowa Street, Dubuque, Dubuque County, IA

  12. Dipstick screening for urinary tract infection in febrile infants.

    PubMed

    Glissmeyer, Eric W; Korgenski, E Kent; Wilkes, Jacob; Schunk, Jeff E; Sheng, Xiaoming; Blaschke, Anne J; Byington, Carrie L

    2014-05-01

    This study compares the performance of urine dipstick alone with urine microscopy and with both tests combined as a screen for urinary tract infection (UTI) in febrile infants aged 1 to 90 days. We queried the Intermountain Healthcare data warehouse to identify febrile infants with urine dipstick, microscopy, and culture performed between 2004 and 2011. UTI was defined as >50 000 colony-forming units per milliliter of a urinary pathogen. We compared the performance of urine dipstick with unstained microscopy or both tests combined ("combined urinalysis") to identify UTI in infants aged 1 to 90 days. Of 13 030 febrile infants identified, 6394 (49%) had all tests performed and were included in the analysis. Of these, 770 (12%) had UTI. Urine culture results were positive within 24 hours in 83% of UTIs. The negative predictive value (NPV) was >98% for all tests. The combined urinalysis NPV was 99.2% (95% confidence interval: 99.1%-99.3%) and was significantly greater than the dipstick NPV of 98.7% (98.6%-98.8%). The dipstick positive predictive value was significantly greater than combined urinalysis (66.8% [66.2%-67.4%] vs 51.2% [50.6%-51.8%]). These data suggest 8 febrile infants would be predicted to have a false-positive combined urinalysis for every 1 infant with UTI initially missed by dipstick screening. Urine dipstick testing compares favorably with both microscopy and combined urinalysis in febrile infants aged 1 to 90 days. The urine dipstick test may be an adequate stand-alone screen for UTI in febrile infants while awaiting urine culture results. Copyright © 2014 by the American Academy of Pediatrics.

  13. 10. WEST FRONT AND SOUTH SIDE OF WAREHOUSE. VIEW TO ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    10. WEST FRONT AND SOUTH SIDE OF WAREHOUSE. VIEW TO NORTH. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  14. 17. SECOND FLOOR WAREHOUSE SPACE, SHOWING COLUMN AND BEAM CONNECTION. ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    17. SECOND FLOOR WAREHOUSE SPACE, SHOWING COLUMN AND BEAM CONNECTION. VIEW TO NORTHEAST. - Commercial & Industrial Buildings, Dubuque Seed Company Warehouse, 169-171 Iowa Street, Dubuque, Dubuque County, IA

  15. 19. DETAIL OF FIRST FLOOR WAREHOUSE, SHOWING ROOF TRUSS. VIEW ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    19. DETAIL OF FIRST FLOOR WAREHOUSE, SHOWING ROOF TRUSS. VIEW TO EAST. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  16. HodDB: Design and Analysis of a Query Processor for Brick.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fierro, Gabriel; Culler, David

    Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response and model-predictive control, require fast queries — conventionally less than 100ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet this performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, an RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100ms threshold, enabling the portability of important latency-sensitive building applications.
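
    To give a feel for the workload behind the 100 ms target, here is a small sketch of one such discovery query using the rdflib library; the model file, prefix URI and predicate names are assumptions in the style of Brick 1.1, not taken from the paper:

    ```python
    # Run a Brick discovery query with rdflib (hypothetical model file
    # and Brick 1.1-style names).
    import rdflib

    g = rdflib.Graph()
    g.parse("building.ttl", format="turtle")   # hypothetical Brick model

    # Find temperature sensors and the rooms they are points of.
    query = """
    PREFIX brick: <https://brickschema.org/schema/Brick#>
    PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT ?sensor ?room WHERE {
        ?sensor rdf:type brick:Temperature_Sensor .
        ?sensor brick:isPointOf ?room .
        ?room   rdf:type brick:Room .
    }
    """
    for row in g.query(query):
        print(row.sensor, row.room)
    ```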

  17. 11. SOUTH SIDE OF WAREHOUSE, WITH LOADING DOCK IN FOREGROUND. ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    11. SOUTH SIDE OF WAREHOUSE, WITH LOADING DOCK IN FOREGROUND. VIEW TO NORTHWEST. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  18. 6. Cement and Plaster Warehouse, interior. View looking south. Original ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. Cement and Plaster Warehouse, interior. View looking south. Original wood roof truss can be seen at upper left. - Curtis Wharf, Cement & Plaster Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  19. 4. Cement and Plaster Warehouse, southeast corner, showing alterations; pent ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    4. Cement and Plaster Warehouse, southeast corner, showing alterations; pent roof, window and door openings, siding, brick foundation sheathing. - Curtis Wharf, Cement & Plaster Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  20. 3. Cement and Plaster Warehouse, north facade. Loading ramp on ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    3. Cement and Plaster Warehouse, north facade. Loading ramp on the right. Utility building, intrusion, on the far right. - Curtis Wharf, Cement & Plaster Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  1. 9. INTERIOR, WAREHOUSE SPACE AT EAST END OF BUILDING, CAMERA ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    9. INTERIOR, WAREHOUSE SPACE AT EAST END OF BUILDING, CAMERA FACING NORTHEAST. - U.S. Coast Guard Support Center Alameda, Warehouse, Spencer Road & Icarrus Drive, Coast Guard Island, Alameda, Alameda County, CA

  2. View of steel warehouses, building 710 north sidewalk; camera facing ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses, building 710 north sidewalk; camera facing east. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  3. Roogle: an information retrieval engine for clinical data warehouse.

    PubMed

    Cuggia, Marc; Garcelon, Nicolas; Campillo-Gimenez, Boris; Bernicot, Thomas; Laurent, Jean-François; Garin, Etienne; Happe, André; Duvauferrier, Régis

    2011-01-01

    A large amount of relevant information is contained in the reports stored in electronic patient records and their associated metadata. Roogle is a project aiming to develop information retrieval engines adapted to these reports and designed for clinicians. The system consists of a data warehouse (full-text reports and structured data) imported from two different hospital information systems. Information retrieval is performed using metadata-based semantic search and full-text search methods (as in Google). Applications include biomarker identification in a translational approach, retrieval of specific cases, constitution of cohorts, professional practice evaluation, and quality control assessment.

  4. RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms

    NASA Astrophysics Data System (ADS)

    Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm previously proposed in the literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
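
    As a rough illustration of the idea, the sketch below evolves join orders for a chain query with a small genetic algorithm; the cost model and all parameters are invented stand-ins, not the paper's RCQ-GA:

    ```python
    # Toy GA over join orders: permutation chromosomes, order crossover,
    # swap mutation, truncation selection. Cost model is hypothetical.
    import random

    N_JOINS = 8
    CARD = [random.randint(10, 10_000) for _ in range(N_JOINS)]

    def cost(order):
        """Toy cost: accumulate intermediate-result sizes along the chain."""
        total, inter = 0, 1
        for j in order:
            inter = max(1, inter * CARD[j] // 1000)   # crude selectivity
            total += inter
        return total

    def crossover(a, b):
        """Order crossover: keep a slice of `a`, fill the rest from `b`."""
        i, k = sorted(random.sample(range(N_JOINS), 2))
        child = [None] * N_JOINS
        child[i:k] = a[i:k]
        rest = [g for g in b if g not in child]
        for idx in range(N_JOINS):
            if child[idx] is None:
                child[idx] = rest.pop(0)
        return child

    def mutate(order, p=0.1):
        if random.random() < p:
            i, k = random.sample(range(N_JOINS), 2)
            order[i], order[k] = order[k], order[i]

    pop = [random.sample(range(N_JOINS), N_JOINS) for _ in range(50)]
    for _ in range(100):                       # generations
        pop.sort(key=cost)
        survivors = pop[:25]                   # truncation selection
        children = []
        while len(children) < 25:
            child = crossover(*random.sample(survivors, 2))
            mutate(child)
            children.append(child)
        pop = survivors + children

    best = min(pop, key=cost)
    print("best join order:", best, "cost:", cost(best))
    ```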

  5. 15. Photocopy of photograph (Original print, Wallie V. Funk Collection.) ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    15. Photocopy of photograph (Original print, Wallie V. Funk Collection.) Photographer unknown. Published in the Anacortes American, 12 October 1911; 'Plant of the Anacortes Ice Company and Curtis Dock.' Photograph probably earlier. View looking northwest, left to right; Cement and Plaster Warehouse; Ice Plant (with towers); Cold Storage Warehouse; Freight Warehouse; a warehouse; and early ticket office. - Curtis Wharf, O & Second Streets, Anacortes, Skagit County, WA

  6. 12. EAST REAR OF OFFICE BUILDING (RIGHT FOREGROUND) AND WAREHOUSE ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    12. EAST REAR OF OFFICE BUILDING (RIGHT FOREGROUND) AND WAREHOUSE (LEFT BACKGROUND). VIEW TO SOUTH. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  7. 2. View looking northeast at Dixie Cotton Mill warehouses. Note ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    2. View looking northeast at Dixie Cotton Mill warehouses. Note firestops between sections of the building to prevent fire from spreading. - Dixie Cotton Mill, Warehouses, 710 Greenville Street, La Grange, Troup County, GA

  8. View of steel warehouses (building 710 second in on right); ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses (building 710 second in on right); camera facing south. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  9. View of steel warehouses (building 710 second in on left); ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses (building 710 second in on left); camera facing west. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  10. An efficiency improvement in warehouse operation using simulation analysis

    NASA Astrophysics Data System (ADS)

    Samattapapong, N.

    2017-11-01

    In general, industry requires an efficient system for warehouse operation. Many important factors must be considered when designing an efficient warehouse system; the most important is an effective warehouse operation system that can help transfer raw material, reduce costs and support transportation. Motivated by these factors, we studied the warehouse's work systems and distribution. We started by collecting the important storage data, such as information on products, on sizes and locations, on data collection and on production, and used all this information to build a simulation model in the Flexsim® simulation software. The simulation analysis found that the conveyor belt was a bottleneck in the warehouse operation, so several scenarios for relieving that problem were generated and tested through the simulation analysis process. The results showed that the average queuing time was reduced from 89.8% to 48.7% and the product transport capability increased from 10.2% to 50.9%. Thus, this proved the best of the evaluated methods for increasing efficiency in the warehouse operation.
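
    The bottleneck analysis described above can be reproduced in miniature with a discrete-event simulation; the following SimPy sketch uses invented arrival and transfer rates rather than the study's data:

    ```python
    # Items queue for a single conveyor belt; we measure average queuing
    # time. Rates are hypothetical.
    import random
    import simpy

    waits = []

    def item(env, conveyor):
        arrived = env.now
        with conveyor.request() as req:
            yield req                                   # wait in queue
            waits.append(env.now - arrived)
            yield env.timeout(random.expovariate(1 / 4.0))  # transfer time

    def arrivals(env, conveyor):
        while True:
            yield env.timeout(random.expovariate(1 / 5.0))  # inter-arrival
            env.process(item(env, conveyor))

    env = simpy.Environment()
    conveyor = simpy.Resource(env, capacity=1)  # raise capacity to test fixes
    env.process(arrivals(env, conveyor))
    env.run(until=10_000)
    print(f"average queuing time: {sum(waits) / len(waits):.2f}")
    ```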

  11. 7 CFR 250.24 - Distributing agency performance standards.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... commodity assistance levels; (4) Ordering and allocating donated food based on participation data for those... directly into their warehouses; (2) Solicit information and recommendations regarding the individual...

  12. 7 CFR 250.24 - Distributing agency performance standards.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... commodity assistance levels; (4) Ordering and allocating donated food based on participation data for those... directly into their warehouses; (2) Solicit information and recommendations regarding the individual...

  13. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H

    2012-11-06

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such an enormous amount of data with fast response, which faces two major challenges: the "big data" challenge and high computational complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on-demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for micro-anatomic objects. To reduce query response time, we propose cost-based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.

  14. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.

    2013-01-01

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such an enormous amount of data with fast response, which faces two major challenges: the “big data” challenge and high computational complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on-demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for micro-anatomic objects. To reduce query response time, we propose cost-based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719
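
    The partition-merge idea can be sketched in a few lines: partition objects into grid tiles (the map side), join within each tile, then merge the per-tile results (the reduce side). The sketch below is a deliberately naive single-machine version that, unlike the real framework, ignores objects straddling tile boundaries:

    ```python
    # Grid partition-merge distance join; all parameters hypothetical.
    from collections import defaultdict

    def tile(p, size=10.0):
        return (int(p[0] // size), int(p[1] // size))

    def partition(points, size=10.0):
        tiles = defaultdict(list)
        for p in points:
            tiles[tile(p, size)].append(p)
        return tiles

    def join_tile(as_, bs, eps=1.0):
        """Naive within-tile distance join."""
        return [(a, b) for a in as_ for b in bs
                if (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= eps * eps]

    def partition_merge_join(A, B, size=10.0, eps=1.0):
        ta, tb = partition(A, size), partition(B, size)
        pairs = []
        for key, as_ in ta.items():     # one "reducer" per tile
            pairs.extend(join_tile(as_, tb.get(key, []), eps))
        return pairs

    nuclei = [(1.0, 1.0), (2.0, 2.2), (15.0, 3.0)]
    vessels = [(1.4, 1.3), (14.8, 3.5)]
    print(partition_merge_join(nuclei, vessels))
    ```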

  15. 6. GENERAL WAREHOUSE, VIEW TO EAST SHOWING DETAIL OF ELEVATOR ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. GENERAL WAREHOUSE, VIEW TO EAST SHOWING DETAIL OF ELEVATOR DOOR (CENTER), FLANKING PEDESTRIAN ENTRIES, AND LOADING DOCK. - Rosie the Riveter National Historical Park, General Warehouse, 1320 Canal Boulevard, Richmond, Contra Costa County, CA

  16. View of steel warehouses at Gilmore Avenue (building 710 second ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses at Gilmore Avenue (building 710 second in on left); camera facing east. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  17. View of steel warehouses on Ellsberg Drive, building 710 full ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses on Ellsberg Drive, building 710 full building at center; camera facing southeast. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  18. View of steel warehouses (from left: building 807, 808, 809, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View of steel warehouses (from left: building 807, 808, 809, 810, 811); camera facing east. - Naval Supply Annex Stockton, Steel Warehouse Type, Between James & Humphreys Drives south of Embarcadero, Stockton, San Joaquin County, CA

  19. Building a Data Warehouse.

    ERIC Educational Resources Information Center

    Levine, Elliott

    2002-01-01

    Describes how to build a data warehouse, using the Schools Interoperability Framework (www.sifinfo.org), that supports data-driven decision making and complies with the Freedom of Information Act. Provides several suggestions for building and maintaining a data warehouse. (PKP)

  20. The optimal retailer's ordering policies with trade credit financing and limited storage capacity in the supply chain system

    NASA Astrophysics Data System (ADS)

    Yen, Ghi-Feng; Chung, Kun-Jen; Chen, Tzung-Ching

    2012-11-01

    The traditional economic order quantity model assumes that the retailer's storage capacity is unlimited. However, as we all know, the capacity of any warehouse is limited. In practice, there usually exist various factors that induce the decision-maker of the inventory system to order more items than can be held in his/her own warehouse. Therefore, for the decision-maker, it is very practical to determine whether or not to rent other warehouses. In this article, we try to incorporate two levels of trade credit and two separate warehouses (own warehouse and rented warehouse) to establish a new inventory model to help the decision-maker to make the decision. Four theorems are provided to determine the optimal cycle time to generalise some existing articles. Finally, the sensitivity analysis is executed to investigate the effects of the various parameters on ordering policies and annual costs of the inventory system.

  1. Diabetes and Hypertension Quality Measurement in Four Safety-Net Sites

    PubMed Central

    Benkert, R.; Dennehy, P.; White, J.; Hamilton, A.; Tanner, C.

    2014-01-01

    Summary Background In this new era after the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009, the literature on lessons learned with electronic health record (EHR) implementation needs to be revisited. Objectives Our objective was to describe what implementation of a commercially available EHR with built-in quality query algorithms showed us about our care for diabetes and hypertension populations in four safety net clinics, specifically feasibility of data retrieval, measurements over time, quality of data, and how our teams used this data. Methods A cross-sectional study was conducted from October 2008 to October 2012 in four safety-net clinics located in the Midwest and Western United States. A data warehouse that stores data from across the U.S. was utilized for data extraction from patients with diabetes or hypertension diagnoses and at least two office visits per year. Standard quality measures were collected over a period of two to four years. All sites were engaged in a partnership model with the IT staff and a shared learning process to enhance the use of the quality metrics. Results While use of the algorithms was feasible across sites, challenges occurred when attempting to use the query results for research purposes. There was wide variation of both process and outcome results by individual centers. Composite calculations balanced out the differences seen in the individual measures. Despite using consistent quality definitions, the differences across centers had an impact on numerators and denominators. All sites agreed to a partnership model of EHR implementation, and each center utilized the available resources of the partnership for Center-specific quality initiatives. Conclusions Utilizing a shared EHR, a Regional Extension Center-like partnership model, and similar quality query algorithms allowed safety-net clinics to benchmark and improve the quality of care across differing patient populations and health care delivery models. PMID:25298815

  2. Distributed query plan generation using multiobjective genetic algorithm.

    PubMed

    Panicker, Shina; Kumar, T V Vijay

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.

  3. Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

    PubMed Central

    Panicker, Shina; Vijay Kumar, T. V.

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513
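
    The biobjective view can be made concrete with a few lines of Pareto-dominance bookkeeping; the plans and their (LPC, CC) costs below are invented for illustration and stand in for what NSGA-II's non-dominated sorting computes at each generation:

    ```python
    # Keep the non-dominated query plans under two objectives:
    # total local processing cost (LPC) and communication cost (CC).
    plans = {                      # plan -> (total LPC, total CC)
        "P1": (120, 300),
        "P2": (200, 150),
        "P3": (210, 170),
        "P4": (90, 500),
    }

    def dominates(a, b):
        """a dominates b: no worse in both objectives, better in one."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    pareto = [p for p, c in plans.items()
              if not any(dominates(c2, c)
                         for p2, c2 in plans.items() if p2 != p)]
    print("non-dominated plans:", pareto)   # P3 is dominated by P2
    ```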

  4. Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.

    Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework, to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.
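
    A tiny sketch of that hybrid pattern: an online rule fires over a sliding window of the stream while every event is also persisted, so later ad hoc queries run over the same data. Event shapes, thresholds and the energy-metering scenario are hypothetical:

    ```python
    # Stateful CEP in miniature: online windowed rule + on-demand access.
    from collections import deque

    WINDOW = 60            # seconds
    history = []           # persisted events, queryable on demand
    window = deque()

    def on_event(ts, meter, kwh):
        history.append((ts, meter, kwh))     # persist for on-demand queries
        window.append((ts, kwh))
        while window and window[0][0] < ts - WINDOW:
            window.popleft()
        # online query: windowed consumption crosses a threshold
        if sum(k for _, k in window) > 100.0:
            print(f"t={ts}: high-load pattern detected on {meter}")

    def on_demand(predicate):
        """Ad hoc query over the persisted stream."""
        return [e for e in history if predicate(e)]

    on_event(0, "m1", 60.0)
    on_event(30, "m1", 50.0)                 # fires: 110 kWh in the window
    print(on_demand(lambda e: e[2] > 55))    # [(0, 'm1', 60.0)]
    ```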

  5. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

    PubMed

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2013-11-01

    The proliferation of GPS-enabled devices and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution, as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.

  6. Power Distribution Analysis For Electrical Usage In Province Area Using Olap (Online Analytical Processing)

    NASA Astrophysics Data System (ADS)

    Samsinar, Riza; Suseno, Jatmiko Endro; Widodo, Catur Edi

    2018-02-01

    The distribution network is the part of the power grid closest to the customers of electric service providers such as PT. PLN. The dispatching center of a power grid company is also the data center of the grid, where a great amount of operating information is gathered; the valuable information contained in these data means a lot for power grid operating management. Data warehousing with online analytical processing (OLAP) has been used to manage and analyze this large volume of data. The specific outputs of the online analytical information system built on the data warehouse with OLAP are chart reporting and query reporting. The chart reports comprise load distribution charts over repeated time periods, load distribution charts by area, substation region charts and electric load usage charts. The results of the OLAP process show the development of electric load distribution, provide analysis of electric power consumption load, and offer an alternative way of presenting information related to peak load.
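
    As a schematic of this kind of OLAP rollup, the pandas sketch below pivots an invented load fact table over area and time dimensions; none of the figures come from the study:

    ```python
    # OLAP-style slice-and-dice: aggregate load by area and month.
    import pandas as pd

    facts = pd.DataFrame({
        "area":    ["north", "north", "south", "south", "north"],
        "month":   ["Jan", "Feb", "Jan", "Feb", "Feb"],
        "load_mw": [120.0, 135.0, 98.0, 110.0, 20.0],
    })

    cube = facts.pivot_table(values="load_mw", index="area",
                             columns="month", aggfunc="sum", margins=True)
    print(cube)   # per-area, per-month sums plus "All" margins
    ```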

  7. Temporal and Location Based RFID Event Data Management and Processing

    NASA Astrophysics Data System (ADS)

    Wang, Fusheng; Liu, Peiya

    Advances in sensor and RFID technology provide significant new power for humans to sense, understand and manage the world. RFID provides fast data collection with precise identification of objects with unique IDs without line of sight, thus it can be used for identifying, locating, tracking and monitoring physical objects. Despite these benefits, RFID poses many challenges for data processing and management. RFID data are temporal and history oriented, multi-dimensional, and carry implicit semantics. Moreover, RFID applications are heterogeneous. RFID data management or data warehouse systems need to support generic and expressive data modeling for tracking and monitoring physical objects, and provide automated data interpretation and processing. We develop a powerful temporal and location oriented data model for modeling and querying RFID data, and a declarative event and rule based framework for automated complex RFID event processing. The approach is general and can be easily adapted for different RFID-enabled applications, thus significantly reducing the cost of RFID data integration.
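
    A toy flavor of such declarative RFID event rules: readings are (time, tag, location) tuples, and a rule fires when an object leaves one zone and appears in another within a time bound. All names and thresholds are hypothetical:

    ```python
    # Rule-based detection over temporal RFID readings.
    readings = [
        (0, "tag42", "warehouse"),
        (50, "tag42", "dock"),
        (300, "tag42", "truck"),
    ]

    def moved_within(readings, tag, src, dst, max_dt):
        """Fire if `tag` moves from `src` to `dst` within `max_dt` seconds."""
        times = {loc: t for t, tg, loc in readings if tg == tag}
        return (src in times and dst in times
                and 0 < times[dst] - times[src] <= max_dt)

    print(moved_within(readings, "tag42", "warehouse", "dock", 60))   # True
    print(moved_within(readings, "tag42", "dock", "truck", 60))       # False
    ```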

  8. A biomedical information system for retrieval and manipulation of NHANES data.

    PubMed

    Mukherjee, Sukrit; Martins, David; Norris, Keith C; Jenders, Robert A

    2013-01-01

    The retrieval and manipulation of data from large public databases like the U.S. National Health and Nutrition Examination Survey (NHANES) may require sophisticated statistical software and significant expertise that may be unavailable in the university setting. In response, we have developed the Data Retrieval And Manipulation System (DReAMS), an automated information system that handles all processes of data extraction and cleaning and then joins different subsets to produce analysis-ready output. The system is a browser-based data warehouse application in which the input data from flat files or operational systems are aggregated in a structured way so that the desired data can be read, recoded, queried and extracted efficiently. The current pilot implementation of the system provides access to a limited portion of the NHANES database. We plan to increase the amount of data available through the system in the near future and to extend the techniques to other large databases from the CDU archive, which currently holds about 53 databases.

  9. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    PubMed Central

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-01-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900
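
    As a toy version of the timeline idea, congestion events can be stored as intervals and retrieved by overlap with a query window; the events below are invented, and a real TQ-index would replace the linear scan:

    ```python
    # Interval-overlap lookup over timeline events (hour-of-day floats).
    events = [
        (8.0, 9.5, "I-5 N", "heavy"),
        (7.5, 8.25, "I-90 E", "moderate"),
        (17.0, 19.0, "I-5 N", "heavy"),
    ]

    def overlapping(evts, q_start, q_end):
        """Events whose [start, end) interval overlaps [q_start, q_end)."""
        return [e for e in evts if e[0] < q_end and e[1] > q_start]

    # "Where was traffic heavy during the 7-9 am peak?"
    print([e for e in overlapping(events, 7.0, 9.0) if e[3] == "heavy"])
    ```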

  10. 11. AFRD WAREHOUSE, INTERIOR DETAIL OF RAFTER SUPPORT POST TIMBER ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    11. AFRD WAREHOUSE, INTERIOR DETAIL OF RAFTER SUPPORT POST TIMBER AND METHOD OF BRACING. THE BRACES PENETRATE THE SHEET ROCK, SUGGESTING THAT THESE ARE ORIGINAL. - Minidoka Relocation Center Warehouse, 111 South Fir Street, Shoshone, Lincoln County, ID

  11. 1. Cold Storage Warehouse, east facade. Northeast corner of the ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    1. Cold Storage Warehouse, east facade. Northeast corner of the north facade of the Ice Plant is visible on the left. Far left, the Creamery. - Curtis Wharf, Cold Storage Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  12. 14. DETAIL OF SOUTHWEST FRONT OF WAREHOUSE, SHOWING CORRUGATED PLASTER/ASBESTOS ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    14. DETAIL OF SOUTHWEST FRONT OF WAREHOUSE, SHOWING CORRUGATED PLASTER/ASBESTOS WALLS, WINDOWS AND ROOF. VIEW TO NORTHEAST. - Commercial & Industrial Buildings, International Harvester Company Showroom, Office & Warehouse, 10 South Main Street, Dubuque, Dubuque County, IA

  13. 17 CFR 31.12 - Segregation.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... United States, or unencumbered warehouse receipts for inventory held in approved contract market..., and proceeds from any sale, liquidation or other disposition of obligations or warehouse receipts... to purchase obligations or warehouse receipts of the type described in this paragraph (b) shall...

  14. 17 CFR 31.12 - Segregation.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... United States, or unencumbered warehouse receipts for inventory held in approved contract market..., and proceeds from any sale, liquidation or other disposition of obligations or warehouse receipts... to purchase obligations or warehouse receipts of the type described in this paragraph (b) shall...

  15. ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery.

    PubMed

    Krishnakumar, Vivek; Contrino, Sergio; Cheng, Chia-Yi; Belyaeva, Irina; Ferlanti, Erik S; Miller, Jason R; Vaughn, Matthew W; Micklem, Gos; Town, Christopher D; Chan, Agnes P

    2017-01-01

    ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled. © The Author 2016. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
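
    Programmatic access of the kind mentioned above would look roughly like this with the InterMine Python client; the endpoint URL, class and field names follow standard InterMine conventions but should be treated as assumptions:

    ```python
    # Query ThaleMine through the InterMine web-service client
    # (assumed endpoint and field names).
    from intermine.webservice import Service

    service = Service("https://apps.araport.org/thalemine/service")
    query = service.new_query("Gene")
    query.add_view("primaryIdentifier", "symbol", "briefDescription")
    query.add_constraint("symbol", "=", "FLC")   # hypothetical gene of interest

    for row in query.rows():
        print(row["primaryIdentifier"], row["symbol"], row["briefDescription"])
    ```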

  16. Development of Product Availability Monitoring System In Production Unit In Automotive Component Industry

    NASA Astrophysics Data System (ADS)

    Hartono, Rachmad; Raharno, Sri; Yuwana Martawirya, Yatna; Arthaya, Bagus

    2018-03-01

    This paper described a methodology to monitor the availability of products in a production unit in the automotive component industry. Automotive components made are automotive components made through sheet metal working. Raw material coming into production unit in the form of pieces of plates that have a certain size. Raw materials that come stored in the warehouse. Data of raw each material in the warehouse are recorded and stored in a data base system. The material will then undergo several production processes in the production unit. When the material is taken from the warehouse, material data are also recorded and stored in a data base. The data recorded are the amount of material, material type, and date when the material is out of the warehouse. The material coming out of the warehouse is labeled with information related to the production processes that the material must pass. Material out of the warehouse is a product will be made. The products have been completed, are stored in the warehouse products. When the product is entered into the product warehouse, product data is also recorded by scanning the barcode contained on the label. By recording the condition of the product at each stage of production, we can know the availability of the product in a production unit in the form of a raw material, the product being processed and the finished product.

  17. Report: Early Warning Report: Main EPA Headquarters Warehouse in Landover, Maryland, Requires Immediate EPA Attention

    EPA Pesticide Factsheets

    Report #13-P-0272, May 31, 2013. Our initial research at the EPA’s Landover warehouse raised significant concerns with the lack of agency oversight of personal property and warehouse space at the facility.

  18. Which factors predict the time spent answering queries to a drug information centre?

    PubMed Central

    Reppe, Linda A.; Spigset, Olav

    2010-01-01

    Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not, also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480
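
    The workload model is, in effect, a small linear regression; the sketch below fits one by ordinary least squares on fabricated data standing in for the 96 query/answer pairs:

    ```python
    # OLS: predict minutes spent from (n_drugs, search_score, judgmental).
    import numpy as np

    X = np.array([[1, 2, 0], [3, 3, 1], [2, 1, 0],
                  [1, 3, 1], [2, 2, 0]], dtype=float)   # fabricated factors
    y = np.array([25.0, 90.0, 30.0, 75.0, 40.0])        # fabricated minutes

    X1 = np.hstack([np.ones((len(X), 1)), X])           # add intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print("intercept and coefficients:", beta.round(2))
    print("prediction for (2 drugs, search=3, judgmental=1):",
          round(float(np.array([1, 2, 3, 1]) @ beta), 1))
    ```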

  19. Personalized query suggestion based on user behavior

    NASA Astrophysics Data System (ADS)

    Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui

    Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantic similarity and co-occurrence, which reflects behavioral information from other users in web search. Regarding the current user’s preference for a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparsity problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in better performance than either plain approach.
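
    The linear combination at the heart of the model can be shown in a few lines; the weights, candidates and preference scores below are invented, and the real UB model derives them from query co-occurrence, semantic similarity and BPMF rather than hard-coding them:

    ```python
    # Mix global relevance with short- and long-term user preference.
    def personalized_score(cand, relevance, short_pref, long_pref,
                           alpha=0.5, beta=0.7):
        # beta: short-term vs long-term; alpha: global vs personal.
        user_pref = (beta * short_pref.get(cand, 0.0)
                     + (1 - beta) * long_pref.get(cand, 0.0))
        return alpha * relevance.get(cand, 0.0) + (1 - alpha) * user_pref

    relevance = {"jaguar car": 0.9, "jaguar animal": 0.8}
    short_pref = {"jaguar animal": 0.9}   # this session: wildlife queries
    long_pref = {"jaguar car": 0.6}       # months of car-related history

    ranked = sorted(relevance, reverse=True,
                    key=lambda c: personalized_score(
                        c, relevance, short_pref, long_pref))
    print(ranked)   # short-term interest lifts "jaguar animal" to the top
    ```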

  20. Two-warehouse partial backlogging inventory model for deteriorating items with linear trend in demand under inflationary conditions

    NASA Astrophysics Data System (ADS)

    Jaggi, Chandra K.; Khanna, Aditi; Verma, Priyanka

    2011-07-01

    In today's business transactions, there are various reasons, namely, bulk purchase discounts, re-ordering costs, seasonality of products, inflation induced demand, etc., which force the buyer to order more than the warehouse capacity. Such situations call for additional storage space to store the excess units purchased. This additional storage space is typically a rented warehouse. Inflation plays a very interesting and significant role here: It increases the cost of goods. To safeguard from the rising prices, during the inflation regime, the organisation prefers to keep a higher inventory, thereby increasing the aggregate demand. This additional inventory needs additional storage space, which is facilitated by a rented warehouse. Ignoring the effects of the time value of money and inflation might yield misleading results. In this study, a two-warehouse inventory model with linear trend in demand under inflationary conditions having different rates of deterioration has been developed. Shortages at the owned warehouse are also allowed subject to partial backlogging. The solution methodology provided in the model helps to decide on the feasibility of renting a warehouse. Finally, findings have been illustrated with the help of numerical examples. Comprehensive sensitivity analysis has also been provided.

  1. Query Auto-Completion Based on Word2vec Semantic Similarity

    NASA Astrophysics Data System (ADS)

    Shao, Taihua; Chen, Honghui; Chen, Wanyu

    2018-04-01

    Query auto-completion (QAC) is the first step of information retrieval, helping users formulate the entire query after inputting only a few prefix characters. Traditional QAC models ignore the contribution of semantic relevance between queries, even though similar queries often express extremely similar search intentions. In this paper, we propose a hybrid model, FS-QAC, based on query semantic similarity as well as query frequency. We choose the word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that the FS-QAC model improves performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
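
    A minimal sketch of that hybrid scoring, assuming the gensim library and a toy query log; the mixing weight and corpus are invented, so this only illustrates the shape of the FS-QAC combination:

    ```python
    # Score completion candidates by frequency plus word2vec similarity.
    from gensim.models import Word2Vec

    log = [["cheap", "flights", "london"],
           ["cheap", "hotels", "london"],
           ["cheap", "flights", "paris"],
           ["weather", "london"]]
    model = Word2Vec(log, vector_size=32, min_count=1, seed=1)

    def fsqac_score(prefix_word, cand, freq, lam=0.6):
        sim = float(model.wv.similarity(prefix_word, cand))  # semantic term
        return lam * freq[cand] + (1 - lam) * sim            # plus frequency

    freq = {"flights": 2 / 4, "hotels": 1 / 4}   # candidate frequencies
    for cand in ("flights", "hotels"):
        print(cand, round(fsqac_score("cheap", cand, freq), 3))
    ```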

  2. SPARK: Adapting Keyword Query to Semantic Search

    NASA Astrophysics Data System (ADS)

    Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

    Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.
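
    The three translation steps can be caricatured in a few lines: map keywords to ontology terms, build a chain-shaped query graph, and serialize it as SPARQL. The term dictionary and `univ:` vocabulary below are hypothetical stand-ins for SPARK's learned mapping and ranking:

    ```python
    # Keyword query -> SPARQL string via a toy term mapping.
    TERM_MAP = {
        "professor": ("class", "univ:Professor"),
        "teaches":   ("property", "univ:teaches"),
        "course":    ("class", "univ:Course"),
    }

    def keywords_to_sparql(keywords):
        classes, props = [], []
        for kw in keywords:
            kind, term = TERM_MAP[kw]
            (classes if kind == "class" else props).append(term)
        # Typed variables chained through the properties.
        lines = [f"?v{i} rdf:type {c} ." for i, c in enumerate(classes)]
        lines += [f"?v{i} {p} ?v{i + 1} ." for i, p in enumerate(props)]
        head = " ".join(f"?v{i}" for i in range(len(classes)))
        return f"SELECT {head} WHERE {{ " + " ".join(lines) + " }"

    print(keywords_to_sparql(["professor", "teaches", "course"]))
    # SELECT ?v0 ?v1 WHERE { ?v0 rdf:type univ:Professor .
    #   ?v1 rdf:type univ:Course . ?v0 univ:teaches ?v1 . }
    ```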

  3. Credit BG. View looks southwest (236°) at the warehouse's southeast ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Credit BG. View looks southwest (236°) at the warehouse's southeast and northeast facades. This building retains its original World War II era materials and appearance - Edwards Air Force Base, North Base, Warehouse, Second & C Streets, Boron, Kern County, CA

  4. 76 FR 28801 - Agency Information Collection Activities: Bonded Warehouse Regulations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-18

    ... Activities: Bonded Warehouse Regulations AGENCY: U.S. Customs and Border Protection, Department of Homeland... (OMB) for review and approval in accordance with the Paperwork Reduction Act: Bonded Warehouse... appropriate automated, electronic, mechanical, or other technological techniques or other forms of information...

  5. 3. Interior view of section of warehouse building now used ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    3. Interior view of section of warehouse building now used as an opening room. 'Unifloc' machine manufactured by Rieter opens and blends cotton from several bales before processing. - Dixie Cotton Mill, Warehouses, 710 Greenville Street, La Grange, Troup County, GA

  6. SP2Bench: A SPARQL Performance Benchmark

    NASA Astrophysics Data System (ADS)

    Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg

    A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is settled in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror vital key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.
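
    A benchmark harness in this spirit is just repeated timed execution; the sketch below times a SPARQL query with rdflib over a generated document, with the file name and query as placeholders rather than actual SP2Bench artifacts:

    ```python
    # Time repeated executions of a SPARQL query (hypothetical inputs).
    import statistics
    import time

    import rdflib

    g = rdflib.Graph()
    g.parse("sp2b_generated.rdf")     # hypothetical generated DBLP-like doc

    QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"

    samples = []
    for _ in range(5):
        t0 = time.perf_counter()
        list(g.query(QUERY))          # force full evaluation
        samples.append(time.perf_counter() - t0)

    print(f"median {statistics.median(samples) * 1000:.1f} ms over 5 runs")
    ```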

  7. A study on building data warehouse of hospital information system.

    PubMed

    Li, Ping; Wu, Tao; Chen, Mu; Zhou, Bin; Xu, Wei-guo

    2011-08-01

    Existing hospital information systems with simple statistical functions cannot meet current management needs. It is well known that hospital resources are distributed among hospitals with private property rights, as in the case of the regional coordination of medical services. In this study, to integrate and make full use of medical data effectively, we propose a data warehouse modeling method for the hospital information system; the method can also be employed for a distributed-hospital medical service system. To ensure that hospital information supports the diverse needs of health care, the framework of the hospital information system has three layers: a datacenter layer, a system-function layer, and a user-interface layer. This paper discusses the role of a data warehouse management system in handling hospital information, from the establishment of the data theme, through the design of a data model, to the establishment of the data warehouse. Online analytical processing tools assist user-friendly multidimensional analysis from a number of different angles to extract the required data and information. Use of the data warehouse improves online analytical processing and mitigates deficiencies in the decision support system. The hospital information system based on a data warehouse effectively employs statistical analysis and data mining technology to handle massive quantities of historical data, and summarizes clinical and hospital information for decision making. This paper proposes the use of a data warehouse for a hospital information system, specifically a data warehouse built around hospital information themes, covering the determination of dimensions, modeling and so on. The processing of patient information is given as an example demonstrating the usefulness of this method for hospital information management. Data warehouse technology is still evolving, and further research is needed on extracting more decision support information through data mining and decision-making technology.

  8. Data Warehouse Discovery Framework: The Foundation

    NASA Astrophysics Data System (ADS)

    Apanowicz, Cas

    The cost of building an Enterprise Data Warehouse Environment usually runs into millions of dollars and takes years to complete. The cost, big as it is, is not the primary problem for a given corporation. The risk that all the money allocated for planning, design and implementation of the Data Warehouse and Business Intelligence Environment may not bring the expected result far outweighs the cost of the entire effort [2,10]. The combination of the two factors above is the main reason that Data Warehouse/Business Intelligence is often the single most expensive and most risky IT endeavor for companies [13]. That situation was the author's main inspiration behind the founding of Infobright Corp and, later on, the concept of the Data Warehouse Discovery Framework.

  9. KAMEDO report no. 82: explosion at the fireworks warehouse in the Netherlands in 2000.

    PubMed

    Björnhagen, Viveka; Messner, Torbjörn; Brändström, Helge

    2006-01-01

    A fire and subsequent explosions occurred in a fireworks warehouse on 13 May 2000. A total of 947 persons were injured and 21 persons died, including four firefighters and one reporter. Communication networks became overloaded, which impaired notification chains. The hospital disaster plan was followed but proved inadequate. Public information was a high priority. A counselling center was established early and was planned to continue operating for five years. The command function did not perform to expectations. Hospital triage was impaired because many of those responsible left the triage area. Short-term psychosocial support evolved into long-term programs. Liability issues were examined.

  10. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce

    PubMed Central

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2016-01-01

    The proliferation of GPS-enabled devices and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution, as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325

  11. 4. AFRD WAREHOUSE, WEST SIDE DETAIL OF ALTERED SLIDING DOORS, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    4. AFRD WAREHOUSE, WEST SIDE DETAIL OF ALTERED SLIDING DOORS, FACING EAST. WEATHER COVER OVER RAIL IS ORIGINAL. SHEET METAL SIDING HAS BEEN INSERTED BETWEEN TWO HALVES OF SLIDING DOORS. - Minidoka Relocation Center Warehouse, 111 South Fir Street, Shoshone, Lincoln County, ID

  12. Envirofacts Data Warehouse

    EPA Pesticide Factsheets

    The Envirofacts Data Warehouse contains information from select EPA Environmental program office databases and provides access about environmental activities that may affect air, water, and land anywhere in the United States. The Envirofacts Warehouse supports its own web enabled tools as well as a host of other EPA applications.

  13. Information Architecture: The Data Warehouse Foundation.

    ERIC Educational Resources Information Center

    Thomas, Charles R.

    1997-01-01

    Colleges and universities are initiating data warehouse projects to provide integrated information for planning and reporting purposes. A survey of 40 institutions with active data warehouse projects reveals the kinds of tools, contents, data cycles, and access currently used. Essential elements of an integrated information architecture are…

  14. Optimal (R, Q) policy and pricing for two-echelon supply chain with lead time and retailer's service-level incomplete information

    NASA Astrophysics Data System (ADS)

    Esmaeili, M.; Naghavi, M. S.; Ghahghaei, A.

    2018-03-01

    Many studies focus on inventory systems to analyze different real-world situations. This paper considers a two-echelon supply chain that includes one warehouse and one retailer with stochastic demand and an order-up-to-level policy. The retailer's lead time includes the transportation time from the warehouse to the retailer, which is unknown to the retailer. On the other hand, the warehouse is unaware of the retailer's service level. The relationship between the retailer and the warehouse is modeled as a Stackelberg game with incomplete information. Moreover, their relationship is presented for the case where the warehouse and the retailer reveal their private information using incentive strategies. The optimal inventory and pricing policies are obtained using an algorithm based on bi-level programming. Numerical examples, including sensitivity analysis of some key parameters, compare the results between the Stackelberg models. The results show that information sharing is more beneficial to the warehouse than to the retailer.

  15. Comparison of Left Ventricular Hypertrophy by Electrocardiography and Echocardiography in Children Using Analytics Tool.

    PubMed

    Tague, Lauren; Wiggs, Justin; Li, Qianxi; McCarter, Robert; Sherwin, Elizabeth; Weinberg, Jacqueline; Sable, Craig

    2018-05-17

    Left ventricular hypertrophy (LVH) is a common finding on pediatric electrocardiography (ECG), leading to many referrals for echocardiography (echo). This study utilizes a novel analytics tool that combines ECG and echo databases to evaluate ECG as a screening tool for LVH. A SQL Server 2012 data warehouse incorporated ECG and echo databases for all patients from a single institution from 2006 to 2016. Customized queries identified patients 0-18 years old with LVH on ECG and an echo performed within 24 h. Using data visualization (Tableau) and analytic (Stata 14) software, ECG and echo findings were compared. Of 437,699 encounters, 4637 met inclusion criteria. ECG had high sensitivity (≥ 90%) but poor specificity (43%) and low positive predictive value (< 20%) for echo abnormalities. ECG performed only 11-22% better than chance (for which AROC = 0.50). 83% of subjects with LVH on ECG had normal left ventricle (LV) structure and size on echo. African-Americans with LVH were least likely to have an abnormal echo. There was a low correlation between the V6 R wave on ECG and the echo-derived Z score of left ventricular diastolic diameter (r = 0.14) and LV mass index (r = 0.24). The data analytics client was able to mine a database of ECG and echo reports, comparing LVH by ECG with LV measurements and qualitative findings by echo, identifying an abnormal LV by echo in only 17% of cases with LVH on ECG. This novel tool is useful for rapid data mining for both clinical and research endeavors.
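
    The screening statistics quoted above follow directly from a 2x2 confusion table of ECG findings against echo results. A minimal sketch of how such figures are derived once the two databases are joined; the counts are invented to be consistent with the reported rates, not the study's raw data.

        # Counts from a joined ECG/echo cohort (illustrative only):
        tp, fp = 170, 830   # LVH on ECG with / without an abnormal LV on echo
        fn, tn = 19, 626    # no LVH on ECG with / without an abnormal LV on echo

        sensitivity = tp / (tp + fn)   # share of echo-abnormal patients ECG catches
        specificity = tn / (tn + fp)
        ppv = tp / (tp + fp)           # positive predictive value

        print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} ppv={ppv:.2f}")
        # -> sensitivity=0.90 specificity=0.43 ppv=0.17, matching the abstract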

  16. System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

    2017-01-01

    The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces the overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
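
    A toy sketch of the single-query idea: fan one user query out to heterogeneous stores and normalize the results to a common shape. The two in-memory SQLite databases and their schemas are invented stand-ins for the patent's information grid; the actual system is far more elaborate.

        import sqlite3

        # Two heterogeneous sources with different schemas (toy stand-ins).
        docs = sqlite3.connect(":memory:")
        docs.execute("CREATE TABLE docs (title TEXT, author TEXT)")
        docs.execute("INSERT INTO docs VALUES ('Grid queries', 'Maluf')")

        reports = sqlite3.connect(":memory:")
        reports.execute("CREATE TABLE reports (name TEXT, owner TEXT)")
        reports.execute("INSERT INTO reports VALUES ('Query federation notes', 'Tran')")

        # Per-source SQL adapts one user query to each local schema; every
        # result is normalized to (title, author) to hide the mismatch.
        SOURCES = [
            (docs,    "SELECT title, author FROM docs WHERE title LIKE ?"),
            (reports, "SELECT name, owner FROM reports WHERE name LIKE ?"),
        ]

        def federated_search(term):
            pattern = f"%{term}%"
            hits = []
            for conn, sql in SOURCES:
                hits.extend(conn.execute(sql, (pattern,)).fetchall())
            return hits

        print(federated_search("quer"))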

  17. 7 CFR 1421.106 - Warehouse-stored marketing assistance loan collateral.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 10 2012-01-01 2012-01-01 false Warehouse-stored marketing assistance loan collateral... SIMILARLY HANDLED COMMODITIES-MARKETING ASSISTANCE LOANS AND LOAN DEFICIENCY PAYMENTS FOR 2008 THROUGH 2012 Marketing Assistance Loans § 1421.106 Warehouse-stored marketing assistance loan collateral. (a) A commodity...

  18. 7 CFR 1421.106 - Warehouse-stored marketing assistance loan collateral.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 10 2013-01-01 2013-01-01 false Warehouse-stored marketing assistance loan collateral... SIMILARLY HANDLED COMMODITIES-MARKETING ASSISTANCE LOANS AND LOAN DEFICIENCY PAYMENTS FOR 2008 THROUGH 2012 Marketing Assistance Loans § 1421.106 Warehouse-stored marketing assistance loan collateral. (a) A commodity...

  19. 7 CFR 1421.106 - Warehouse-stored marketing assistance loan collateral.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 10 2014-01-01 2014-01-01 false Warehouse-stored marketing assistance loan collateral... SIMILARLY HANDLED COMMODITIES-MARKETING ASSISTANCE LOANS AND LOAN DEFICIENCY PAYMENTS FOR 2008 THROUGH 2012 Marketing Assistance Loans § 1421.106 Warehouse-stored marketing assistance loan collateral. (a) A commodity...

  20. 77 FR 26024 - Agency Information Collection Activities: Bonded Warehouse Proprietor's Submission

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-05-02

    ... Activities: Bonded Warehouse Proprietor's Submission AGENCY: U.S. Customs and Border Protection, Department... Budget (OMB) for review and approval in accordance with the Paperwork Reduction Act: Bonded Warehouse... information on those who are to respond, including the use of appropriate automated, electronic, mechanical...

  1. 19 CFR 146.64 - Entry for warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... (CONTINUED) FOREIGN TRADE ZONES Transfer of Merchandise From a Zone § 146.64 Entry for warehouse. (a) Foreign... status may not be entered for warehouse from a zone. Merchandise in nonprivileged foreign status... different port. (b) Zone-restricted merchandise. Foreign merchandise in zone-restricted status may be...

  2. School District Evaluation: Database Warehouse Support.

    ERIC Educational Resources Information Center

    Adcock, Eugene P.; Haseltine, Reginald

    The Prince George's County (Maryland) school system has developed a database warehouse system as an evaluation data support tool for fulfilling the system's information demands. This paper describes the Research and Evaluation Assimilation Database (READ) warehouse support system and considers the requirements for data used in evaluation and how…

  3. 7 CFR 29.2 - Policy statement.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... warehouses which are located beyond the geographical limitation for “designated markets” set forth in § 29.1... to new markets to warehouses which are located beyond the geographical limitation for “designated... services. The extension of tobacco inspection and price support services to new markets to warehouses which...

  4. 19 CFR 19.2 - Applications to bond.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... CUSTOMS WAREHOUSES, CONTAINER STATIONS AND CONTROL OF MERCHANDISE THEREIN General Provisions § 19.2 Applications to bond. (a) Application. An owner or lessee desiring to establish a bonded warehouse facility shall make written application to the director of the port nearest to where the warehouse is located...

  5. 19 CFR 19.2 - Applications to bond.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... CUSTOMS WAREHOUSES, CONTAINER STATIONS AND CONTROL OF MERCHANDISE THEREIN General Provisions § 19.2 Applications to bond. (a) Application. An owner or lessee desiring to establish a bonded warehouse facility shall make written application to the director of the port nearest to where the warehouse is located...

  6. 7 CFR 29.2 - Policy statement.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... warehouses which are located beyond the geographical limitation for “designated markets” set forth in § 29.1... to new markets to warehouses which are located beyond the geographical limitation for “designated... services. The extension of tobacco inspection and price support services to new markets to warehouses which...

  7. Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea

    PubMed Central

    Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

    2016-01-01

    Background Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that mobile search volume surpasses desktop search volume and mobile search patterns differ from desktop search patterns, previous digital surveillance systems did not distinguish between mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. Methods and Results The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman’s correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Conclusion Our study shows that mobile search queries for influenza surveillance have equaled or even surpassed desktop search queries over time. In future development of influenza surveillance using search queries, recognition of the changing trend in mobile search data may be necessary. PMID:27391028

  8. Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea.

    PubMed

    Shin, Soo-Yong; Kim, Taerim; Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

    2016-01-01

    Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that mobile search volume surpasses desktop search volume and mobile search patterns differ from desktop search patterns, previous digital surveillance systems did not distinguish between mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman's correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Our study shows that mobile search queries for influenza surveillance have equaled or even surpassed desktop search queries over time. In future development of influenza surveillance using search queries, recognition of the changing trend in mobile search data may be necessary.
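
    A minimal sketch of the correlation analysis these two records describe, assuming weekly ILI rates and search volumes are already aligned as equal-length series; scipy's spearmanr gives the coefficient, and shifting the search series implements the lag analysis. The toy numbers are invented.

        from scipy.stats import spearmanr

        def lag_correlations(ili, search, max_lag=2):
            """Spearman r between weekly ILI rates and search volume, with the
            search series shifted 0..max_lag weeks ahead of the ILI series."""
            out = {}
            for lag in range(max_lag + 1):
                shifted = search[:len(search) - lag] if lag else search
                r, p = spearmanr(shifted, ili[lag:])
                out[lag] = (round(r, 3), round(p, 3))
            return out

        # Toy weekly series (real inputs: KCDC ILI rates and Naver volumes).
        print(lag_correlations([1, 2, 4, 7, 6, 3, 2, 1], [2, 3, 6, 8, 5, 3, 1, 1]))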

  9. 27 CFR 24.126 - Change in proprietorship involving a bonded wine warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... involving a bonded wine warehouse. 24.126 Section 24.126 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX AND TRADE BUREAU, DEPARTMENT OF THE TREASURY LIQUORS WINE Establishment and Operations Changes Subsequent to Original Establishment § 24.126 Change in proprietorship involving a bonded wine warehouse...

  10. 27 CFR 44.144 - Opening.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... PAYMENT OF TAX, OR WITH DRAWBACK OF TAX Operations by Export Warehouse Proprietors Inventories § 44.144 Opening. An opening inventory shall be made by the export warehouse proprietor at the time of commencing... permit issued under § 44.93. A similar inventory shall be made by the export warehouse proprietor when he...

  11. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  12. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  13. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  14. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  15. 19 CFR 141.68 - Time of entry.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... (pursuant to § 24.25 of this chapter) have been successfully received by CBP via the Automated Broker... from warehouse for consumption. The time of entry of merchandise withdrawn from warehouse for... the order of the warehouse proprietor) is when: (1) CBP Form 7501 is executed in proper form and filed...

  16. 19 CFR 19.4 - CBP and proprietor responsibility and supervision over warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... inventory category of each article under FIFO procedures. Merchandise covered by a given unique identifier..., quantity counts of goods in warehouse inventories, spot checks of selected warehouse transactions or...) Maintain the inventory control and recordkeeping system in accordance with the provisions of § 19.12 of...

  17. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 10 2012-01-01 2012-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  18. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 10 2014-01-01 2014-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  19. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 10 2011-01-01 2011-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  20. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  1. 7 CFR 1427.16 - Movement and protection of warehouse-stored cotton.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 10 2013-01-01 2013-01-01 false Movement and protection of warehouse-stored cotton... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS COTTON Nonrecourse Cotton Loan and Loan Deficiency Payments § 1427.16 Movement and protection of warehouse-stored cotton. (a...

  2. 19 CFR 19.14 - Materials for use in manufacturing warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ...; DEPARTMENT OF THE TREASURY CUSTOMS WAREHOUSES, CONTAINER STATIONS AND CONTROL OF MERCHANDISE THEREIN... statistical information as provided in § 141.61(e) of this chapter. If the merchandise has been imported or... report this information for each warehouse entry represented in the manufacturing process. [28 FR 14763...

  3. Population dynamics of stored maize insect pests in warehouses in two districts of Ghana

    USDA-ARS?s Scientific Manuscript database

    Understanding what insect species are present and their temporal and spatial patterns of distribution is important for developing a successful integrated pest management strategy for food storage in warehouses. Maize in many countries in Africa is stored in bags in warehouses, but little monitoring ...

  4. 27 CFR 44.142 - Records.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2013-04-01 2013-04-01 false Records. 44.142 Section 44... PAYMENT OF TAX, OR WITH DRAWBACK OF TAX Operations by Export Warehouse Proprietors § 44.142 Records. Every export warehouse proprietor must keep in such warehouse complete and concise records, containing the: (a...

  5. 75 FR 71724 - Real Estate Settlement Procedures Act (RESPA): Solicitation of Information on Changes in...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-24

    ... Procedures Act (RESPA): Solicitation of Information on Changes in Warehouse Lending and Other Loan Funding... guidance under RESPA to address possible changes in warehouse lending and other financing mechanisms used... in recent years, and especially on how warehouse lending currently operates within residential real...

  6. Photocopy of drawing (original drawing of Signal & Ordnance Warehouse ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Signal & Ordnance Warehouse in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General)TRUSS DETAILS - MacDill Air Force Base, Signal & Ordnance Warehouse, 7620 Hanger Loop Drive, Tampa, Hillsborough County, FL

  7. 76 FR 40409 - Self-Regulatory Organizations; National Securities Clearing Corporation; Notice of Filing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-08

    ... Effectiveness of Proposed Rule Change Relating to Fees Associated With the Obligation Warehouse Service July 1... related to the new Obligation Warehouse service. II. Self-Regulatory Organization's Statement of Purpose... for NSCC's Obligation Warehouse service, a new functionality that was designed to enhance and replace...

  8. 17 CFR 31.8 - Cover of leverage contracts.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... this section. (2) Permissible cover for a long leverage contract is limited to: (i) Warehouse receipts... and accrued interest on any loan against such warehouse receipts does not exceed 70 percent of the current market value of the commodity represented by each receipt. (ii) Warehouse receipts for gold...

  9. The importance of data warehouses for physician executives.

    PubMed

    Ruffin, M

    1994-11-01

    Soon, most physicians will begin to learn about data warehouses and the clinical and financial data about their patients stored in them. What is a data warehouse? Why are we seeing their emergence in health care only now? How does a hospital, group practice, or health plan acquire or create a data warehouse? Who should be responsible for it, and what sort of training is needed by those in charge of using it for the edification of the sponsoring organization? I'll try to answer these questions in this article.

  10. Evaluation methodology for query-based scene understanding systems

    NASA Astrophysics Data System (ADS)

    Huster, Todd P.; Ross, Timothy D.; Culbertson, Jared L.

    2015-05-01

    In this paper, we are proposing a method for the principled evaluation of scene understanding systems in a query-based framework. We can think of a query-based scene understanding system as a generalization of typical sensor exploitation systems where instead of performing a narrowly defined task (e.g., detect, track, classify, etc.), the system can perform general user-defined tasks specified in a query language. Examples of this type of system have been developed as part of DARPA's Mathematics of Sensing, Exploitation, and Execution (MSEE) program. There is a body of literature on the evaluation of typical sensor exploitation systems, but the open-ended nature of the query interface introduces new aspects to the evaluation problem that have not been widely considered before. In this paper, we state the evaluation problem and propose an approach to efficiently learn about the quality of the system under test. We consider the objective of the evaluation to be to build a performance model of the system under test, and we rely on the principles of Bayesian experiment design to help construct and select optimal queries for learning about the parameters of that model.
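
    One way to make the Bayesian experiment-design step concrete: model the system under test's per-query-type accuracy with Beta posteriors, and issue the next query from the type whose outcome is most uncertain (maximum posterior variance, a simple proxy for expected information gain). Everything below, query types included, is an illustrative assumption, not the MSEE evaluation protocol.

        # Beta(alpha, beta) posterior per query type; start uninformative.
        posteriors = {"detect": [1, 1], "track": [1, 1], "count": [1, 1]}

        def posterior_variance(a, b):
            # Variance of a Beta(a, b) distribution.
            return a * b / ((a + b) ** 2 * (a + b + 1))

        def next_query_type():
            # Probe where the performance model is least certain.
            return max(posteriors, key=lambda q: posterior_variance(*posteriors[q]))

        def record_result(qtype, correct):
            a, b = posteriors[qtype]
            posteriors[qtype] = [a + correct, b + (1 - correct)]

        record_result(next_query_type(), correct=1)
        print(posteriors)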

  11. Efficient data management tools for the heterogeneous big data warehouse

    NASA Astrophysics Data System (ADS)

    Alekseev, A. A.; Osipova, V. V.; Ivanov, M. A.; Klimentov, A.; Grigorieva, N. V.; Nalamwar, H. S.

    2016-09-01

    The traditional RDBMS has served normalized data structures well for decades, but the technology is not optimal for data processing and analysis in data-intensive fields like social networks, the oil and gas industry, experiments at the Large Hadron Collider, etc. Several challenges have recently been raised concerning the scalability of data-warehouse-like workloads against transactional schemas, in particular for the analysis of archived data or the aggregation of data for summary and accounting purposes. The paper evaluates new database technologies like HBase, Cassandra, and MongoDB, commonly referred to as NoSQL databases, for handling messy, varied, and large amounts of data. The evaluation depends upon the performance, throughput, and scalability of the above technologies for several scientific and industrial use cases. This paper outlines the technologies and architectures needed for processing Big Data, as well as the back-end application that implements data migration from an RDBMS to a NoSQL data warehouse, the NoSQL database organization, and how it could be useful for further data analytics.
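
    A stripped-down sketch of the kind of migration back-end the abstract mentions: read normalized rows from a relational source and emit denormalized JSON documents ready for a document store. sqlite3 stands in for the RDBMS and a JSON-lines file for the NoSQL sink; table and column names are invented.

        import json, sqlite3

        conn = sqlite3.connect("metadata.db")   # hypothetical relational source
        conn.row_factory = sqlite3.Row

        # Denormalize: embed each job's events inside the job document itself,
        # trading storage for the join-free reads document stores are good at.
        with open("jobs.jsonl", "w") as sink:
            for job in conn.execute("SELECT * FROM jobs"):
                doc = dict(job)
                doc["events"] = [dict(e) for e in conn.execute(
                    "SELECT ts, state FROM job_events WHERE job_id = ?", (job["id"],))]
                sink.write(json.dumps(doc) + "\n")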

  12. 3. Photocopy of photograph (Original print, Phillip McCracken, courtesy of ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    3. Photocopy of photograph (Original print, Phillip McCracken, courtesy of Bill Mitchell.) Photographer unknown, 1924. Cold Storage Warehouse on the left, north and west facades. On the right, north facade of the Hay and Grain Warehouse. - Curtis Wharf, Cold Storage Warehouse, O & Second Streets, Anacortes, Skagit County, WA

  13. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  14. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  15. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  16. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  17. 19 CFR 113.62 - Basic importation and entry bond conditions.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... custody or withdrawn from a Customs bonded warehouse into the commerce of, or for consumption in, the... merchandise into a Customs bonded warehouse, the obligors agree; (i) To pay any duties, taxes, and charges found to be due on any of that merchandise which remains in the warehouse at the expiration of the...

  18. 78 FR 59289 - Clarification of Bales Made Available for Shipment by CCC-Approved Warehouses

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-26

    ... for CCC-approved warehouses storing cotton. The amendment would change the definition of Bales Made Available for Shipment (BMAS). CCC-approved cotton warehouses are currently required to report BMAS, among... information about bales available for shipment, benefiting both CCC and the cotton industry. DATES: We will...

  19. 7 CFR 735.110 - Conditions for delivery of agricultural products.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... product stored or handled in the warehouse on a demand made by: (1) The holder of the warehouse receipt... 7 Agriculture 7 2010-01-01 2010-01-01 false Conditions for delivery of agricultural products. 735... ACT Warehouse Licensing § 735.110 Conditions for delivery of agricultural products. (a) In the absence...

  20. 7 CFR 1421.106 - Warehouse-stored marketing assistance loan collateral.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... indicating that: (1) Storage charges through the maturity date have been prepaid; or (2) The producer has... commodity stored in an approved warehouse shall be the later of the following: (1) The date the commodity was received or deposited in the warehouse; (2) The date the storage charges start; or (3) The day...

  1. Photocopy of drawing (original drawing of Signal & Ordnance Warehouse ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Signal & Ordnance Warehouse in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) SECTIONS AND DETAILS - MacDill Air Force Base, Signal & Ordnance Warehouse, 7620 Hanger Loop Drive, Tampa, Hillsborough County, FL

  2. Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) END ELEVATION AND SECTIONS - MacDill Air Force Base, Quartermaster Warehouse & Commissary, 7621 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  3. Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) SECTIONS AND DETAILS - MacDill Air Force Base, Quartermaster Warehouse & Commissary, 7621 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  4. Photocopy of drawing (original drawing of Signal & Ordnance Warehouse ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Signal & Ordnance Warehouse in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) PLANS, ELEVATIONS, SECTIONS AND DETAILS - MacDill Air Force Base, Signal & Ordnance Warehouse, 7620 Hanger Loop Drive, Tampa, Hillsborough County, FL

  5. Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) FRONT AND REAR ELEVATIONS - MacDill Air Force Base, Quartermaster Warehouse & Commissary, 7621 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  6. 22 CFR 124.14 - Exports to warehouses or distribution points outside the United States.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 1 2011-04-01 2011-04-01 false Exports to warehouses or distribution points... to warehouses or distribution points outside the United States. (a) Agreements. Agreements (e.g... military nomenclature, the Federal stock number, nameplate data, and any control numbers under which the...

  7. 77 FR 40539 - Privacy Act of 1974; Implementation

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-10

    ..., JUSTICE/FBI- 022, the FBI Data Warehouse System. In this notice of proposed rulemaking, the FBI proposes... FR 53342 (Aug. 31, 2010) and modified at 75 FR 66131 (Oct. 27, 2010) because the Data Warehouse... proposes to exempt the Data Warehouse System, Justice/FBI-022, from certain provisions of the Privacy Act...

  8. 75 FR 18824 - Agency Information Collection Activities: Notice of Intent To Renew Collection 3038-0019, Stocks...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-13

    ... Renew Collection 3038-0019, Stocks of Grain in Licensed Warehouses AGENCY: Commodity Futures Trading...., permitting electronic submission of responses. Stocks of Grain in Licensed Warehouses, OMB Control No. 3038... warehouses regular for delivery to keep records on stocks of commodities and make reports on call by the...

  9. Development of a data warehouse at an academic health system: knowing a place for the first time.

    PubMed

    Dewitt, Jocelyn G; Hampton, Philip M

    2005-11-01

    In 1998, the University of Michigan Health System embarked upon the design, development, and implementation of an enterprise-wide data warehouse, intending to use prioritized business questions to drive its design and implementation. Because of the decentralized nature of the academic health system and the development team's inability to identify and prioritize those institutional business questions, however, a bottom-up approach was used to develop the enterprise-wide data warehouse. Specific important data sets were identified for inclusion, and the technical team designed the system with an enterprise view and architecture rather than as a series of data marts. Using this incremental approach of adding data sets, institutional leaders were able to experience and then further define successful use of the integrated data made available to them. Even as requests for the use and expansion of the data warehouse outstrip the resources assigned for support, the data warehouse has become an integral component of the institution's information management strategy. The authors discuss the approach, process, current status, and successes and failures of the data warehouse.

  10. Automated realtime data import for the i2b2 clinical data warehouse: introducing the HL7 ETL cell.

    PubMed

    Majeed, Raphael W; Röhrig, Rainer

    2012-01-01

    Clinical data warehouses are used to consolidate all available clinical data from one or multiple organizations. They represent an important source for clinical research, quality management, and controlling. Since its introduction, the i2b2 data warehouse has gathered a large user base in the research community. Yet, little work has been done on the process of importing clinical data into data warehouses using existing standards. In this article, we present a novel approach of utilizing the clinical integration server, commonly available in most hospitals, as the data source. As information is transmitted through the integration server, the standardized HL7 message is immediately parsed and inserted into the data warehouse. Evaluation of import speeds suggests the feasibility of the provided solution for real-time processing of HL7 messages. By using the presented approach of standardized data import, i2b2 can be used as a plug-and-play data warehouse, without the hurdle of customized import for every clinical information system or electronic medical record. The provided solution is available for download at http://sourceforge.net/projects/histream/.
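
    To make the HL7-to-warehouse flow concrete, here is a minimal, library-free sketch that splits an HL7 v2 ORU message on its pipe/caret delimiters and maps one OBX segment onto an i2b2-style observation-fact row. Real HL7 parsing (escaping, repetitions, Z-segments) is far more involved; the field positions follow the v2 standard, but the mapping is illustrative, not HIStream's.

        RAW = ("MSH|^~\\&|LAB|HOSP|I2B2|HOSP|202301011200||ORU^R01|42|P|2.3\r"
               "PID|1||12345^^^HOSP||DOE^JOHN\r"
               "OBX|1|NM|2345-7^GLUCOSE^LN||98|mg/dL|||||F\r")

        def segments(msg):
            return [s.split("|") for s in msg.strip().split("\r")]

        facts, patient_id = [], None
        for seg in segments(RAW):
            if seg[0] == "PID":
                patient_id = seg[3].split("^")[0]      # PID-3, first component
            elif seg[0] == "OBX":
                code = seg[3].split("^")[0]            # OBX-3: observation code
                facts.append({"patient_num": patient_id,
                              "concept_cd": f"LOINC:{code}",
                              "nval_num": float(seg[5]),   # OBX-5: value
                              "units_cd": seg[6]})         # OBX-6: units

        print(facts)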

  11. Visualizing collaborative electronic health record usage for hospitalized patients with heart failure.

    PubMed

    Soulakis, Nicholas D; Carson, Matthew B; Lee, Young Ji; Schneider, Daniel H; Skeehan, Connor T; Scholtens, Denise M

    2015-03-01

    To visualize and describe collaborative electronic health record (EHR) usage for hospitalized patients with heart failure. We identified records of patients with heart failure and all associated healthcare provider record usage through queries of the Northwestern Medicine Enterprise Data Warehouse. We constructed a network by equating access and updates of a patient's EHR to a provider-patient interaction. We then considered shared patient record access as the basis for a second network that we termed the provider collaboration network. We calculated network statistics, the modularity of provider interactions, and provider cliques. We identified 548 patient records accessed by 5113 healthcare providers in 2012. The provider collaboration network had 1504 nodes and 83,998 edges. We identified 7 major provider collaboration modules. Average clique size was 87.9 providers. We used a graph database to demonstrate an ad hoc query of our provider-patient network. Our analysis suggests a large number of healthcare providers across a wide variety of professions access records of patients with heart failure during their hospital stay. This shared record access tends to take place not only in a pairwise manner but also among large groups of providers. EHRs encode valuable interactions, implicitly or explicitly, between patients and providers. Network analysis provided strong evidence of multidisciplinary record access of patients with heart failure across teams of 100+ providers. Further investigation may lead to clearer understanding of how record access information can be used to strategically guide care coordination for patients hospitalized for heart failure.
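
    The provider collaboration network described here reduces to projecting a bipartite provider-patient access graph onto providers. A small sketch with networkx, where access_log is a hypothetical iterable of (provider_id, patient_id) pairs taken from EHR audit data:

        from itertools import combinations
        from collections import defaultdict
        import networkx as nx

        access_log = [("dr_a", "p1"), ("rn_b", "p1"), ("dr_a", "p2"), ("pt_c", "p2")]

        # Group providers by the patient whose record they touched.
        by_patient = defaultdict(set)
        for provider, patient in access_log:
            by_patient[patient].add(provider)

        # Providers who accessed the same record share an edge, weighted by
        # how many patients they share.
        G = nx.Graph()
        for providers in by_patient.values():
            for u, v in combinations(sorted(providers), 2):
                w = G[u][v]["weight"] + 1 if G.has_edge(u, v) else 1
                G.add_edge(u, v, weight=w)

        print(G.edges(data=True))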

  12. Data warehousing in disease management programs.

    PubMed

    Ramick, D C

    2001-01-01

    Disease management programs offer the benefits of lower disease occurrence, improved patient care, and lower healthcare costs. In such programs, the key mechanism used to identify individuals at risk for targeted diseases is the data warehouse. This article surveys recent warehousing techniques from HMOs to map out critical issues relating to the preparation, design, and implementation of a successful data warehouse. Discussions of scope, data cleansing, and storage management are included in depicting warehouse preparation and design; data implementation options are contrasted. Examples are provided of data warehouse execution in disease management programs that identify members with preexisting illnesses, as well as those exhibiting high-risk conditions. The proper deployment of successful data warehouses in disease management programs benefits both the organization and the member. Organizations benefit from decreased medical costs; members benefit through an improved quality of life through disease-specific care.

  13. Protecting privacy in a clinical data warehouse.

    PubMed

    Kong, Guilan; Xiao, Zhichun

    2015-06-01

    Peking University has several prestigious teaching hospitals in China. To make secondary use of massive medical data for research purposes, construction of a clinical data warehouse is imperative at Peking University. However, a big concern in clinical data warehouse construction is how to protect patient privacy. In this project, we propose to use a combination of symmetric block ciphers, asymmetric ciphers, and cryptographic hashing algorithms to protect patient privacy information. The novelty of our privacy protection approach lies in message-level data encryption, the key caching system, and the cryptographic key management system. The proposed privacy protection approach is scalable to clinical data warehouse construction with any size of medical data. With the composite privacy protection approach, the clinical data warehouse can be secure enough to keep the confidential data from leaking to the outside world.
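
    A compact sketch of the layered approach the abstract outlines, using the widely available cryptography package: patient identifiers are pseudonymized with a keyed hash, and each message body is encrypted with a symmetric AEAD cipher. Key caching, asymmetric wrapping of the symmetric key, and key management are omitted; all names are illustrative, not the paper's design.

        import hmac, hashlib, os
        from cryptography.hazmat.primitives.ciphers.aead import AESGCM

        PSEUDONYM_KEY = os.urandom(32)          # in practice: from the key manager
        DATA_KEY = AESGCM.generate_key(bit_length=256)

        def pseudonymize(patient_id: str) -> str:
            # Keyed hash: a stable join key, irreversible without the key.
            return hmac.new(PSEUDONYM_KEY, patient_id.encode(), hashlib.sha256).hexdigest()

        def encrypt_record(plaintext: bytes) -> bytes:
            # Message-level symmetric encryption; nonce stored with the blob.
            nonce = os.urandom(12)
            return nonce + AESGCM(DATA_KEY).encrypt(nonce, plaintext, None)

        token = pseudonymize("PUMC-000123")
        blob = encrypt_record(b'{"dx": "I50.9", "note": "..."}')
        print(token[:16], len(blob))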

  14. Application of Information-Theoretic Data Mining Techniques in a National Ambulatory Practice Outcomes Research Network

    PubMed Central

    Wright, Adam; Ricciardi, Thomas N.; Zwick, Martin

    2005-01-01

    The Medical Quality Improvement Consortium data warehouse contains de-identified data on more than 3.6 million patients including their problem lists, test results, procedures and medication lists. This study uses reconstructability analysis, an information-theoretic data mining technique, on the MQIC data warehouse to empirically identify risk factors for various complications of diabetes including myocardial infarction and microalbuminuria. The risk factors identified match those risk factors identified in the literature, demonstrating the utility of the MQIC data warehouse for outcomes research, and RA as a technique for mining clinical data warehouses. PMID:16779156
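
    Reconstructability analysis rests on information-theoretic measures; its simplest building block is the mutual information between a candidate risk factor and an outcome. A stdlib-only sketch over a 2x2 contingency table (the counts are invented, not MQIC data):

        from math import log2

        # counts[(factor, outcome)]: e.g. hypertension (0/1) vs MI (0/1)
        counts = {(0, 0): 800, (0, 1): 50, (1, 0): 100, (1, 1): 50}
        n = sum(counts.values())

        def marginal(axis, value):
            return sum(c for k, c in counts.items() if k[axis] == value) / n

        mi = 0.0
        for (x, y), c in counts.items():
            if c:
                pxy = c / n
                mi += pxy * log2(pxy / (marginal(0, x) * marginal(1, y)))

        print(f"I(factor; outcome) = {mi:.4f} bits")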

  15. Warehouse stocking optimization based on dynamic ant colony genetic algorithm

    NASA Astrophysics Data System (ADS)

    Xiao, Xiaoxu

    2018-04-01

    In view of the varied orders of FAW (First Automotive Works) International Logistics Co., Ltd., the SLP (systematic layout planning) method is used to optimize the layout of the warehousing units in the enterprise, which optimizes warehouse logistics and improves the processing speed of outbound orders. In addition, the relevant intelligent algorithms for optimizing the stocking-route problem are analyzed. The ant colony algorithm and the genetic algorithm, which have good applicability, are studied in depth. The parameters of the ant colony algorithm are optimized by the genetic algorithm, which improves the performance of the ant colony algorithm. A typical path optimization problem model is taken as an example to prove the effectiveness of the parameter optimization.
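
    A condensed sketch of the hybrid the paper studies: an ant-colony tour builder for a picking route whose alpha/beta/rho parameters would be supplied by an outer genetic algorithm (here they are fixed arguments and the GA loop is omitted). The distance matrix is an invented example, not the paper's model.

        import random

        D = [[0, 2, 9, 10],
             [2, 0, 6, 4],
             [9, 6, 0, 8],
             [10, 4, 8, 0]]          # pairwise aisle distances (illustrative)
        N = len(D)

        def tour_length(tour):
            return sum(D[tour[i]][tour[(i + 1) % N]] for i in range(N))

        def aco(alpha=1.0, beta=2.0, rho=0.5, n_ants=8, n_iters=50):
            """alpha/beta/rho are the parameters a GA would tune."""
            tau = [[1.0] * N for _ in range(N)]            # pheromone levels
            best, best_len = None, float("inf")
            for _ in range(n_iters):
                tours = []
                for _ in range(n_ants):
                    tour = [0]
                    while len(tour) < N:
                        i = tour[-1]
                        cand = [j for j in range(N) if j not in tour]
                        w = [tau[i][j] ** alpha * (1 / D[i][j]) ** beta for j in cand]
                        tour.append(random.choices(cand, weights=w)[0])
                    tours.append(tour)
                for t in tours:
                    if tour_length(t) < best_len:
                        best, best_len = t, tour_length(t)
                # Evaporate, then deposit pheromone proportional to tour quality.
                tau = [[(1 - rho) * tau[i][j] for j in range(N)] for i in range(N)]
                for t in tours:
                    L = tour_length(t)
                    for i in range(N):
                        a, b = t[i], t[(i + 1) % N]
                        tau[a][b] += 1.0 / L
                        tau[b][a] += 1.0 / L
            return best, best_len

        print(aco())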

  16. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2010-04-01 2010-04-01 false Articles in a warehouse... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  17. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2011-04-01 2011-04-01 false Articles in a warehouse... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  18. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2013-04-01 2013-04-01 false Articles in a warehouse... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  19. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2012-04-01 2011-04-01 true Articles in a warehouse. 46... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  20. 27 CFR 46.236 - Articles in a warehouse.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 2 2014-04-01 2014-04-01 false Articles in a warehouse... Tubes Held for Sale on April 1, 2009 Filing Requirements § 46.236 Articles in a warehouse. (a) Articles... articles will be offered for sale. (b) Articles offered for sale at several locations must be reported on a...

  1. 8. AFRD WAREHOUSE, EAST SIDE DETAIL SHOWS CONNECTION OF LEANTO ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    8. AFRD WAREHOUSE, EAST SIDE DETAIL SHOWS CONNECTION OF LEAN-TO TO WALL. FACING WEST. NOTE THE PROFILE OF THE METAL AWNING ON SOUTH SIDE. ELECTRICAL CONDUIT AND OTHER SERVICES PENETRATE WALL. POLE SECURED WITH TRIANGULAR BRACES AT CORNER IS COMMUNICATION POLE. - Minidoka Relocation Center Warehouse, 111 South Fir Street, Shoshone, Lincoln County, ID

  2. Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse & Commissary in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) FIRST FLOOR PLAN AND DOOR DETAILS - MacDill Air Force Base, Quartermaster Warehouse & Commissary, 7621 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  3. Photocopy of drawing (original drawing of Q.M. Warehouse in possession ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photocopy of drawing (original drawing of Q.M. Warehouse in possession of MacDill Air Force Base, Civil Engineering, Tampa, Florida; 1940 architectural drawings by Construction Division, Office of the Quartermaster General) PLANS, ELEVATIONS, SECTIONS, AND ELECTRICAL DETAILS - MacDill Air Force Base, Quartermaster Warehouse, 7605 Hillsborough Loop Drive, Tampa, Hillsborough County, FL

  4. 75 FR 3919 - Privacy Act of 1974; as Amended; Notice To Amend an Existing System of Records

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-25

    ... Manager (ORSO). Data from the current TSIS (Unix-based) version are integrated into a data warehouse with... (where provided). The TSIS data warehouse database is only available to the system developer, and... available from the warehouse data do not contain Personally Identifiable Information (PII). Of the few...

  5. Developing a standardized healthcare cost data warehouse.

    PubMed

    Visscher, Sue L; Naessens, James M; Yawn, Barbara P; Reinalda, Megan S; Anderson, Stephanie S; Borah, Bijan J

    2017-06-12

    Research addressing value in healthcare requires a measure of cost. While there are many sources and types of cost data, each has strengths and weaknesses. Many researchers appear to create study-specific cost datasets, but the explanations of their costing methodologies are not always clear, causing their results to be difficult to interpret. Our solution, described in this paper, was to use widely accepted costing methodologies to create a service-level, standardized healthcare cost data warehouse from an institutional perspective that includes all professional and hospital-billed services for our patients. The warehouse is based on a National Institutes of Health-funded research infrastructure containing the linked health records and medical care administrative data of two healthcare providers and their affiliated hospitals. Since all patients are identified in the data warehouse, their costs can be linked to other systems and databases, such as electronic health records, tumor registries, and disease or treatment registries. We describe the two institutions' administrative source data; the reference files, which include Medicare fee schedules and cost reports; the process of creating standardized costs; and the warehouse structure. The costing algorithm can create inflation-adjusted standardized costs at the service-line level for defined study cohorts on request. The resulting standardized costs contained in the data warehouse can be used to create detailed, bottom-up analyses of professional and facility costs of procedures, medical conditions, and patient care cycles without revealing business-sensitive information. After its creation, a standardized cost data warehouse is relatively easy to maintain and can be expanded to include data from other providers. Individual investigators who may not have sufficient knowledge about administrative data do not have to try to create their own standardized costs on a project-by-project basis, because the data warehouse generates standardized costs for defined cohorts upon request.
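
    The inflation adjustment such a warehouse applies is, at its core, a ratio of price indices. A one-function sketch; the index values are placeholders, not the warehouse's reference data.

        # Hypothetical medical-care price index by year (base year 2017).
        DEFLATOR = {2014: 0.93, 2015: 0.95, 2016: 0.98, 2017: 1.00}

        def standardized_cost(raw_cost, service_year, target_year=2017):
            """Re-express a service-line cost in target-year dollars."""
            return raw_cost * DEFLATOR[target_year] / DEFLATOR[service_year]

        print(standardized_cost(1200.0, 2014))   # ~1290 in 2017 dollars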

  6. A Multidimensional Data Warehouse for Community Health Centers

    PubMed Central

    Kunjan, Kislaya; Toscos, Tammy; Turkcan, Ayten; Doebbeling, Brad N.

    2015-01-01

    Community health centers (CHCs) play a pivotal role in healthcare delivery to vulnerable populations, but have not yet benefited from a data warehouse that can support improvements in clinical and financial outcomes across the practice. We have developed a multidimensional clinic data warehouse (CDW) by working with 7 CHCs across the state of Indiana and integrating their operational, financial and electronic patient records to support ongoing delivery of care. We describe in detail the rationale for the project, the data architecture employed, the content of the data warehouse, along with a description of the challenges experienced and strategies used in the development of this repository that may help other researchers, managers and leaders in health informatics. The resulting multidimensional data warehouse is highly practical and is designed to provide a foundation for wide-ranging healthcare data analytics over time and across the community health research enterprise. PMID:26958297

  7. A Multidimensional Data Warehouse for Community Health Centers.

    PubMed

    Kunjan, Kislaya; Toscos, Tammy; Turkcan, Ayten; Doebbeling, Brad N

    2015-01-01

    Community health centers (CHCs) play a pivotal role in healthcare delivery to vulnerable populations, but have not yet benefited from a data warehouse that can support improvements in clinical and financial outcomes across the practice. We have developed a multidimensional clinic data warehouse (CDW) by working with 7 CHCs across the state of Indiana and integrating their operational, financial and electronic patient records to support ongoing delivery of care. We describe in detail the rationale for the project, the data architecture employed, the content of the data warehouse, along with a description of the challenges experienced and strategies used in the development of this repository that may help other researchers, managers and leaders in health informatics. The resulting multidimensional data warehouse is highly practical and is designed to provide a foundation for wide-ranging healthcare data analytics over time and across the community health research enterprise.
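
    The phrase "multidimensional data warehouse" usually denotes a star schema: a fact table of events keyed to dimension tables. A self-contained sqlite3 sketch of one such design; the tables and columns are invented to echo the CHC setting, not the paper's actual architecture.

        import sqlite3

        db = sqlite3.connect(":memory:")
        db.executescript("""
            CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, county TEXT);
            CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
            CREATE TABLE fact_visit  (patient_key INTEGER, date_key INTEGER,
                                      charge REAL, clinic TEXT);
            INSERT INTO dim_patient VALUES (1,'Marion'),(2,'Lake');
            INSERT INTO dim_date VALUES (20150103,2015,1),(20150210,2015,2);
            INSERT INTO fact_visit VALUES (1,20150103,120.0,'CHC-A'),
                                          (2,20150210,75.5,'CHC-A');
        """)

        # A typical roll-up: charges by county and month across the star.
        for row in db.execute("""
            SELECT p.county, d.year, d.month, SUM(f.charge)
            FROM fact_visit f
            JOIN dim_patient p USING (patient_key)
            JOIN dim_date d USING (date_key)
            GROUP BY p.county, d.year, d.month"""):
            print(row)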

  8. Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data

    PubMed Central

    Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick

    2016-01-01

    This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest. PMID:26958859

  9. Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data.

    PubMed

    Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick

    2016-01-01

    This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.
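
    Independent of the back-end, a population-based query over archetype-based documents is a scan-filter-aggregate over nested JSON. A back-end-agnostic sketch in plain Python; the document shape loosely imitates an openEHR composition and is not the reference model.

        # Toy EHR documents; real openEHR compositions are far richer.
        docs = [
            {"ehr_id": "e1", "obs": [{"code": "bp_systolic", "magnitude": 152}]},
            {"ehr_id": "e2", "obs": [{"code": "bp_systolic", "magnitude": 118}]},
            {"ehr_id": "e3", "obs": [{"code": "heart_rate",  "magnitude": 71}]},
        ]

        # Population query: how many patients have any systolic reading >= 140?
        hypertensive = {
            d["ehr_id"]
            for d in docs
            for o in d["obs"]
            if o["code"] == "bp_systolic" and o["magnitude"] >= 140
        }
        print(len(hypertensive), "of", len(docs))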

  10. Sensitivity and Predictive Value of 15 PubMed Search Strategies to Answer Clinical Questions Rated Against Full Systematic Reviews

    PubMed Central

    Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-01-01

    Background Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. Objective To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. Methods We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed’s Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. Results The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%–25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%–30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. Conclusions The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the first 2 PubMed pages. These results can help clinicians apply effective strategies to answer their questions at the point of care. PMID:22693047

  11. Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews.

    PubMed

    Agoritsas, Thomas; Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-06-12

    Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed's Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%-25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%-30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the first 2 PubMed pages. These results can help clinicians apply effective strategies to answer their questions at the point of care.
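
    The strategies compared in this record lend themselves to a small illustration. The sketch below builds a PICO-style Boolean query and appends PubMed's narrow (specific) therapy filter; the PICO terms and the helper name are hypothetical, and the filter string should be checked against current PubMed documentation before use.

    ```python
    # Sketch: combine PICO elements into a Boolean PubMed query and AND it
    # with the Clinical Queries narrow therapy filter, in the spirit of the
    # strategies compared above. Example terms are hypothetical.

    NARROW_THERAPY_FILTER = (
        "(randomized controlled trial[Publication Type] "
        "OR (randomized[Title/Abstract] "
        "AND controlled[Title/Abstract] "
        "AND trial[Title/Abstract]))"
    )

    def build_pico_query(population, intervention, comparison=None,
                         outcome=None, narrow=True):
        """Combine PICO elements into a single Boolean PubMed query string."""
        parts = [population, intervention]
        if comparison:
            parts.append(comparison)
        if outcome:
            parts.append(outcome)
        query = " AND ".join(f"({p})" for p in parts)
        if narrow:
            query = f"{query} AND {NARROW_THERAPY_FILTER}"
        return query

    # Hypothetical clinical question: anticoagulation for atrial fibrillation.
    print(build_pico_query("atrial fibrillation", "warfarin", "aspirin", "stroke"))
    ```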

  12. Advances in nowcasting influenza-like illness rates using search query logs

    NASA Astrophysics Data System (ADS)

    Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian

    2015-08-01

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  13. Measuring Up: Implementing a Dental Quality Measure in the Electronic Health Record Context

    PubMed Central

    Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F

    2015-01-01

    Background Quality improvement requires quality measures that are validly implementable. In this work, we assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure (percentage of children who received fluoride varnish). Methods We defined how to implement the automated measure queries in a dental electronic health record (EHR). Within records identified through automated query, we manually reviewed a subsample to assess the performance of the query. Results The automated query found 71.0% of patients to have had fluoride varnish compared to 77.6% found using the manual chart review. The automated quality measure performance was 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. Conclusions Our findings support the feasibility of automated dental quality measure queries in the context of sufficient structured data. Information noted only in the free text rather than in structured data would require natural language processing approaches to effectively query. Practical Implications To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation in order to support near-term automated calculation of quality measures. PMID:26562736
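
    The reported figures can be recomputed from a confusion matrix comparing the automated query against manual chart review. A minimal sketch follows; the counts used in the demonstration are hypothetical placeholders, not the study's data.

    ```python
    # Sketch: sensitivity, specificity, PPV and NPV for an automated
    # quality-measure query validated against manual review. The counts
    # below are hypothetical, not taken from the study.

    def query_performance(tp, fp, fn, tn):
        """Return the four validation metrics from confusion-matrix counts."""
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
        }

    print(query_performance(tp=95, fp=3, fn=10, tn=30))
    ```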

  14. Advances in nowcasting influenza-like illness rates using search query logs.

    PubMed

    Lampos, Vasileios; Miller, Andrew C; Crossan, Steve; Stefansen, Christian

    2015-08-03

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
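
    A minimal sketch of the two-stage idea described above, assuming synthetic data: an Elastic Net selects informative queries, then a Gaussian Process is fit on the selected subset. The paper's composite GP kernel and its autoregressive component are not reproduced here.

    ```python
    # Sketch: Elastic Net query selection followed by nonlinear GP regression,
    # on synthetic data standing in for weekly query frequencies and ILI rates.

    import numpy as np
    from sklearn.linear_model import ElasticNet
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.random((200, 50))                 # weekly frequencies of 50 queries
    y = X[:, :5].sum(axis=1) + 0.05 * rng.standard_normal(200)  # ILI-like rate

    # Stage 1: sparse linear selection of queries.
    enet = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X, y)
    selected = np.flatnonzero(enet.coef_)

    # Stage 2: nonlinear GP regression on the selected queries only.
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X[:, selected], y)
    print(len(selected), gp.predict(X[:5, selected]))
    ```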

  15. Comparing NetCDF and SciDB on managing and querying 5D hydrologic dataset

    NASA Astrophysics Data System (ADS)

    Liu, Haicheng; Xiao, Xiao

    2016-11-01

    Efficiently extracting information from high dimensional hydro-meteorological modelling datasets requires smart solutions. Traditional methods are mostly based on files, which can be edited and accessed handily. But they have problems of efficiency due to contiguous storage structure. Others propose databases as an alternative for advantages such as native functionalities for manipulating multidimensional (MD) arrays, smart caching strategy and scalability. In this research, NetCDF file based solutions and the multidimensional array database management system (DBMS) SciDB applying chunked storage structure are benchmarked to determine the best solution for storing and querying 5D large hydrologic modelling dataset. The effect of data storage configurations including chunk size, dimension order and compression on query performance is explored. Results indicate that dimension order to organize storage of 5D data has significant influence on query performance if chunk size is very large. But the effect becomes insignificant when chunk size is properly set. Compression of SciDB mostly has negative influence on query performance. Caching is an advantage but may be influenced by execution of different query processes. On the whole, NetCDF solution without compression is in general more efficient than the SciDB DBMS.
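
    For readers unfamiliar with chunked storage, the sketch below shows how chunk shape and compression, the storage configurations benchmarked above, are set with the netCDF4 Python library. The 5D dimensions, sizes and chunk shape are illustrative, not the paper's dataset.

    ```python
    # Sketch: creating a chunked, optionally compressed 5D NetCDF variable.
    # Dimension names/sizes and the chunk shape are illustrative only.

    from netCDF4 import Dataset

    with Dataset("hydro.nc", "w") as nc:
        for name, size in [("run", 10), ("time", 24), ("level", 5),
                           ("lat", 100), ("lon", 100)]:
            nc.createDimension(name, size)
        # Chunk shape chosen to favour time-series reads; zlib=True enables
        # the compression whose cost/benefit the study measured.
        nc.createVariable("discharge", "f4",
                          ("run", "time", "level", "lat", "lon"),
                          chunksizes=(1, 24, 1, 10, 10), zlib=True)
    ```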

  16. 27 CFR 28.27 - Entry of wine into customs bonded warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    Bonded Warehouses, § 28.27, Entry of wine into customs bonded warehouses: Upon filing of the application or notice prescribed by § 28.122(a), wine may be withdrawn from a bonded wine cellar for transfer to any...

  17. Storage Information Management System (SIMS) Spaceflight Hardware Warehousing at Goddard Space Flight Center

    NASA Technical Reports Server (NTRS)

    Kubicko, Richard M.; Bingham, Lindy

    1995-01-01

    Goddard Space Flight Center (GSFC) on-site and leased warehouses contain thousands of items of ground support equipment (GSE) and flight hardware including spacecraft, scaffolding, computer racks, stands, holding fixtures, test equipment, spares, etc. The control of these warehouses, and the management, accountability, and control of the items within them, is accomplished by the Logistics Management Division. To facilitate this management and tracking effort, the Logistics and Transportation Management Branch is developing a system to provide warehouse personnel, property owners, and managers with storage and inventory information. This paper will describe that PC-based system and address how it will improve GSFC warehouse and storage management.

  18. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

    PubMed

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-09-18

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.
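
    The query semantics are easy to state as a plain filter, as in the sketch below; the GQR-tree index and the distributed object-side processing that the paper contributes are deliberately not reproduced. The object records and attribute names are illustrative.

    ```python
    # Sketch: semantics of a content-matched range monitoring query as a
    # plain filter. Object data and attributes are illustrative only.

    objects = [
        {"id": 1, "type": "taxi", "free": True,  "x": 3.0, "y": 4.0},
        {"id": 2, "type": "taxi", "free": False, "x": 5.0, "y": 1.0},
        {"id": 3, "type": "bus",  "free": True,  "x": 2.0, "y": 2.0},
    ]

    def cm_range_query(objs, attrs, x1, y1, x2, y2):
        """Objects matching all non-spatial values AND inside the rectangle."""
        return [o for o in objs
                if all(o.get(k) == v for k, v in attrs.items())
                and x1 <= o["x"] <= x2 and y1 <= o["y"] <= y2]

    print(cm_range_query(objects, {"type": "taxi", "free": True}, 0, 0, 4, 5))
    ```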

  19. Estimating Missing Features to Improve Multimedia Information Retrieval

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bagherjeiran, A; Love, N S; Kamath, C

    Retrieval in a multimedia database usually involves combining information from different modalities of data, such as text and images. However, all modalities of the data may not be available to form the query. The retrieval results from such a partial query are often less than satisfactory. In this paper, we present an approach to complete a partial query by estimating the missing features in the query. Our experiments with a database of images and their associated captions show that, with an initial text-only query, our completion method has similar performance to a full query with both image and text features. In addition, when we use relevance feedback, our approach outperforms the results obtained using a full query.
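
    One simple way to estimate the missing features of a partial query is nearest-neighbour imputation over the database items, sketched below with scikit-learn's KNNImputer; this is a stand-in illustration, not necessarily the estimator used by the authors, and the feature layout is hypothetical.

    ```python
    # Sketch: completing a partial query by estimating its missing features
    # with k-NN imputation. Feature values are synthetic placeholders.

    import numpy as np
    from sklearn.impute import KNNImputer

    # Rows: database items with text features (first 3) and image features
    # (last 3). The partial query has only text features; image ones are NaN.
    database = np.random.default_rng(1).random((100, 6))
    partial_query = np.array([[0.2, 0.8, 0.5, np.nan, np.nan, np.nan]])

    imputer = KNNImputer(n_neighbors=5).fit(database)
    full_query = imputer.transform(partial_query)  # image features estimated
    print(full_query)
    ```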

  20. Embarking on performance improvement.

    PubMed

    Brown, Bobbi; Falk, Leslie Hough

    2014-06-01

    Healthcare organizations should approach performance improvement as a program, not a project. The program should be led by a guidance team that identifies goals, prioritizes work, and removes barriers to enable clinical improvement teams and work groups to realize performance improvements. A healthcare enterprise data warehouse can provide the initial foundation for the program analytics. Evidence-based best practices can help achieve improved outcomes and reduced costs.

  1. Gun trauma and ophthalmic outcomes.

    PubMed

    Chopra, N; Gervasio, K A; Kalosza, B; Wu, A Y

    2018-04-01

    Purpose This retrospective cohort study assesses the visual outcomes of patients who survive gunshot wounds to the head. Methods The Elmhurst City Hospital Trauma Registry and Mount Sinai Data Warehouse were queried for gun trauma resulting in ocular injury over a 16-year period. Thirty-one patients over 16 years of age were found who suffered a gunshot wound to the head and resultant ocular trauma: orbital fracture, ruptured globe, foreign body, or optic nerve injury. Gun types included all firearms and air guns. Nine patients were excluded due to incorrect coding or unavailable charts. Statistical analysis was performed using a simple bivariate analysis (χ2). Results Of the 915 victims of gun trauma to the head, 27 (3.0%) sustained ocular injuries. Of the 22 patients whose records were accessible, 18 survived. Eight of the 18 surviving patients (44%) suffered long-term visual damage, defined as permanent loss of vision in at least one eye to the level of counting fingers or worse. Neither location of injury (P=0.243), nor type of gun used (P=0.296), nor cause of gun trauma (P=0.348) predicted visual loss outcome. The Glasgow Coma Scale eye response score on arrival to the hospital also did not predict visual loss outcome (P=0.793). Conclusion There has been a dearth of research into gun trauma and even less research on the visual outcomes following gun trauma. Our study finds that survivors of gun trauma to the head suffer long-term visual damage 44% of the time after injury.

  2. Massive Query Resolution for Rapid Selective Dissemination of Information.

    ERIC Educational Resources Information Center

    Cohen, Jonathan D.

    1999-01-01

    Outlines an efficient approach to performing query resolution which, when matched with a keyword scanner, offers rapid selecting and routing for massive Boolean queries, and which is suitable for implementation on a desktop computer. Demonstrates the system's operation with large examples in a practical setting. (AEF)
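
    A minimal sketch of the idea, assuming queries are already parsed: a keyword scanner yields the set of terms present in a document, and each standing Boolean query is resolved against that set. The tuple-based query representation is an illustrative choice, not the paper's data structure.

    ```python
    # Sketch: resolving standing Boolean queries against the keyword set
    # reported by a scanner. Queries are nested (op, operands) tuples.

    def resolve(query, matched_terms):
        """Evaluate a Boolean query tree against the set of matched keywords."""
        if isinstance(query, str):
            return query in matched_terms
        op, *operands = query
        results = (resolve(q, matched_terms) for q in operands)
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            return not next(results)
        raise ValueError(f"unknown operator: {op}")

    doc_terms = {"warehouse", "query", "cache"}
    standing_query = ("AND", "query", ("OR", "cache", "index"))
    print(resolve(standing_query, doc_terms))  # True -> route document to user
    ```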

  3. High Performance Visualization using Query-Driven Visualizationand Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, E. Wes; Campbell, Scott; Dart, Eli

    2006-06-15

    Query-driven visualization and analytics is a unique approach for high-performance visualization that offers new capabilities for knowledge discovery and hypothesis testing. The new capabilities, akin to finding needles in haystacks, are the result of combining technologies from the fields of scientific visualization and scientific data management. This approach is crucial for rapid data analysis and visualization in the petascale regime. This article describes how query-driven visualization is applied to a hero-sized network traffic analysis problem.

  4. 19 CFR 7.1 - Puerto Rico; spirits and wines withdrawn from warehouse for shipment to; duty on foreign-grown...

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    Section 7.1, Customs Duties: Puerto Rico; spirits and wines withdrawn from warehouse for shipment to; duty on foreign-grown coffee. (a) When spirits and wines are withdrawn from a...-grown coffee shipped to Puerto Rico from the United States, but special Customs invoices shall not be...

  5. A&M. Radioactive parts security storage warehouses: TAN648 on left, and ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    A&M. Radioactive parts security storage warehouses: TAN-648 on left, and dolly storage building, TAN-647, on right. Camera facing south. This was the front entry for the warehouse and the rear of the dolly storage building. Date: August 6, 2003. INEEL negative no. HD-36-2-2 - Idaho National Engineering Laboratory, Test Area North, Scoville, Butte County, ID

  6. Data Warehouse Architecture for Army Installations

    DTIC Science & Technology

    1999-11-01

    Laboratory (CERL). Dr. Moonja Kim is Chief, CN-B and Dr. John Bandy is Chief, CN. The technical editor was Linda L. Wheatley, Information Technology...1994. Devlin, Barry, Data Warehouse, From Architecture to Implementation (Addison-Wesley, 1997). Inmon, W.H., Building the Data Warehouse (John ...Magazine, August 1997. Kimball, Ralph, "Digging into Data Mining," DBMS Magazine, October 1997. Lewison, Lisa, "Data Mining: Intelligent Technology

  7. 16. Photographic copy of photograph (from original 4 x 5 ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    16. Photographic copy of photograph (from original 4 x 5 black and white print in the Army Port Contractors' 'Completion Report' at the Engineering Office, Oakland Army Base, California). Photograph taken prior to June 1942 by unknown photographer. SOUTHWEST BIRDS-EYE VIEW OF WAREHOUSES (BLDGS. 802-805). - Oakland Army Base, Warehouse Type, Tobruk Street, between Warehouse Road & Fifteenth Street, Oakland, Alameda County, CA

  8. Credit BG. View looks south southeast (162°) across foundation of ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Credit BG. View looks south southeast (162°) across foundation of Building 4332 Warehouse "B" (formerly T-81). Top of foundation for Building 4332 Warehouse "A" is visible at extreme left of view. In remote distance are buildings at Main Base, Edwards Air Force Base - Edwards Air Force Base, North Base, Warehouse B, Second Street at E Street, Boron, Kern County, CA

  9. Semantic web data warehousing for caGrid.

    PubMed

    McCusker, James P; Phillips, Joshua A; González Beltrán, Alejandra; Finkelstein, Anthony; Krauthammer, Michael

    2009-10-01

    The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges.

  10. Multi-field query expansion is effective for biomedical dataset retrieval.

    PubMed

    Bouadjenek, Mohamed Reda; Verspoor, Karin

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. © The Author(s) 2017. Published by Oxford University Press.

  11. Multi-field query expansion is effective for biomedical dataset retrieval

    PubMed Central

    2017-01-01

    Abstract In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. PMID:29220457
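
    A minimal sketch of Rocchio-style expansion as used by both records above: the query vector is moved toward the centroid of relevant feedback documents and away from non-relevant ones. The weights and vectors are illustrative; the paper's term-selection step and exact parameters are not reproduced.

    ```python
    # Sketch: Rocchio query expansion over term-weight vectors that share
    # one vocabulary. Parameter values are conventional defaults.

    import numpy as np

    def rocchio(query_vec, relevant, non_relevant,
                alpha=1.0, beta=0.75, gamma=0.15):
        """Return the expanded query vector."""
        q = alpha * query_vec
        if len(relevant):
            q = q + beta * np.mean(relevant, axis=0)
        if len(non_relevant):
            q = q - gamma * np.mean(non_relevant, axis=0)
        return np.clip(q, 0, None)  # negative term weights usually dropped

    q = np.array([1.0, 0.0, 0.0, 0.5])
    rel = np.array([[0.9, 0.4, 0.0, 0.6], [0.8, 0.5, 0.1, 0.4]])
    nonrel = np.array([[0.0, 0.0, 0.9, 0.1]])
    print(rocchio(q, rel, nonrel))  # top newly weighted terms become expansion terms
    ```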

  12. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

    PubMed Central

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-01-01

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613

  13. Systems and methods for an extensible business application framework

    NASA Technical Reports Server (NTRS)

    Bell, David G. (Inventor); Crawford, Michael (Inventor)

    2012-01-01

    Methods and systems for editing data from a query result include requesting a query result using a unique collection identifier for a collection of individual files and a unique identifier for a configuration file that specifies a data structure for the query result. A query result is generated that contains a plurality of fields as specified by the configuration file, by combining each of the individual files associated with a unique identifier for a collection of individual files. The query result data is displayed with a plurality of labels as specified in the configuration file. Edits can be performed by querying a collection of individual files using the configuration file, editing a portion of the query result, and transmitting only the edited information for storage back into a data repository.

  14. CSRQ: Communication-Efficient Secure Range Queries in Two-Tiered Sensor Networks

    PubMed Central

    Dai, Hua; Ye, Qingqun; Yang, Geng; Xu, Jia; He, Ruiliang

    2016-01-01

    In recent years, we have seen many applications of secure query in two-tiered wireless sensor networks. Storage nodes are responsible for storing data from nearby sensor nodes and answering queries from Sink. It is critical to protect data security from a compromised storage node. In this paper, the Communication-efficient Secure Range Query (CSRQ)—a privacy and integrity preserving range query protocol—is proposed to prevent attackers from gaining information of both data collected by sensor nodes and queries issued by Sink. To preserve privacy and integrity, in addition to employing the encoding mechanisms, a novel data structure called encrypted constraint chain is proposed, which embeds the information of integrity verification. Sink can use this encrypted constraint chain to verify the query result. The performance evaluation shows that CSRQ has lower communication cost than the current range query protocols. PMID:26907293

  15. Use of controlled vocabularies to improve biomedical information retrieval tasks.

    PubMed

    Pasche, Emilie; Gobeill, Julien; Vishnyakova, Dina; Ruch, Patrick; Lovis, Christian

    2013-01-01

    The high heterogeneity of biomedical vocabulary is a major obstacle for information retrieval in large biomedical collections. Therefore, using biomedical controlled vocabularies is crucial for managing these contents. We investigate the impact of query expansion based on controlled vocabularies to improve the effectiveness of two search engines. Our strategy relies on the enrichment of users' queries with additional terms, directly derived from such vocabularies applied to infectious diseases and chemical patents. We observed that query expansion based on pathogen names resulted in improvements of the top-precision of our first search engine, while the normalization of diseases degraded the top-precision. The expansion of chemical entities, which was performed on the second search engine, positively affected the mean average precision. We have shown that query expansion of some types of biomedical entities has great potential to improve search effectiveness; therefore, fine-tuning of query expansion strategies could help improve the performance of search engines.

  16. Effects of Natural Disaster Trends: A Case Study for Expanding the Pre-Positioning Network of CARE International

    PubMed Central

    Bozkurt, Melda; Duran, Serhan

    2012-01-01

    The increasing number of natural disasters in the last decade necessitates increased capacity and agility in delivering humanitarian relief. A common logistics strategy used by humanitarian organizations to respond to this need is the establishment of pre-positioning warehouse networks. In the pre-positioning strategy, critical relief inventories are located near the regions at which they will be needed in advance of the onset of the disaster. Therefore, pre-positioning reduces the response time by totally or partially eliminating the procurement phase and increasing the availability of relief items just after the disaster strikes. Once the pre-positioning warehouse locations are decided and warehouses on those locations become operational, they will be in use for a long time. Therefore, the chosen locations should be robust enough to enable extensions, and to cope with changing trends in disaster types, locations and magnitudes. In this study, we analyze the effects of natural disaster trends on the expansion plan of the pre-positioning warehouse network implemented by CARE International. We utilize a facility location model to identify the additional warehouse location(s) for relief items to be stored as an extension of the current warehouse network operated by CARE International, considering changing natural disaster trends observed over the past three decades. PMID:23066402

  17. Effects of natural disaster trends: a case study for expanding the pre-positioning network of CARE International.

    PubMed

    Bozkurt, Melda; Duran, Serhan

    2012-08-01

    The increasing number of natural disasters in the last decade necessitates increased capacity and agility in delivering humanitarian relief. A common logistics strategy used by humanitarian organizations to respond to this need is the establishment of pre-positioning warehouse networks. In the pre-positioning strategy, critical relief inventories are located near the regions at which they will be needed in advance of the onset of the disaster. Therefore, pre-positioning reduces the response time by totally or partially eliminating the procurement phase and increasing the availability of relief items just after the disaster strikes. Once the pre-positioning warehouse locations are decided and warehouses on those locations become operational, they will be in use for a long time. Therefore, the chosen locations should be robust enough to enable extensions, and to cope with changing trends in disaster types, locations and magnitudes. In this study, we analyze the effects of natural disaster trends on the expansion plan of the pre-positioning warehouse network implemented by CARE International. We utilize a facility location model to identify the additional warehouse location(s) for relief items to be stored as an extension of the current warehouse network operated by CARE International, considering changing natural disaster trends observed over the past three decades.

  18. Model-based query language for analyzing clinical processes.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    Nowadays, large databases of clinical process data exist in hospitals. However, these data are rarely used to their full scope. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using an MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easily perceptible to non-IT professionals as well. We develop this language based on a process modeling language, which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.

  19. Optimizing a Query by Transformation and Expansion.

    PubMed

    Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank

    2017-01-01

    In the biomedical sector not only the amount of information produced and uploaded into the web is enormous, but also the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time on trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and receive the best possible results. In this paper we introduce the concept of a workflow for an optimization of queries in the medical and biological sector by using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantic co-occurring terms to the query. Additionally, the query is enlarged by an inclusion of synonyms. The translation into database specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.

  20. Applying Wave (registered trademark) to Build an Air Force Community of Interest Shared Space

    DTIC Science & Technology

    2007-08-01

    Performance. It is essential that an inverse transform be defined for every transform, or else the query mediator must be smart enough to figure out how...to invert it. Without an inverse transform, if an incoming query constrains on the transformed attribute, the query mediator might generate a query...plan that is horribly inefficient. If you must code a custom transformation function, you must also code the inverse transform. Putting the
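
    The pairing rule described above can be sketched as a registry in which every transform is stored together with its inverse, so a constraint on a transformed attribute can be rewritten for the source. All names below are illustrative, not the Wave product's API.

    ```python
    # Sketch: registering transform/inverse pairs so a query mediator can
    # push constraints on transformed attributes down to the source schema.
    # The attribute and column names are hypothetical.

    TRANSFORMS = {}

    def register(attr, forward, inverse):
        TRANSFORMS[attr] = (forward, inverse)

    # Example: source stores temperature in Celsius, COI schema uses Fahrenheit.
    register("temp_f",
             forward=lambda c: c * 9 / 5 + 32,
             inverse=lambda f: (f - 32) * 5 / 9)

    def rewrite_constraint(attr, op, value):
        """Rewrite a constraint on a transformed attribute for the source."""
        _, inverse = TRANSFORMS[attr]
        return ("temp_c", op, inverse(value))  # source-side column assumed

    print(rewrite_constraint("temp_f", ">=", 98.6))  # ('temp_c', '>=', 37.0)
    ```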

  1. Regional Differences in Thrombectomy Rates : Secondary use of Billing Codes in the MIRACUM (Medical Informatics for Research and Care in University Medicine) Consortium.

    PubMed

    Haverkamp, Christian; Ganslandt, Thomas; Horki, Petar; Boeker, Martin; Dörfler, Arnd; Schwab, Stefan; Berkefeld, Joachim; Pfeilschifter, Waltraud; Niesen, Wolf-Dirk; Egger, Karl; Kaps, Manfred; Brockmann, Marc A; Neumaier-Probst, Eva; Szabo, Kristina; Skalej, Martin; Bien, Siegfried; Best, Christoph; Prokosch, Hans-Ulrich; Urbach, Horst

    2018-01-08

    Mechanical thrombectomy, in addition to intravenous (i.v.) thrombolysis, is recommended for treatment of acute stroke in patients with large vessel occlusions (LVO) in the anterior circulation up to 6 h after symptom onset. We compared thrombectomy rates of eight university hospitals of the MIRACUM consortium to analyze the implementation of this guideline in clinical routine. Anonymized billing data in a standardized format were loaded into a local i2b2 data warehouse by applying already existing extract, transform and load (ETL) routines. A locally executed uniform SQL (structured query language) query delivered aggregated site data for all inpatients with a discharge diagnosis of ischemic stroke (ICD-10 I63) containing counts for type of acute treatment, type of admission and age groups, which were centrally analyzed with R. From 2014 to 2016, the thrombectomy rate almost doubled from a mean of 4.7% to 9.6%, although significant differences between centers exist (range in 2016: 5.8-17%). The number of drip-and-ship procedures increased in 3 out of 8 centers. There was no evidence for a decrease in thrombectomy rates during weekends/holidays or among patients older than 80 years, but this age group is more likely to receive i.v. recombinant tissue plasminogen activator (rtPA). The observed increase of thrombectomy rates and drip-and-ship procedures, without a significant difference between weekdays and weekends or patients of different ages, substantiates a rapid implementation of stroke guidelines within the analyzed neurovascular centers. The prototype of the MIRACUM Data Integration Center already contributes to health services research in Germany.
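
    A sketch of the kind of locally executed, uniform SQL aggregation described above, run here against an in-memory SQLite connection. The table and column names are hypothetical, not the MIRACUM/i2b2 schema; only aggregates would leave each site, mirroring the consortium setup.

    ```python
    # Sketch: per-site aggregation of ischemic stroke (I63) cases by treatment
    # and age group. Schema (stroke_cases, icd10_code, treatment, age) is
    # hypothetical.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE stroke_cases (
                      icd10_code TEXT, treatment TEXT, age INTEGER)""")
    conn.executemany("INSERT INTO stroke_cases VALUES (?, ?, ?)",
                     [("I63", "thrombectomy", 74), ("I63", "iv_rtpa", 82),
                      ("I63", "none", 69)])

    rows = conn.execute("""
        SELECT treatment,
               CASE WHEN age > 80 THEN 'over80' ELSE 'upTo80' END AS age_group,
               COUNT(*) AS n
        FROM stroke_cases
        WHERE icd10_code = 'I63'
        GROUP BY treatment, age_group
    """).fetchall()
    print(rows)
    ```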

  2. Finding of No Significant Impact and Finding of No Practicable Alternative Construction of a Warehouse Complex MacDill Air Force Base, Florida

    DTIC Science & Technology

    2010-05-31

    that Federal agencies identify and assess environmental health and safety risks that might disproportionately affect children. The Proposed Action...extra water is sent to one irrigation field near Golf Course Avenue...the area.

  3. Architectural Survey at Joint Base Langley-Eustis of Fort Eustis Buildings and Structures Built 1946-1975: Volume 2 (Inventory Forms)

    DTIC Science & Technology

    2015-12-01

    ERDC/CERL TR-15-37, Vol. II. Fort Eustis, Building 1605: Warehouse Supply & Equipment Base - General Purpose Warehouse. STATUS: Usable. ARCHITECT/BUILDER: Unknown. DATE OF CONSTRUCTION: 1955. Building 1605 is located in the 1600 Area with three other similar warehouse/storage buildings (1607, 1608, and 1610). The buildings are located

  4. Factors That Affect a School District's Ability to Successfully Implement the Use of Data Warehouse Applications in the Data Driven Decision Making Process

    ERIC Educational Resources Information Center

    DeLoach, Robin

    2012-01-01

    The purpose of this study was to explore the factors that influence the ability of teachers and administrators to use data obtained from a data warehouse to inform instruction. The mixed methods study was guided by the following questions: 1) What data warehouse application features affect the ability of an educator to effectively use the…

  5. Assessing the Ability of the Afghan Ministry of Interior Affairs to Support the Afghan Local Police

    DTIC Science & Technology

    2016-01-01

    positions were filled. Unable to retain trained personnel, manpower at the national warehouses was decreasing in 2013, yet requests for equipment...MOI’s national warehouses located on the outskirts of the capital. In addition, we conducted a quantitative analysis of logistics and personnel data...made transporting supplies more challenging. Coalition and Afghan interviewees said that the logistics chain from MOI’s national warehouses in

  6. A Search Algorithm for Determination of Economic Order Quantity in a Two-Level Supply Chain System with Transportation Cost

    NASA Astrophysics Data System (ADS)

    Pirayesh Neghab, Mohammadali; Haji, Rasoul

    This study considers a two-level supply chain system consisting of one warehouse and a number of identical retailers. In this system, we incorporate transportation costs into inventory replenishment decisions. The transportation cost contains a fixed cost and a variable cost. We assume that the demand rate at each retailer is known and the demand is confined to a single item. First, we derive the total cost which is the sum of the holding and ordering cost at the warehouse and retailers as well as the transportation cost from the warehouse to retailers. Then, we propose a search algorithm to find the economic order quantities for the warehouse and retailers which minimize the total cost.
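
    As a simplified stand-in for the study's search algorithm, the sketch below grid-searches a single order quantity against an EOQ-style cost that adds a fixed transportation charge per shipment, and cross-checks the closed form; the two-echelon warehouse/retailer structure of the paper's model is not reproduced, and all parameter values are illustrative.

    ```python
    # Sketch: coarse search for the order quantity minimizing ordering,
    # transportation and holding cost. Single-echelon simplification.

    import math

    D, K, h = 1200.0, 50.0, 2.0   # annual demand, order cost, unit holding cost
    F = 30.0                      # fixed transportation cost per shipment

    def total_cost(q):
        return (K + F) * D / q + h * q / 2   # ordering+transport plus holding

    best_q = min(range(1, 2001), key=total_cost)
    print(best_q, round(total_cost(best_q), 2))
    # Cross-check against the closed-form EOQ for this simplified cost:
    print(round(math.sqrt(2 * D * (K + F) / h), 1))
    ```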

  7. Lagrange multiplier for perishable inventory model considering warehouse capacity planning

    NASA Astrophysics Data System (ADS)

    Amran, Tiena Gustina; Fatima, Zenny

    2017-06-01

    This paper presents a Lagrange multiplier approach for solving perishable raw material inventory planning considering warehouse capacity. A food company faced the issue of managing perishable raw materials and marinades, which have a limited shelf life. Another constraint to be considered was the capacity of the warehouse. Therefore, an inventory model considering shelf life and raw material warehouse capacity is needed in order to minimize the company's inventory cost. The inventory model implemented in this study was an adapted economic order quantity (EOQ) model optimized using a Lagrange multiplier. The model and solution approach were applied to a case in a food manufacturer. The result showed that the total inventory cost decreased by 2.42% after applying the proposed approach.
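
    For reference, the standard textbook treatment of a capacity-constrained EOQ via a Lagrange multiplier, consistent with (though not necessarily identical to) the paper's model, looks as follows; symbols are generic (D_i demand, K_i order cost, h_i holding cost, w_i unit storage requirement, W warehouse capacity).

    ```latex
    % Textbook capacity-constrained EOQ via a Lagrange multiplier (generic
    % symbols; a formulation consistent with, not identical to, the paper's).
    %
    % Minimize total ordering-plus-holding cost subject to warehouse capacity:
    %   min_{Q_i > 0}  sum_i ( K_i D_i / Q_i + h_i Q_i / 2 )
    %   s.t.           sum_i w_i Q_i <= W
    \[
    L(Q,\lambda) = \sum_i \left( \frac{K_i D_i}{Q_i} + \frac{h_i Q_i}{2} \right)
                 + \lambda \left( \sum_i w_i Q_i - W \right)
    \]
    \[
    \frac{\partial L}{\partial Q_i}
      = -\frac{K_i D_i}{Q_i^{2}} + \frac{h_i}{2} + \lambda w_i = 0
    \quad\Longrightarrow\quad
    Q_i^{*} = \sqrt{\frac{2 K_i D_i}{h_i + 2\lambda w_i}}
    \]
    % lambda = 0 recovers the unconstrained EOQ; otherwise lambda is increased
    % until the capacity constraint sum_i w_i Q_i*(lambda) = W is satisfied.
    ```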

  8. Using AberOWL for fast and scalable reasoning over BioPortal ontologies.

    PubMed

    Slater, Luke; Gkoutos, Georgios V; Schofield, Paul N; Hoehndorf, Robert

    2016-08-08

    Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided. We apply the AberOWL infrastructure to provide automated reasoning-based access to all accessible and consistent ontologies in BioPortal (368 ontologies). We perform an extensive performance evaluation to determine query times, both for queries of different complexity and for queries that are performed in parallel over the ontologies. We demonstrate that, with the exception of a few ontologies, even complex and parallel queries can now be answered in milliseconds, therefore allowing automated reasoning to be used on a large scale, to run in parallel, and with rapid response times.

  9. Translation in Data Mining to Advance Personalized Medicine for Health Equity.

    PubMed

    Estape, Estela S; Mays, Mary Helen; Sternke, Elizabeth A

    2016-01-01

    Personalized medicine is the development of 'tailored' therapies that reflect traditional medical approaches, with the incorporation of the patient's unique genetic profile and the environmental basis of the disease. These individualized strategies encompass disease prevention and diagnosis, as well as treatment strategies. Today's healthcare workforce is faced with the availability of massive amounts of patient- and disease-related data. When mined effectively, these data will help produce more efficient and effective diagnoses and treatment, leading to better prognoses for patients at both the individual and population level. Designing preventive and therapeutic interventions for those patients who will benefit most while minimizing side effects and controlling healthcare costs requires bringing diverse data sources together in an analytic paradigm. The development and application of personalized medicine is largely facilitated, perhaps even driven, by the analysis of "big data". For example, the availability of clinical data warehouses is a significant resource for clinicians in practicing personalized medicine. These "big data" repositories can be queried by clinicians, using specific questions, with data used to gain an understanding of challenges in patient care and treatment. Health informaticians are critical partners in data analytics, including the use of technological infrastructures and predictive data mining strategies to access data from multiple sources, assisting clinicians' interpretation of data and development of personalized, targeted therapy recommendations. In this paper, we look at the concept of personalized medicine, offering perspectives on four important, influential topics: 1) the availability of 'big data' and the role of biomedical informatics in personalized medicine, 2) the need for interdisciplinary teams in the development and evaluation of personalized therapeutic approaches, and 3) the impact of electronic medical record systems and clinical data warehouses on the field of personalized medicine. In closing, we present our fourth perspective, an overview of some of the ethical concerns related to personalized medicine and health equity.

  10. Reactome graph database: Efficient access to complex pathway data

    PubMed Central

    Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902

  11. Reactome graph database: Efficient access to complex pathway data.

    PubMed

    Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
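
    A brief sketch of how such a graph query looks from Python. The Pathway label, displayName property and hasEvent relationship follow Reactome's published graph model but should be treated as assumptions here; the connection details are placeholders.

    ```python
    # Sketch: traversing pathway sub-events in a Reactome-style Neo4j
    # instance with Cypher. Credentials and URI are placeholders.

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))

    CYPHER = """
    MATCH (p:Pathway {displayName: $name})-[:hasEvent*]->(sub:Pathway)
    RETURN sub.displayName AS subpathway
    """

    with driver.session() as session:
        for record in session.run(CYPHER, name="Signal Transduction"):
            # The variable-length traversal is exactly the access pattern a
            # relational schema makes costly.
            print(record["subpathway"])
    driver.close()
    ```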

  12. miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST

    PubMed Central

    Kim, You Jung; Boyd, Andrew; Athey, Brian D.; Patel, Jignesh M.

    2005-01-01

    A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs a q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. PMID:16061938
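
    The q-gram filtering and index join can be illustrated compactly, as below: database q-grams go into one inverted index, and the whole query batch probes it, keeping pairs with enough shared q-grams as alignment candidates. The sequences and threshold are toy values; BLAST's alignment and statistics stages are omitted.

    ```python
    # Sketch: q-gram filtering with an index join for a batch of queries.
    # Toy sequences; candidate pairs would go on to full alignment.

    from collections import defaultdict

    def qgrams(seq, q=3):
        return {seq[i:i + q] for i in range(len(seq) - q + 1)}

    database = {"db1": "ACGTACGGA", "db2": "TTGACCATG"}
    queries = {"q1": "ACGTAC", "q2": "CCATGT"}

    # Inverted index: q-gram -> ids of database sequences containing it.
    index = defaultdict(set)
    for sid, seq in database.items():
        for g in qgrams(seq):
            index[g].add(sid)

    # Index join: every query in the batch probes the same index; pairs
    # sharing at least 2 q-grams survive the filter.
    for qid, qseq in queries.items():
        hits = defaultdict(int)
        for g in qgrams(qseq):
            for sid in index[g]:
                hits[sid] += 1
        print(qid, {s: c for s, c in hits.items() if c >= 2})
    ```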

  13. Pulsed coherent population trapping with repeated queries for producing single-peaked high contrast Ramsey interference

    NASA Astrophysics Data System (ADS)

    Warren, Z.; Shahriar, M. S.; Tripathi, R.; Pati, G. S.

    2018-02-01

    A repeated query technique has been demonstrated as a new interrogation method in pulsed coherent population trapping for producing single-peaked Ramsey interference with high contrast. This technique enhances the contrast of the central Ramsey fringe by nearly 1.5 times and significantly suppresses the side fringes by using more query pulses (>10) in the pulse cycle. Theoretical models have been developed to simulate Ramsey interference and analyze the characteristics of the Ramsey spectrum produced by the repeated query technique. Experiments have also been carried out employing a repeated query technique in a prototype rubidium clock to study its frequency stability performance.

  14. Concept locator: a client-server application for retrieval of UMLS metathesaurus concepts through complex boolean query.

    PubMed

    Nadkarni, P M

    1997-08-01

    Concept Locator (CL) is a client-server application that accesses a Sybase relational database server containing a subset of the UMLS Metathesaurus for the purpose of retrieval of concepts corresponding to one or more query expressions supplied to it. CL's query grammar permits complex Boolean expressions, wildcard patterns, and parenthesized (nested) subexpressions. CL translates the query expressions supplied to it into one or more SQL statements that actually perform the retrieval. The generated SQL is optimized by the client to take advantage of the strengths of the server's query optimizer, and sidesteps its weaknesses, so that execution is reasonably efficient.

  15. 48 CFR 36.213-3 - Invitations for bids.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... bidders to inspect the site, obtain subcontract bids, examine data concerning the work, and prepare... examine the data concerning performance of the work (see 36.210). (6) Information concerning any facilities, such as utilities, office space, and warehouse space, to be furnished during construction. (7...

  16. 48 CFR 36.213-3 - Invitations for bids.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... bidders to inspect the site, obtain subcontract bids, examine data concerning the work, and prepare... examine the data concerning performance of the work (see 36.210). (6) Information concerning any facilities, such as utilities, office space, and warehouse space, to be furnished during construction. (7...

  17. A Firefly Algorithm-based Approach for Pseudo-Relevance Feedback: Application to Medical Database.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2016-11-01

    The difficulty of disambiguating the sense of the incomplete and imprecise keywords that are extensively used in search queries has caused the failure of search systems to retrieve the desired information. One of the most powerful and promising methods to overcome this shortcoming and improve the performance of search engines is Query Expansion, whereby the user's original query is augmented by new keywords that best characterize the user's information needs and produce a more useful query. In this paper, a new Firefly Algorithm-based approach is proposed to enhance the retrieval effectiveness of query expansion while maintaining low computational complexity. In contrast to the existing literature, the proposed approach uses a Firefly Algorithm to find the best expanded query among a set of expanded query candidates. Moreover, this new approach allows the determination of the length of the expanded query empirically. Experimental results on MEDLINE, the on-line medical information database, show that our proposed approach is more effective and efficient compared to the state-of-the-art.
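
    A generic firefly search of the kind used to pick the best expanded query is sketched below, assuming candidate expansions are encoded as real vectors and scored by a toy objective standing in for retrieval effectiveness; the paper's encoding and scoring are not reproduced.

    ```python
    # Sketch: generic firefly algorithm. Brighter (higher-scoring) candidates
    # attract dimmer ones with attractiveness beta0 * exp(-gamma * r^2).

    import numpy as np

    rng = np.random.default_rng(0)

    def brightness(x):                     # toy objective: peak at all-ones
        return -np.sum((x - 1.0) ** 2)

    n, dim, steps = 15, 8, 50
    beta0, gamma, alpha = 1.0, 1.0, 0.05
    X = rng.random((n, dim))               # candidate expanded-query encodings

    for _ in range(steps):
        for i in range(n):
            for j in range(n):
                if brightness(X[j]) > brightness(X[i]):
                    r2 = np.sum((X[i] - X[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)     # attractiveness
                    X[i] += beta * (X[j] - X[i]) + alpha * (rng.random(dim) - 0.5)

    best = max(X, key=brightness)
    print(np.round(best, 2))               # decoded into the expanded query
    ```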

  18. RiPPAS: A Ring-Based Privacy-Preserving Aggregation Scheme in Wireless Sensor Networks

    PubMed Central

    Zhang, Kejia; Han, Qilong; Cai, Zhipeng; Yin, Guisheng

    2017-01-01

    Recently, data privacy in wireless sensor networks (WSNs) has been paid increased attention. The characteristics of WSNs determine that users’ queries are mainly aggregation queries. In this paper, the problem of processing aggregation queries in WSNs with data privacy preservation is investigated. A Ring-based Privacy-Preserving Aggregation Scheme (RiPPAS) is proposed. RiPPAS adopts ring structure to perform aggregation. It uses pseudonym mechanism for anonymous communication and uses homomorphic encryption technique to add noise to the data easily to be disclosed. RiPPAS can handle both sum() queries and min()/max() queries, while the existing privacy-preserving aggregation methods can only deal with sum() queries. For processing sum() queries, compared with the existing methods, RiPPAS has advantages in the aspects of privacy preservation and communication efficiency, which can be proved by theoretical analysis and simulation results. For processing min()/max() queries, RiPPAS provides effective privacy preservation and has low communication overhead. PMID:28178197

  19. Students academic performance based on behavior

    NASA Astrophysics Data System (ADS)

    Maulida, Juwita Dien; Kariyam

    2017-12-01

    Data held in an information system can support decision making when an existing data warehouse is mined for information that helps decisions be made correctly and accurately. The Experience API (xAPI) is one of the enabling technologies for collecting such data, so xAPI can serve as a data warehouse for a variety of needs. One application whose data is collected through xAPI is a learning management system (LMS), software used in electronic learning that can handle all aspects of the learning process; an LMS also reveals how learning proceeds and which aspects can affect learning achievement. One such aspect is each student's background, although a student with a good background is not necessarily an outstanding student, and vice versa; anticipating this problem requires action. Predicting student academic performance with the Naive Bayes algorithm yielded an accuracy of 67.7983% and an error of 32.2917%.
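
    A minimal sketch of the prediction step with a Naive Bayes classifier, using scikit-learn; the behavioral features and labels below are hypothetical stand-ins for xAPI-derived LMS data.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# hypothetical per-student features: [logins/week, assignments submitted, forum posts]
X = np.array([[9, 8, 5], [2, 3, 0], [7, 9, 2], [1, 2, 1],
              [8, 7, 4], [3, 4, 1], [6, 8, 3], [2, 1, 0]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = good academic performance

clf = GaussianNB().fit(X, y)
print(clf.predict([[5, 6, 2]]))         # predicted class for a new student
print(clf.predict_proba([[5, 6, 2]]))   # class membership probabilities
```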

  20. A Conceptual Modeling Approach for OLAP Personalization

    NASA Astrophysics Data System (ADS)

    Garrigós, Irene; Pardillo, Jesús; Mazón, Jose-Norberto; Trujillo, Juan

    Data warehouses rely on multidimensional models in order to provide decision makers with appropriate structures to intuitively analyze data with OLAP technologies. However, data warehouses may be potentially large, and multidimensional structures become increasingly complex to understand at a glance. Even if a departmental data warehouse (also known as a data mart) is used, these structures would still be too complex. As a consequence, acquiring the required information is more costly than expected, and decision makers using OLAP tools may get frustrated. In this context, current approaches for data warehouse design are focused on deriving a unique OLAP schema for all analysts from their previously stated information requirements, which is not enough to lighten the complexity of the decision-making process. To overcome this drawback, we argue for personalizing multidimensional models for OLAP technologies according to continuously changing user characteristics, context, requirements and behaviour. In this paper, we present a novel approach to personalizing OLAP systems at the conceptual level, based on the underlying multidimensional model of the data warehouse, a user model and a set of personalization rules. The great advantage of our approach is that a personalized OLAP schema is provided for each decision maker, helping to better satisfy their specific analysis needs. Finally, we show the applicability of our approach through a sample scenario based on our CASE tool for data warehouse development.
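
    A toy sketch of the rule-based personalization idea: condition-action rules are evaluated against a user model and prune the multidimensional schema each decision maker sees. The user model, schema, and rules here are invented for illustration.

```python
# user model captured by the personalization layer (hypothetical)
user = {"role": "regional_manager", "detail_level": "summary"}

# conceptual multidimensional schema before personalization (hypothetical)
schema = {
    "dimensions": ["time", "product", "store", "customer"],
    "measures": ["sales", "margin", "inventory_cost"],
}

# personalization rules: (condition on user model, action on schema)
rules = [
    (lambda u: u["detail_level"] == "summary",
     lambda s: s["dimensions"].remove("customer")),
    (lambda u: u["role"] == "regional_manager",
     lambda s: s["measures"].remove("inventory_cost")),
]

for condition, action in rules:
    if condition(user):
        action(schema)

print(schema)  # the personalized OLAP view for this decision maker
```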

  1. A soft computing-based approach to optimise queuing-inventory control problem

    NASA Astrophysics Data System (ADS)

    Alaghebandha, Mohammad; Hajipour, Vahid

    2015-04-01

    In this paper, a multi-product continuous review inventory control problem within a batch arrival queuing approach (MQr/M/1) is developed to find the optimal quantities of maximum inventory. The objective function is to minimise the sum of ordering, holding and shortage costs under warehouse space, service level and expected lost-sales shortage cost constraints, from the retailer and warehouse viewpoints. Since the proposed model is NP-hard, an efficient imperialist competitive algorithm (ICA) is proposed to solve it. To validate the proposed ICA, both a genetic algorithm and a simulated annealing algorithm are utilised. A fine-tuning procedure is executed to determine the algorithm parameter values that yield better solutions. Finally, the performance of the proposed ICA is analysed using numerical illustrations.

  2. Secure Skyline Queries on Cloud Platform.

    PubMed

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-04-01

    Outsourcing data and computation to a cloud server provides a cost-effective way to support large-scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can also be used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
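
    For reference, the dominance relation at the heart of the skyline operator can be sketched in a few lines of plaintext Python; the secure protocol evaluates this relation over encrypted values, and the hotel tuples below are made up.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (smaller is better here)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

hotels = [(50, 3.0), (80, 1.0), (85, 1.5), (70, 3.5), (90, 0.5)]  # (price, km to beach)
print(skyline(hotels))  # [(50, 3.0), (80, 1.0), (90, 0.5)]
```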

  3. Study on resources and environmental data integration towards data warehouse construction covering trans-boundary area of China, Russia and Mongolia

    NASA Astrophysics Data System (ADS)

    Wang, J.; Song, J.; Gao, M.; Zhu, L.

    2014-02-01

    The trans-boundary area between Northern China, Mongolia and eastern Siberia of Russia is a continuous geographical area located in north eastern Asia. Many common issues in this region need to be addressed based on a uniform resources and environmental data warehouse. Based on the practice of joint scientific expedition, the paper presented a data integration solution including 3 steps, i.e., data collection standards and specifications making, data reorganization and process, data warehouse design and development. A series of data collection standards and specifications were drawn up firstly covering more than 10 domains. According to the uniform standard, 20 resources and environmental survey databases in regional scale, and 11 in-situ observation databases were reorganized and integrated. North East Asia Resources and Environmental Data Warehouse was designed, which included 4 layers, i.e., resources layer, core business logic layer, internet interoperation layer, and web portal layer. The data warehouse prototype was developed and deployed initially. All the integrated data in this area can be accessed online.

  4. CUFID-query: accurate network querying through random walk based network flow estimation.

    PubMed

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2017-12-28

    Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.
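
    A minimal sketch of the greedy, conductance-driven seed extension described above, using networkx on a toy graph. The seed, size cap, and stopping rule are simplified stand-ins for CUFID-query's actual procedure.

```python
import networkx as nx

def extend_seed(G, seed, max_size=20):
    """Grow `seed` greedily, always adding the neighboring node that gives
    the lowest conductance for the extended set."""
    S = set(seed)
    while len(S) < max_size:
        frontier = {v for u in S for v in G[u]} - S
        if not frontier:
            break
        best = min(frontier, key=lambda v: nx.conductance(G, S | {v}))
        if nx.conductance(G, S | {best}) >= nx.conductance(G, S):
            break  # no candidate reduces the cut any further
        S.add(best)
    return S

G = nx.karate_club_graph()
print(sorted(extend_seed(G, seed={0, 1})))
```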

  5. Clinical Data Warehouse: An Effective Tool to Create Intelligence in Disease Management.

    PubMed

    Karami, Mahtab; Rahimi, Azin; Shahmirzadi, Ali Hosseini

    Clinical business intelligence tools such as the clinical data warehouse enable health care organizations to objectively assess disease management programs that affect patients' quality of life and public well-being. The purpose of these programs is to reduce disease occurrence, improve patient care, and decrease health care costs. Applying a clinical data warehouse can therefore be effective in generating useful information about aspects of patient care to facilitate budgeting, planning, research, process improvement, external reporting, benchmarking, and trend analysis, as well as to enable the decisions needed to prevent the progression or appearance of illness, in line with maintaining the health of the population. The aim of this review article is to describe the benefits of clinical data warehouse applications in creating intelligence for disease management programs.

  6. Development of a public health reporting data warehouse: lessons learned.

    PubMed

    Rizi, Seyed Ali Mussavi; Roudsari, Abdul

    2013-01-01

    Data warehouse projects are perceived to be risky and prone to failure due to many organizational and technical challenges. However, the often iterative and lengthy process of implementing a data warehouse at an enterprise level provides an opportunity for formative evaluation of these solutions. This paper describes lessons learned from the successful development and implementation of the first phase of an enterprise data warehouse to support public health surveillance at the British Columbia Centre for Disease Control. An iterative, prototyping approach to development; overcoming the technical challenges of extracting and integrating data from large-scale clinical and ancillary systems; a novel approach to record linkage; flexible and reusable modeling of clinical data; and securing senior management support at the right time were the main factors that contributed to the success of the data warehousing project.

  7. Benefits of a clinical data warehouse with data mining tools to collect data for a radiotherapy trial

    PubMed Central

    Roelofs, Erik; Persoon, Lucas; Nijsten, Sebastiaan; Wiessler, Wolfgang; Dekker, André; Lambin, Philippe

    2016-01-01

    Introduction Collecting trial data in a medical environment is at present mostly performed manually and is therefore time-consuming, prone to errors and often incomplete given the complexity of the data involved. Faster and more accurate methods are needed to improve data quality and to shorten data collection times where information is often scattered over multiple data sources. The purpose of this study is to investigate the possible benefit of modern data warehouse technology in the radiation oncology field. Material and methods In this study, a Computer Aided Theragnostics (CAT) data warehouse combined with automated tools for feature extraction was benchmarked against the regular manual data-collection processes. Two sets of clinical parameters were compiled for non-small cell lung cancer (NSCLC) and rectal cancer, using 27 patients per disease. Data collection times and inconsistencies were compared between the manual and the automated extraction method. Results The average time per case to collect the NSCLC data manually was 10.4 ± 2.1 min, versus 4.3 ± 1.1 min with the automated method (p < 0.001). For rectal cancer, these times were 13.5 ± 4.1 and 6.8 ± 2.4 min, respectively (p < 0.001). In 3.2% of the data collected for NSCLC and 5.3% for rectal cancer, there was a discrepancy between the manual and automated methods. Conclusions Aggregating multiple data sources in a data warehouse combined with tools for extracting relevant parameters is beneficial for data collection times and offers the ability to improve data quality. The initial investment in digitizing the data is expected to be recouped thanks to the flexibility of the subsequent data analysis. Furthermore, successive investigations can easily select trial candidates and extract new parameters from the existing databases. PMID:23394741

  8. Benefits of a clinical data warehouse with data mining tools to collect data for a radiotherapy trial.

    PubMed

    Roelofs, Erik; Persoon, Lucas; Nijsten, Sebastiaan; Wiessler, Wolfgang; Dekker, André; Lambin, Philippe

    2013-07-01

    Collecting trial data in a medical environment is at present mostly performed manually and is therefore time-consuming, prone to errors and often incomplete given the complexity of the data involved. Faster and more accurate methods are needed to improve data quality and to shorten data collection times where information is often scattered over multiple data sources. The purpose of this study is to investigate the possible benefit of modern data warehouse technology in the radiation oncology field. In this study, a Computer Aided Theragnostics (CAT) data warehouse combined with automated tools for feature extraction was benchmarked against the regular manual data-collection processes. Two sets of clinical parameters were compiled for non-small cell lung cancer (NSCLC) and rectal cancer, using 27 patients per disease. Data collection times and inconsistencies were compared between the manual and the automated extraction method. The average time per case to collect the NSCLC data manually was 10.4 ± 2.1 min, versus 4.3 ± 1.1 min with the automated method (p<0.001). For rectal cancer, these times were 13.5 ± 4.1 and 6.8 ± 2.4 min, respectively (p<0.001). In 3.2% of the data collected for NSCLC and 5.3% for rectal cancer, there was a discrepancy between the manual and automated methods. Aggregating multiple data sources in a data warehouse combined with tools for extracting relevant parameters is beneficial for data collection times and offers the ability to improve data quality. The initial investment in digitizing the data is expected to be recouped thanks to the flexibility of the subsequent data analysis. Furthermore, successive investigations can easily select trial candidates and extract new parameters from the existing databases. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  9. Refrigerated Warehouse Demand Response Strategy Guide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scott, Doug; Castillo, Rafael; Larson, Kyle

    This guide summarizes demand response measures that can be implemented in refrigerated warehouses. In an appendix, it also addresses related energy efficiency opportunities. Reducing overall grid demand during peak periods, as well as energy consumption, has benefits for facility operators, grid operators, utility companies, and society. Statewide demand response potential for the refrigerated warehouse sector in California is estimated to be over 22.1 megawatts. Two categories of demand response strategies are described in this guide: load shifting and load shedding. Load shifting can be accomplished via pre-cooling, capacity limiting, and battery charger load management. Load shedding can be achieved by lighting reduction, demand defrost and defrost termination, infiltration reduction, and shutting down miscellaneous equipment. Estimation of the costs and benefits of demand response participation yields simple payback periods of 2-4 years. To improve demand response performance, the guide suggests installing air curtains and another form of infiltration barrier, such as a roll-up door, for passageways. Further modifications to increase the efficiency of the refrigeration unit are also analyzed. A larger condenser can maintain the minimum saturated condensing temperature (SCT) for more hours of the day. Lowering the SCT reduces the compressor lift, which results in an overall increase in refrigeration system capacity and energy efficiency. Another way of saving energy in refrigerated warehouses is eliminating the use of under-floor resistance heaters. A more energy-efficient alternative to resistance heaters is to utilize the heat rejected from the condenser through a heat exchanger. These energy efficiency measures improve efficiency by reducing the required electric energy input for the refrigeration system, by helping to curtail the refrigeration load on the system, or by reducing both the load and the required energy input.

  10. Assisting Consumer Health Information Retrieval with Query Recommendations

    PubMed Central

    Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily

    2006-01-01

    Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944
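
    A minimal sketch of co-occurrence-based term recommendation using pointwise mutual information over a toy document collection; HIQuA additionally draws on log data and semantic relations in medical vocabularies.

```python
import math
from collections import Counter
from itertools import combinations

docs = [  # toy stand-in for a medical literature collection
    ["diabetes", "insulin", "glucose"],
    ["diabetes", "diet", "glucose"],
    ["asthma", "inhaler"],
    ["diabetes", "insulin", "diet"],
]

n = len(docs)
term_freq = Counter(t for d in docs for t in set(d))
pair_freq = Counter(frozenset(p) for d in docs for p in combinations(set(d), 2))

def pmi(a, b):
    """Pointwise mutual information as a co-occurrence proximity score."""
    joint = pair_freq[frozenset((a, b))] / n
    if joint == 0:
        return float("-inf")
    return math.log(joint / ((term_freq[a] / n) * (term_freq[b] / n)))

def recommend(query_term, k=2):
    scored = [(t, pmi(query_term, t)) for t in term_freq if t != query_term]
    return sorted(scored, key=lambda x: -x[1])[:k]

print(recommend("diabetes"))  # closest candidate terms for query refinement
```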

  11. MorphoSaurus--design and evaluation of an interlingua-based, cross-language document retrieval engine for the medical domain.

    PubMed

    Markó, K; Schulz, S; Hahn, U

    2005-01-01

    We propose an interlingua-based indexing approach to account for the particular challenges that arise in the design and implementation of cross-language document retrieval systems for the medical domain. Documents, as well as queries, are mapped to a language-independent conceptual layer on which retrieval operations are performed. We contrast this approach with the direct translation of German queries into English ones which, subsequently, are matched against English documents. We evaluate both approaches, interlingua-based and direct translation, on a large medical document collection, the OHSUMED corpus. A substantial benefit is found for interlingua-based document retrieval using German queries on English texts, which amounts to 93% of the (monolingual) English baseline. Most state-of-the-art cross-language information retrieval systems translate user queries to the language(s) of the target documents. In contradistinction to this approach, translating both documents and user queries into a language-independent, concept-like representation format is more beneficial for enhancing cross-language retrieval performance.

  12. WAREHOUSE END FRAMING. United Engineering Company Ltd., Alameda Shipyard. Sections ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    WAREHOUSE END FRAMING. United Engineering Company Ltd., Alameda Shipyard. Sections at north and detail sections. No architect noted. Drawn by Penney. Plan no. 2-N-9 (U.E. Co. plan no. 10,523). Scales 1/4 inch and 1 inch to the foot. March 10, 1942, no revisions. U.S. Navy, Bureau of Yards & Docks, Contract no. bs 76, item no. 22A. Approved for construction October 9, 1943. blueprint - United Engineering Company Shipyard, Warehouse, 2900 Main Street, Alameda, Alameda County, CA

  13. Implementation of a metadata architecture and knowledge collection to support semantic interoperability in an enterprise data warehouse.

    PubMed

    Dhaval, Rakesh; Borlawsky, Tara; Ostrander, Michael; Santangelo, Jennifer; Kamal, Jyoti; Payne, Philip R O

    2008-11-06

    In order to enhance interoperability between enterprise systems, and improve data validity and reliability throughout The Ohio State University Medical Center (OSUMC), we have initiated the development of an ontology-anchored metadata architecture and knowledge collection for our enterprise data warehouse. The metadata and corresponding semantic relationships stored in the OSUMC knowledge collection are intended to promote consistency and interoperability across the heterogeneous clinical, research, business and education information managed within the data warehouse.

  14. Fungal growth and the presence of sterigmatocystin in hard cheese.

    PubMed

    Northolt, M D; van Egmond, H P; Soentoro, P; Deijll, E

    1980-01-01

    Molds isolated from visibly molded cheeses in shops, households, and warehouses have been identified. The mold flora of cheeses in shops and households consisted mainly of Penicillium verrucosum var. cyclopium. On cheeses ripening in warehouses, Penicillium verrucosum var. cyclopium, Aspergillus versicolor, Aspergillus repens, and Penicillium verrucosum var. verrucosum were the dominant mold species. Cheeses ripening in warehouses and molded with A. versicolor were examined for sterigmatocystin. Nine of 39 cheese samples contained sterigmatocystin in the surface layer, in concentrations ranging from 5 to 600 micrograms/kg.

  15. 46 CFR 525.1 - Purpose and scope.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ..., warehouse or other terminal facilities in connection with a common carrier, or in connection with a common...; common carriers who perform port terminal services; and warehousemen who operate port terminal facilities... storage spaces, cold storage plants, cranes, grain elevators and/or bulk cargo loading and/or unloading...

  16. Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus

    ERIC Educational Resources Information Center

    Lyall-Wilson, Jennifer Rae

    2013-01-01

    The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of…

  17. Analytics-Driven Lossless Data Compression for Rapid In-situ Indexing, Storing, and Querying

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jenkins, John; Arkatkar, Isha; Lakshminarasimhan, Sriram

    2013-01-01

    The analysis of scientific simulations is highly data-intensive and is becoming an increasingly important challenge. Peta-scale data sets require the use of light-weight query-driven analysis methods, as opposed to heavy-weight schemes that optimize for speed at the expense of size. This paper is an attempt in the direction of query processing over losslessly compressed scientific data. We propose a co-designed double-precision compression and indexing methodology for range queries by performing unique-value-based binning on the most significant bytes of double precision data (sign, exponent, and most significant mantissa bits), and inverting the resulting metadata to produce an inverted index over a reduced data representation. Without the inverted index, our method matches or improves compression ratios over both general-purpose and floating-point compression utilities. The inverted index is light-weight, and the overall storage requirement for both reduced column and index is less than 135%, whereas existing DBMS technologies can require 200-400%. As a proof of concept, we evaluate univariate range queries that additionally return column values, a critical component of data analytics, against state-of-the-art bitmap indexing technology, showing multi-fold query performance improvements.
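
    A small sketch of the unique-value binning and inverted-index idea on double-precision data. The byte-prefix width, bin-bound reconstruction, and candidate-verification step are simplified assumptions, not the paper's exact layout.

```python
import struct
from collections import defaultdict

PREFIX = 2  # index the sign, exponent, and leading mantissa bits (2 bytes)

def msb_bin(x):
    """Bin key = the two most significant bytes of the IEEE-754 double."""
    return struct.pack(">d", x)[:PREFIX]

values = [0.5, 0.50001, 3.14159, 2.71828, -1.0, 1e6, 3.2]

# Inverted index over the reduced representation: bin key -> row ids.
index = defaultdict(list)
for rid, v in enumerate(values):
    index[msb_bin(v)].append(rid)

def range_query(lo, hi):
    """Answer lo <= v <= hi: prune whole bins, then verify survivors exactly."""
    hits = []
    for key, rids in index.items():
        a = struct.unpack(">d", key + b"\x00" * (8 - PREFIX))[0]
        b = struct.unpack(">d", key + b"\xff" * (8 - PREFIX))[0]
        lo_bound, hi_bound = min(a, b), max(a, b)  # byte order flips for negatives
        if hi_bound < lo or lo_bound > hi:
            continue  # this bin cannot intersect the query range
        hits.extend(r for r in rids if lo <= values[r] <= hi)
    return sorted(hits)

print(range_query(0.4, 3.15))  # -> [0, 1, 2, 3]
```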

  18. 27 CFR 45.11 - Meaning of terms.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... the singular, and vice versa, and words indicating the masculine gender shall include the feminine... warehouse with respect to the operation of such warehouse. Package. The immediate container in which tobacco...

  19. 27 CFR 45.11 - Meaning of terms.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... the singular, and vice versa, and words indicating the masculine gender shall include the feminine... warehouse with respect to the operation of such warehouse. Package. The immediate container in which tobacco...

  20. 27 CFR 45.11 - Meaning of terms.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... the singular, and vice versa, and words indicating the masculine gender shall include the feminine... warehouse with respect to the operation of such warehouse. Package. The immediate container in which tobacco...

  1. 27 CFR 45.11 - Meaning of terms.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... the singular, and vice versa, and words indicating the masculine gender shall include the feminine... warehouse with respect to the operation of such warehouse. Package. The immediate container in which tobacco...

  2. 27 CFR 44.145 - Special.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... PAYMENT OF TAX, OR WITH DRAWBACK OF TAX Operations by Export Warehouse Proprietors Inventories § 44.145 Special. A special inventory shall be made by the export warehouse proprietor whenever required by any...

  3. High-performance web services for querying gene and variant annotation.

    PubMed

    Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei

    2016-05-06

    Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info . Both are offered free of charge to the research community.
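
    The query endpoint can be exercised in a few lines of Python; the sketch below assumes the v3 REST interface at mygene.info and the usual hit fields (_id, symbol, name), which may vary by annotation source.

```python
import requests

# Look up gene annotation by symbol, restricted to human.
resp = requests.get(
    "https://mygene.info/v3/query",
    params={"q": "symbol:CDK2", "species": "human"},
    timeout=10,
)
resp.raise_for_status()
for hit in resp.json().get("hits", []):
    print(hit.get("_id"), hit.get("symbol"), hit.get("name"))
```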

  4. Improving biomedical information retrieval by linear combinations of different query expansion techniques.

    PubMed

    Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

    2016-07-25

    Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. One of the main problems in IR is determining which documents are relevant to the user's needs and which are not. Under the current regime, users cannot construct queries precisely enough to retrieve particular pieces of data from large reserves of data, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique that refines Information Retrieval searches to better represent the user's information need and thereby enhance retrieval performance, using different query expansion techniques and linear combinations between them, where two expansion results are combined linearly at a time. Query expansion expands the search query, for example by finding synonyms and reweighting original terms, providing significantly more focused, particularized search results than basic search queries do. Retrieval performance is measured by variants of MAP (Mean Average Precision). According to our experimental results, the combination of the best query expansion results enhances the retrieved documents and outperforms our baseline by 21.06%, and even outperforms a previous study by 7.12%. We propose several query expansion techniques and their linear combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
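
    The linear combination of two expansion techniques reduces to a convex blend of their per-term scores. Below is a minimal sketch with made-up scores and an assumed mixing weight λ.

```python
def combine(scores_a, scores_b, lam=0.6):
    """Blend two expansion-term rankings: lam * A + (1 - lam) * B."""
    terms = set(scores_a) | set(scores_b)
    return {t: lam * scores_a.get(t, 0.0) + (1 - lam) * scores_b.get(t, 0.0)
            for t in terms}

pseudo_relevance = {"cardiac": 0.9, "infarction": 0.7, "attack": 0.4}
thesaurus_based = {"myocardial": 0.8, "infarction": 0.6, "heart": 0.5}

blended = combine(pseudo_relevance, thesaurus_based)
print(sorted(blended, key=blended.get, reverse=True)[:3])
# top-scoring terms are appended to the original query
```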

  5. A distributed query execution engine of big attributed graphs.

    PubMed

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path, where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation, and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine for G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores, while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and scalability of DG-SPARQL on querying massive attributed graph datasets, in addition to its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.

  6. 27 CFR 46.163 - Meaning of terms.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... singular, words in the singular form shall include the plural, and words importing the masculine gender... warehouse with respect to the operation of such warehouse. Package. The container in which tobacco products...

  7. 27 CFR 44.11 - Meaning of terms.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... singular, and vice versa, and words indicating the masculine gender shall include the feminine. The terms... warehouse with respect to the operation of such warehouse. Package. The immediate container in which tobacco...

  8. 27 CFR 46.163 - Meaning of terms.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... singular, words in the singular form shall include the plural, and words importing the masculine gender... warehouse with respect to the operation of such warehouse. Package. The container in which tobacco products...

  9. 27 CFR 44.11 - Meaning of terms.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... singular, and vice versa, and words indicating the masculine gender shall include the feminine. The terms... warehouse with respect to the operation of such warehouse. Package. The immediate container in which tobacco...

  10. 27 CFR 46.163 - Meaning of terms.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... singular, words in the singular form shall include the plural, and words importing the masculine gender... warehouse with respect to the operation of such warehouse. Package. The container in which tobacco products...

  11. 27 CFR 44.11 - Meaning of terms.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... singular, and vice versa, and words indicating the masculine gender shall include the feminine. The terms... warehouse with respect to the operation of such warehouse. Package. The immediate container in which tobacco...

  12. 27 CFR 46.163 - Meaning of terms.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... singular, words in the singular form shall include the plural, and words importing the masculine gender... warehouse with respect to the operation of such warehouse. Package. The container in which tobacco products...

  13. 27 CFR 46.163 - Meaning of terms.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... singular, words in the singular form shall include the plural, and words importing the masculine gender... warehouse with respect to the operation of such warehouse. Package. The container in which tobacco products...

  14. U.S. Naval Base, Pearl Harbor, Retail Warehouse, Fleet Landing Halawa, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    U.S. Naval Base, Pearl Harbor, Retail Warehouse, Fleet Landing Halawa, near Kamehameha Highway between Richardson Recreation Center & USS Arizona Memorial Visitor Center, Pearl City, Honolulu County, HI

  15. Remote sensing and GIS integration: Towards intelligent imagery within a spatial data infrastructure

    NASA Astrophysics Data System (ADS)

    Abdelrahim, Mohamed Mahmoud Hosny

    2001-11-01

    In this research, an "Intelligent Imagery System Prototype" (IISP) was developed. IISP is an integration tool that facilitates the environment for active, direct, and on-the-fly usage of high resolution imagery, internally linked to hidden GIS vector layers, to query the real world phenomena and, consequently, to perform exploratory types of spatial analysis based on a clear/undisturbed image scene. The IISP was designed and implemented using the software components approach to verify the hypothesis that a fully rectified, partially rectified, or even unrectified digital image can be internally linked to a variety of different hidden vector databases/layers covering the end user area of interest, and consequently may be reliably used directly as a base for "on-the-fly" querying of real-world phenomena and for performing exploratory types of spatial analysis. Within IISP, differentially rectified, partially rectified (namely, IKONOS GEOCARTERRA(TM)), and unrectified imagery (namely, scanned aerial photographs and captured video frames) were investigated. The system was designed to handle four types of spatial functions, namely, pointing query, polygon/line-based image query, database query, and buffering. The system was developed using ESRI MapObjects 2.0a as the core spatial component within Visual Basic 6.0. When used to perform the pre-defined spatial queries using different combinations of image and vector data, the IISP provided the same results as those obtained by querying pre-processed vector layers even when the image used was not orthorectified and the vector layers had different parameters. In addition, the real-time pixel location orthorectification technique developed and presented within the IKONOS GEOCARTERRA(TM) case provided a horizontal accuracy (RMSE) of +/- 2.75 metres. This accuracy is very close to the accuracy level obtained when purchasing the orthorectified IKONOS PRECISION products (RMSE of +/- 1.9 metre). The latter cost approximately four times as much as the IKONOS GEOCARTERRA(TM) products. The developed IISP is a step closer towards the direct and active involvement of high-resolution remote sensing imagery in querying the real world and performing exploratory types of spatial analysis. (Abstract shortened by UMI.)

  16. 47 CFR 52.26 - NANC Recommendations on Local Number Portability Administration.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... perform a database query to determine if the telephone number has been ported to another local exchange carrier, the local exchange carrier may block the unqueried call only if performing the database query is... manage and oversee the local number portability administrators, subject to review by the NANC, but only...

  17. 47 CFR 52.26 - NANC Recommendations on Local Number Portability Administration.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... perform a database query to determine if the telephone number has been ported to another local exchange carrier, the local exchange carrier may block the unqueried call only if performing the database query is... manage and oversee the local number portability administrators, subject to review by the NANC, but only...

  18. Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alam, Maksudul; Yoginath, Srikanth B; Perumalla, Kalyan S

    In in-memory database systems augmented by hardware accelerators, accelerating index search operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide a very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.
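
    The sketch below shows the basic mechanics of radix-tree index lookup with fixed 4-bit spans. ART and GRT use adaptive node layouts and GPU parallelism, so this is only the underlying idea, not their implementation.

```python
class RadixNode:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = {}  # 4-bit nibble -> child node
        self.value = None

def insert(root, key, value, bits=32):
    node = root
    for shift in range(bits - 4, -1, -4):  # walk the key nibble by nibble
        nibble = (key >> shift) & 0xF
        node = node.children.setdefault(nibble, RadixNode())
    node.value = value

def lookup(root, key, bits=32):
    node = root
    for shift in range(bits - 4, -1, -4):
        node = node.children.get((key >> shift) & 0xF)
        if node is None:
            return None  # key absent
    return node.value

root = RadixNode()
for k in (42, 1024, 7_000_000):
    insert(root, k, f"row-{k}")
print(lookup(root, 1024), lookup(root, 99))  # row-1024 None
```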

  19. A simulated annealing approach for redesigning a warehouse network problem

    NASA Astrophysics Data System (ADS)

    Khairuddin, Rozieana; Marlizawati Zainuddin, Zaitul; Jiun, Gan Jia

    2017-09-01

    Nowadays, several companies consider downsizing their distribution networks in ways that involve consolidation or phase-out of some of their current warehousing facilities, due to increasing competition, mounting cost pressure, and the opportunity to take advantage of economies of scale. Consequently, changes in the economic situation after a certain period of time require an adjustment of the network model in order to obtain the optimal cost under current economic conditions. This paper develops a mixed-integer linear programming model for a two-echelon warehouse network redesign problem with a capacitated plant and uncapacitated warehouses. The main contribution of this study is considering capacity constraints for existing warehouses. A simulated annealing algorithm is proposed to tackle the model. The numerical results show that the proposed model and solution method are practical.
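
    A minimal simulated annealing sketch for a toy open/close warehouse decision. The cost model, parameters, and cooling schedule are invented; the paper's mixed-integer model has capacity and echelon structure this ignores.

```python
import math
import random

fixed_cost = [100, 120, 90, 150]  # annual cost of keeping each site open

def total_cost(x):
    """Fixed costs of open sites plus a toy service cost that falls as
    more warehouses stay open."""
    if not any(x):
        return float("inf")  # at least one warehouse must remain open
    return sum(f for f, o in zip(fixed_cost, x) if o) + 400 / sum(x)

def anneal(n, T=100.0, cooling=0.95, steps=500):
    x = [1] * n  # start with every warehouse open
    best, best_cost = x[:], total_cost(x)
    for _ in range(steps):
        y = x[:]
        y[random.randrange(n)] ^= 1  # flip one open/close decision
        delta = total_cost(y) - total_cost(x)
        if delta < 0 or random.random() < math.exp(-delta / T):
            x = y  # accept improving moves and, occasionally, worse ones
            if total_cost(x) < best_cost:
                best, best_cost = x[:], total_cost(x)
        T *= cooling
    return best, best_cost

print(anneal(len(fixed_cost)))
```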

  20. Secure Skyline Queries on Cloud Platform

    PubMed Central

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-01-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710

  1. SHARE: system design and case studies for statistical health information release

    PubMed Central

    Gardner, James; Xiong, Li; Xiao, Yonghui; Gao, Jingjing; Post, Andrew R; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2013-01-01

    Objectives We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data. Materials and Methods SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the surveillance, epidemiology and end results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE. Results Experimental results indicate that SHARE can deal with heterogeneous data present in medical data, and that the released statistics are useful. The Kullback–Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework for higher dimensional data. Conclusions SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses. PMID:23059729
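
    As an illustration of the kind of release SHARE performs, here is a minimal Laplace-mechanism sketch for a one-dimensional histogram. SHARE's multidimensional and longitudinal methods are more sophisticated; the data and ε below are synthetic assumptions.

```python
import numpy as np

def private_histogram(values, bins, epsilon=1.0):
    """Release a histogram with Laplace noise. One person changes one
    count by one, so the L1 sensitivity of the histogram is 1."""
    counts, edges = np.histogram(values, bins=bins)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return np.clip(np.round(counts + noise), 0, None).astype(int), edges

ages = np.random.default_rng(0).integers(20, 90, size=1000)  # synthetic cohort
counts, edges = private_histogram(ages, bins=7, epsilon=0.5)
print(list(zip(edges[:-1].round(1), counts)))
```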

  2. 7 CFR 28.906 - Sampling arrangements.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Sampling arrangements. 28.906 Section 28.906... Producers Sampling § 28.906 Sampling arrangements. (a) Cotton must be sampled by a gin or warehouse that... an authorized representative may direct that sampling be performed by employees of the Department of...

  3. 7 CFR 28.906 - Sampling arrangements.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Sampling arrangements. 28.906 Section 28.906... Producers Sampling § 28.906 Sampling arrangements. (a) Cotton must be sampled by a gin or warehouse that... an authorized representative may direct that sampling be performed by employees of the Department of...

  4. Solar heating system installed at Telex Communications, Inc., Blue Earth, Minnesota

    NASA Technical Reports Server (NTRS)

    1977-01-01

    The solar heating system for space heating a 97,000 square foot building which houses administrative offices, assembly areas, and warehouse space is summarized. Information on system description, test data, major problems and resolutions, performance, operation and maintenance manual, manufacturer's literature, and as-built drawings is presented.

  5. Opportunities for Energy Efficiency and Automated Demand Response in Industrial Refrigerated Warehouses in California

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lekov, Alex; Thompson, Lisa; McKane, Aimee

    2009-05-11

    This report summarizes the Lawrence Berkeley National Laboratory's research to date in characterizing energy efficiency and open automated demand response opportunities for industrial refrigerated warehouses in California. The report describes refrigerated warehouse characteristics, energy use and demand, and control systems. It also discusses energy efficiency and open automated demand response opportunities and provides analysis results from three demand response studies. In addition, several energy efficiency, load management, and demand response case studies are provided for refrigerated warehouses. This study shows that refrigerated warehouses can be excellent candidates for open automated demand response, and that facilities which have implemented energy efficiency measures and have centralized control systems are well suited to shift or shed electrical loads in response to financial incentives, utility bill savings, and/or opportunities to enhance reliability of service. Control technologies installed for energy efficiency and load management purposes can often be adapted for open automated demand response (OpenADR) at little additional cost. These improved controls may prepare facilities to be more receptive to OpenADR, due both to increased confidence in the opportunities for controlling energy cost/use and to access to real-time data.

  6. Protocol for a national blood transfusion data warehouse from donor to recipient

    PubMed Central

    van Hoeven, Loan R; Hooftman, Babette H; Janssen, Mart P; de Bruijne, Martine C; de Vooght, Karen M K; Kemper, Peter; Koopman, Maria M W

    2016-01-01

    Introduction Blood transfusion has health-related, economical and safety implications. In order to optimise the transfusion chain, comprehensive research data are needed. The Dutch Transfusion Data warehouse (DTD) project aims to establish a data warehouse where data from donors and transfusion recipients are linked. This paper describes the design of the data warehouse, challenges and illustrative applications. Study design and methods Quantitative data on blood donors (eg, age, blood group, antibodies) and products (type of product, processing, storage time) are obtained from the national blood bank. These are linked to data on the transfusion recipients (eg, transfusions administered, patient diagnosis, surgical procedures, laboratory parameters), which are extracted from hospital electronic health records. Applications Expected scientific contributions are illustrated for 4 applications: determine risk factors, predict blood use, benchmark blood use and optimise process efficiency. For each application, examples of research questions are given and analyses planned. Conclusions The DTD project aims to build a national, continuously updated transfusion data warehouse. These data have a wide range of applications, on the donor/production side, recipient studies on blood usage and benchmarking and donor–recipient studies, which ultimately can contribute to the efficiency and safety of blood transfusion. PMID:27491665

  7. Measuring up: Implementing a dental quality measure in the electronic health record context.

    PubMed

    Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F

    2016-01-01

    Quality improvement requires using quality measures that can be implemented in a valid manner. Using guidelines set forth by the Meaningful Use portion of the Health Information Technology for Economic and Clinical Health Act, the authors assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure to determine the percentage of children who received fluoride varnish. The authors defined how to implement the automated measure queries in a dental electronic health record. Within records identified through automated query, the authors manually reviewed a subsample to assess the performance of the query. The automated query results revealed that 71.0% of patients had fluoride varnish compared with the manual chart review results that indicated 77.6% of patients had fluoride varnish. The automated quality measure performance results indicated 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. The authors' findings support the feasibility of using automated dental quality measure queries in the context of sufficient structured data. Information noted only in free text rather than in structured data would require using natural language processing approaches to effectively query electronic health records. To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation to support near-term automated calculation of quality measures. Copyright © 2016 American Dental Association. Published by Elsevier Inc. All rights reserved.
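
    The four reported rates follow directly from a confusion matrix comparing the automated query against the manual chart review. The counts below are hypothetical, chosen only so the rates land near the paper's figures, and show how each measure is derived.

```python
def diagnostics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),  # flagged among truly varnished
        "specificity": tn / (tn + fp),  # cleared among truly unvarnished
        "ppv": tp / (tp + fp),          # precision of the automated query
        "npv": tn / (tn + fn),
    }

# hypothetical audit counts (automated query vs. manual chart review)
print(diagnostics(tp=95, fp=3, tn=30, fn=10))
# -> sensitivity 0.905, specificity 0.909, ppv 0.969, npv 0.750
```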

  8. 5. ROOF DETAIL, LOOKING EAST TOWARD SECOND FLOOR WAREHOUSE FROM ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    5. ROOF DETAIL, LOOKING EAST TOWARD SECOND FLOOR WAREHOUSE FROM ROOF OF ASSEMBLY AREA. - Ford Motor Company Long Beach Assembly Plant, Assembly Building, 700 Henry Ford Avenue, Long Beach, Los Angeles County, CA

  9. 44. RAILROAD TRACKS, WITH BISHOP'S BLOCK, MCFADDEN COFFEE AND SPICE ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    44. RAILROAD TRACKS, WITH BISHOP'S BLOCK, MCFADDEN COFFEE AND SPICE COMPANY FACTORY AND WAREHOUSE AND DUBUQUE SEED COMPANY WAREHOUSE IN BACKGROUND. VIEW TO SOUTHWEST. - Dubuque Commercial & Industrial Buildings, Dubuque, Dubuque County, IA

  10. 43. RAILROAD TRACKS, WITH BISHOP'S BLOCK, MCFADDEN COFFEE AND SPICE ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    43. RAILROAD TRACKS, WITH BISHOP'S BLOCK, MCFADDEN COFFEE AND SPICE COMPANY FACTORY AND WAREHOUSE AND DUBUQUE SEED COMPANY WAREHOUSE IN BACKGROUND. VIEW TO SOUTHWEST. - Dubuque Commercial & Industrial Buildings, Dubuque, Dubuque County, IA

  11. 42. RAILROAD TRACKS, WITH BISHOP'S BLOCK, MCFADDEN COFFEE AND SPICE ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    42. RAILROAD TRACKS, WITH BISHOP'S BLOCK, MCFADDEN COFFEE AND SPICE COMPANY FACTORY AND WAREHOUSE AND DUBUQUE SEED COMPANY WAREHOUSE IN BACKGROUND. VIEW TO SOUTHWEST. - Dubuque Commercial & Industrial Buildings, Dubuque, Dubuque County, IA

  12. 76 FR 13972 - United States Warehouse Act; Export Food Aid Commodities Licensing Agreement

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-15

    ... such as peas, beans and lentils. Current USWA licenses for agricultural products include grain, cotton, nuts, cottonseed, and dry beans. Warehouse operators that apply voluntarily agree to be licensed...

  13. 61. VIEW OF WAREHOUSE, MACHINE SHOP, AND HOISTING PLANT, WITH ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    61. VIEW OF WAREHOUSE, MACHINE SHOP, AND HOISTING PLANT, WITH FOREBAY ON RIGHT IN FOREGROUND, Prints No. 158 and 159, August 1903 - Electron Hydroelectric Project, Along Puyallup River, Electron, Pierce County, WA

  14. 5. AVALON DAM GATE KEEPER'S COMPLEX: HOUSE (LEFT), WAREHOUSE ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    5. AVALON DAM - GATE KEEPER'S COMPLEX: HOUSE (LEFT), WAREHOUSE (RIGHT), AND CCC LANDSCAPING (FOREGROUND). VIEW TO SOUTHEAST - Carlsbad Irrigation District, Avalon Dam, On Pecos River, 4 miles North of Carlsbad, Carlsbad, Eddy County, NM

  15. Photographic copy of architectural drawings for Building 4332 (T82): Taylor ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Photographic copy of architectural drawings for Building 4332 (T-82): Taylor & Barnes, Architects & Engineers, 803 W. Third Street, Los Angeles California, O.C.E. Office of Civil Engineer Job No. Muroc ESA 210-48 and 210-49, Military Construction: Muroc Flight Test Base, Muroc, California, Warehouses and Additional Housing for Officers: Warehouse "A" Plans & Elevations, Sheet No. 4 of 16, May 1945. Reproduced from the holdings of the National Archives; Pacific Southwest Region - Edwards Air Force Base, North Base, Warehouse A, North Base Road at E Street, Boron, Kern County, CA

  16. Problem Areas in Data Warehousing and Data Mining in a Surgical Clinic

    PubMed Central

    Tusch, Guenter; Mueller, Margarete; Rohwer-Mensching, Katrin; Heiringhoff, Karlheinz; Klempnauer, Juergen

    2001-01-01

    Hospitals and clinics have taken advantage of information systems to streamline many clinical and administrative processes. However, the potential of health care information technology as a source of data for clinical and administrative decision support has not been fully explored. In response to pressure for timely information, many hospitals are developing clinical data warehouses. This paper attempts to identify problem areas in the process of developing a data warehouse to support data mining in surgery. Based on the experience with a data warehouse in surgery, several solutions are discussed.

  17. Internet-based data warehousing

    NASA Astrophysics Data System (ADS)

    Boreisha, Yurii

    2001-10-01

    In this paper, we consider the process of data warehouse creation and population using the latest Internet and database access technologies. A logical three-tier model is applied. This approach allows an enterprise schema to be developed by analyzing the various processes in the organization and extracting the relevant entities and relationships from them. Integration with local schemas and population of the data warehouse are carried out through the corresponding user, business, and data services components. The hierarchy of these components hides the complexity of the online analytical processing functionality from the data warehouse users.

  18. Managing data quality in an existing medical data warehouse using business intelligence technologies.

    PubMed

    Eaton, Scott; Ostrander, Michael; Santangelo, Jennifer; Kamal, Jyoti

    2008-11-06

    The Ohio State University Medical Center (OSUMC) Information Warehouse (IW) is a comprehensive data warehousing facility that provides data integration, management, mining, training, and development services to a diversity of customers across the clinical, education, and research sectors of the OSUMC. Providing accurate and complete data is a must for these purposes. In order to monitor the data quality of targeted data sets, an online scorecard has been developed to allow visualization of the critical measures of data quality in the Information Warehouse.
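
    As a hedged illustration of the scorecard idea only (the abstract does not publish OSUMC's actual measures), the sketch below computes one generic data-quality metric, per-column completeness, over an invented warehouse extract using pandas; all column names and values are hypothetical.

        import pandas as pd

        # Invented extract from a warehouse table (mrn = medical record number).
        df = pd.DataFrame({
            "mrn":       ["A1", "A2", "A3", None],
            "diagnosis": ["I10", None, "E11", "I10"],
            "admit_dt":  ["2008-01-02", "2008-01-05", None, "2008-02-01"],
        })

        # Percent of non-missing values per column -- one simple "critical
        # measure" an online scorecard could visualize.
        scorecard = (df.notna().mean() * 100).round(1).rename("completeness_%")
        print(scorecard)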

  19. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

    PubMed

    Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-07-04

    As suggested as early as 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of the optimal features for the final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recently observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
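
    A minimal sketch of the two-stage pipeline the abstract describes, assuming scikit-learn: Lasso selects the informative queries, then SVR predicts incidence from them. The query-volume matrix and incidence series below are random placeholders, not the Daum data.

        import numpy as np
        from sklearn.linear_model import Lasso
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVR

        rng = np.random.default_rng(0)
        X = rng.random((156, 146))   # 156 weeks x 146 candidate queries (synthetic)
        y = rng.random(156)          # weekly influenza-like illness rate (synthetic)
        X = StandardScaler().fit_transform(X)

        # Stage 1: Lasso keeps only queries with non-zero coefficients.
        selected = np.flatnonzero(Lasso(alpha=0.01).fit(X, y).coef_)

        # Stage 2: SVR trained on the selected query volumes.
        model = SVR(kernel="rbf").fit(X[:, selected], y)
        print(model.predict(X[:5, selected]))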

  20. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

    PubMed Central

    Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-01-01

    Background As suggested as early as 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of the optimal features for the final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recently observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323

  1. Statewide Transportation Engineering Warehouse for Archived Regional Data (STEWARD).

    DOT National Transportation Integrated Search

    2009-12-01

    This report documents Phase III of the development and operation of a prototype for the Statewide Transportation Engineering Warehouse for Archived Regional Data (STEWARD). It reflects the progress on the development and operation of STEWARD sinc...

  2. 71. South El Paso St., 911 (commercial), east facade, warehouse ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    71. South El Paso St., 911 (commercial), east facade, warehouse to left in background - South El Paso Street Historic District, South El Paso, South Oregon & South Santa Fe Streets, El Paso, El Paso County, TX

  3. An organizational analysis of the Fulton Warehouse.

    DOT National Transportation Integrated Search

    1971-01-01

    The purpose of this study was to examine the Warehouse organization as an entity separate from the Purchasing Division. The examination focused on the organization internally and then on its relationships to other organizational structures. This stud...

  4. 6. AVALON DAM GATE KEEPER'S COMPLEX: GARAGE AND WAREHOUSE ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. AVALON DAM - GATE KEEPER'S COMPLEX: GARAGE AND WAREHOUSE (LEFT), HOUSE (RIGHT), AND CCC LANDSCAPING (FOREGROUND). VIEW TO NORTH - Carlsbad Irrigation District, Avalon Dam, On Pecos River, 4 miles North of Carlsbad, Carlsbad, Eddy County, NM

  5. 77 FR 61275 - Privacy Act of 1974: Implementation

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-09

    ... (FBI) Privacy Act system of records titled FBI Data Warehouse System, JUSTICE/FBI- 022. This system is...)(G), (H), and (I), (5), and (8); (f); and (g) of the Privacy Act: (1) FBI Data Warehouse System...

  6. System and method for integrating and accessing multiple data sources within a data warehouse architecture

    DOEpatents

    Musick, Charles R [Castro Valley, CA]; Critchlow, Terence [Livermore, CA]; Ganesh, Madhaven [San Jose, CA]; Slezak, Tom [Livermore, CA]; Fidelis, Krzysztof [Brentwood, CA]

    2006-12-19

    A system and method is disclosed for integrating and accessing multiple data sources within a data warehouse architecture. The metadata formed by the present method provide a way to declaratively present domain specific knowledge, obtained by analyzing data sources, in a consistent and useable way. Four types of information are represented by the metadata: abstract concepts, databases, transformations and mappings. A mediator generator automatically generates data management computer code based on the metadata. The resulting code defines a translation library and a mediator class. The translation library provides a data representation for domain specific knowledge represented in a data warehouse, including "get" and "set" methods for attributes that call transformation methods and derive a value of an attribute if it is missing. The mediator class defines methods that take "distinguished" high-level objects as input and traverse their data structures and enter information into the data warehouse.
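
    The generated code itself is not reproduced in the abstract; the sketch below is only an illustration of the described "get" behavior: an accessor that calls a transformation method to derive a missing attribute value. All class and attribute names are invented.

        class SequenceRecord:
            """Illustrative warehouse object with a derivable attribute."""

            def __init__(self, sequence=None, length=None):
                self._sequence = sequence
                self._length = length

            def get_length(self):
                # Derive the value if it was not loaded from the source.
                if self._length is None and self._sequence is not None:
                    self._length = self._derive_length(self._sequence)
                return self._length

            @staticmethod
            def _derive_length(sequence):
                # Transformation method mapping source data to the schema.
                return len(sequence)

        record = SequenceRecord(sequence="ACGTACGT")
        print(record.get_length())   # 8, derived on demand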

  7. Workplace exposures and protective practices of Hispanic warehouse workers.

    PubMed

    Livaudais, Jennifer C; Thompson, Beti; Islas, Ilda; Ibarra, Genoveva; Anderson, Jennifer; Coronado, Gloria D

    2009-04-01

    This study was undertaken to assess workplace hazards and protective practices among Hispanic men and women working post-harvest in asparagus, apple and pear packaging warehouses. Three focus groups were conducted in July 2003 with 25 workers (20 women, 5 men) recruited from communities in the Yakima Valley, Washington. Focus group content informed the design of an in-person structured interview administered to 50 additional warehouse workers from August to November 2006. Focus group participants reported difficult working conditions, exposure to chemicals, adverse health effects and use of work and home protective practices to minimize exposures for themselves and their families. Structured interview participants reported few workplace exposures to chemicals although many reported engaging in workplace and home protective practices. Findings from this research can direct initial efforts to determine if and how interventions for warehouse workers may be designed to protect against hazardous workplace exposures.

  8. Data Delivery and Mapping Over the Web: National Water-Quality Assessment Data Warehouse

    USGS Publications Warehouse

    Bell, Richard W.; Williamson, Alex K.

    2006-01-01

    The U.S. Geological Survey began its National Water-Quality Assessment (NAWQA) Program in 1991, systematically collecting chemical, biological, and physical water-quality data from study units (basins) across the Nation. In 1999, the NAWQA Program developed a data warehouse to better facilitate national and regional analysis of data from 36 study units started in 1991 and 1994. Data from 15 study units started in 1997 were added to the warehouse in 2001. The warehouse currently contains and links the following data: chemical concentrations in water, sediment, and aquatic-organism tissues and related quality-control data from the USGS National Water Information System (NWIS); biological data for stream-habitat and ecological-community data on fish, algae, and benthic invertebrates; site, well, and basin information associated with thousands of descriptive variables derived from spatial analysis, like land use, soil, and population density; and daily streamflow and temperature information from NWIS for selected sampling sites.

  9. Real Time Business Analytics for Buying or Selling Transaction on Commodity Warehouse Receipt System

    NASA Astrophysics Data System (ADS)

    Djatna, Taufik; Teniwut, Wellem A.; Hairiyah, Nina; Marimin

    2017-10-01

    Smooth access to buying and selling information is essential for a commodity warehouse receipt system, such as one for dried seaweed, and for its stakeholders to carry out operational transactions. Buying or selling in a commodity warehouse receipt system is a risky process due to fluctuations in dynamic commodity prices. An integrated system that determines real-time conditions is needed to support transaction decisions by the owner or a prospective buyer. The primary motivation of this study is to propose computational methods to trace market tendency for either buying or selling processes. The empirical results reveal that gain-ratio feature selection combined with k-NN outperforms the other forecasting models, implying that the proposed approach is a promising alternative for exploring the stock market tendency of warehouse receipt documents, with an accuracy rate of 95.03%.
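
    A rough sketch of that feature-selection-plus-k-NN pipeline, assuming scikit-learn; since scikit-learn ships no gain-ratio selector, mutual information is used here as a stand-in, and the feature matrix is synthetic rather than real warehouse receipt data.

        import numpy as np
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(1)
        X = rng.random((500, 12))                    # e.g. price/volume indicators
        y = (X[:, 0] + X[:, 3] > 1.0).astype(int)    # 1 = sell tendency, 0 = buy

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        sel = SelectKBest(mutual_info_classif, k=5).fit(X_tr, y_tr)
        knn = KNeighborsClassifier(n_neighbors=5).fit(sel.transform(X_tr), y_tr)
        print(knn.score(sel.transform(X_te), y_te))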

  10. Digital patient: Personalized and translational data management through the MyHealthAvatar EU project.

    PubMed

    Kondylakis, Haridimos; Spanakis, Emmanouil G; Sfakianakis, Stelios; Sakkalis, Vangelis; Tsiknakis, Manolis; Marias, Kostas; Xia Zhao; Hong Qing Yu; Feng Dong

    2015-08-01

    The advancements in healthcare practice have brought to the fore the need for flexible access to health-related information and created an ever-growing demand for the design and the development of data management infrastructures for translational and personalized medicine. In this paper, we present the data management solution implemented for the MyHealthAvatar EU research project, a project that attempts to create a digital representation of a patient's health status. The platform is capable of aggregating several knowledge sources relevant for the provision of individualized personal services. To this end, state-of-the-art technologies are exploited, such as ontologies to model all available information, semantic integration to enable data and query translation, and a variety of linking services to allow connecting to external sources. All original information is stored in a NoSQL database for reasons of efficiency and fault tolerance. It is then semantically uplifted through a semantic warehouse, which enables efficient access to it. All of these technologies are combined to create a novel web-based platform allowing seamless user interaction through APIs that support personalized, granular and secure access to the relevant information.

  11. Semantic web data warehousing for caGrid

    PubMed Central

    McCusker, James P; Phillips, Joshua A; Beltrán, Alejandra González; Finkelstein, Anthony; Krauthammer, Michael

    2009-01-01

    The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG® Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges. PMID:19796399
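
    Corvus itself is not shown in the abstract; as a hedged sketch of the general pattern, the snippet below uses rdflib to hold semantically annotated triples and answer a SPARQL query across them. The namespace and triples are invented for illustration.

        from rdflib import Graph, Literal, Namespace, RDF

        EX = Namespace("http://example.org/caGrid/")
        g = Graph()
        g.add((EX.specimen1, RDF.type, EX.TissueSpecimen))
        g.add((EX.specimen1, EX.fromStudy, Literal("caTissue-demo")))

        # SPARQL query over the integrated graph.
        rows = g.query("""
            PREFIX ex: <http://example.org/caGrid/>
            SELECT ?s ?study WHERE {
                ?s a ex:TissueSpecimen ; ex:fromStudy ?study .
            }""")
        for s, study in rows:
            print(s, study)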

  12. Tomato Expression Database (TED): a suite of data presentation and analysis tools

    PubMed Central

    Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

    2006-01-01

    The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection containing >150,000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis and visualization tools have been developed and incorporated into TED, which aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at http://ted.bti.cornell.edu. PMID:16381976

  13. Tomato Expression Database (TED): a suite of data presentation and analysis tools.

    PubMed

    Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

    2006-01-01

    The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection containing >150,000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis and visualization tools have been developed and incorporated into TED, which aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at http://ted.bti.cornell.edu.

  14. Any information, anywhere, anytime for the warfighter

    NASA Astrophysics Data System (ADS)

    Lazaroff, Mark B.; Sage, Philip A.

    1997-06-01

    The objective of the DARPA battlefield awareness data dissemination (BADD) program is to deliver battlefield awareness information to the warfighter -- anywhere, anytime. BADD is an advanced concept technology demonstration (ACTD) to support proof of concept technology demonstrations and experiments, with a goal of introducing new technology to support the operational needs and acceptance of the warfighter. BADD's information management technology provides a 'smart' push of information to the users by providing information subscription services implemented via user-generated profiles. The system also provides services for warfighter pull or 'reach-back' of information via ad hoc query support. The high bandwidth delivery of information via the Global Broadcast System (GBS) satellites enables users to receive battlefield awareness information virtually anywhere. Very similar goals have been established for data warehousing technology -- that is, deliver the right information, to the right user, at the right time so that effective decisions can be made. In this paper, we examine the BADD Phase II architecture and underlying information management technology in the context of data warehousing technology and a data warehouse reference architecture. In particular, we focus on the BADD segment that PSR is building, the Interface to Information Sources (I2S).

  15. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

    PubMed Central

    Sadesh, S.; Suganthe, R. C.

    2015-01-01

    The Web, with its tremendous volume of information, retrieves results for user-related queries. Despite the rapid growth of web page recommendation, results retrieved with data mining techniques have not offered high filtering rates, because the relationships between user profiles and queries were not analyzed extensively. At the same time, existing user-profile-based prediction in web data mining is not exhaustive in producing personalized results. To improve query results as user behavior changes over time, the Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. The HFRS-UQP framework is split into two processes, in which filtering and switching are carried out. The data-mining-based filtering in this work uses the Hamilton filtering framework to filter user results based on personalized information from profiles that the search engine updates automatically. A maximized result set is fetched, that is, filtered with respect to user behavior profiles. The switching step refines the filtering of updated profiles using regime switching. The regime updating on profile changes (i.e., switches) in the HFRS-UQP framework identifies second- and higher-order associations between query results and the updated profiles. Experiments were conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626

  16. Improving accuracy for identifying related PubMed queries by an integrated approach.

    PubMed

    Lu, Zhiyong; Wilbur, W John

    2009-10-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.
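
    To make the lexical half of the idea concrete, here is a small sketch (not the authors' code) that groups consecutive queries into segments whenever their token-level Jaccard similarity clears an illustrative threshold.

        def jaccard(a, b):
            ta, tb = set(a.lower().split()), set(b.lower().split())
            return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

        def segment(queries, threshold=0.2):
            sessions, current = [], [queries[0]]
            for prev, q in zip(queries, queries[1:]):
                if jaccard(prev, q) >= threshold:
                    current.append(q)          # lexically related: same need
                else:
                    sessions.append(current)   # unrelated: new segment
                    current = [q]
            sessions.append(current)
            return sessions

        log = ["breast cancer brca1", "brca1 mutation", "zebrafish embryo"]
        print(segment(log))   # first two queries grouped, third starts a segment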

  17. Improving accuracy for identifying related PubMed queries by an integrated approach

    PubMed Central

    Lu, Zhiyong; Wilbur, W. John

    2009-01-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232

  18. 6. VIEW TO WEST, SAMPLING BUILDING, MECHANIC SHED, MILL WAREHOUSE, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. VIEW TO WEST, SAMPLING BUILDING, MECHANIC SHED, MILL WAREHOUSE, DRYERS, AND GRINDING/ROD MILL. - Vanadium Corporation of America (VCA) Naturita Mill, 3 miles Northwest of Naturita, between Highway 141 & San Miguel River, Naturita, Montrose County, CO

  19. 7. VIEW TO EAST, MILL WAREHOUSE, DRYERS, GRINDING/ROD MILL, AND ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    7. VIEW TO EAST, MILL WAREHOUSE, DRYERS, GRINDING/ROD MILL, AND MECHANIC SHED. - Vanadium Corporation of America (VCA) Naturita Mill, 3 miles Northwest of Naturita, between Highway 141 & San Miguel River, Naturita, Montrose County, CO

  20. 7 CFR 1205.500 - Terms defined.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE COTTON RESEARCH AND PROMOTION Cotton... administrative body established pursuant to the Cotton Research and Promotion Order. (c) CCC means the Commodity... research and promotion assessment, picking charges, ginning charges, warehouse receiving charges, warehouse...

  1. 7 CFR 1205.500 - Terms defined.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE COTTON RESEARCH AND PROMOTION Cotton... administrative body established pursuant to the Cotton Research and Promotion Order. (c) CCC means the Commodity... research and promotion assessment, picking charges, ginning charges, warehouse receiving charges, warehouse...

  2. BOILING HOUSE, GROUND FLOOR. WAREHOUSE TO LEFT REAR, MASSECUITTE HEATERS ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    BOILING HOUSE, GROUND FLOOR. WAREHOUSE TO LEFT REAR, MASSECUITTE HEATERS ABOVE RIGHT, LOW GRADE CENTRIFUGALS BELOW. CRYSTALLIZER HOT WATER TANK TO REAR. VIEW FROM NORTHEAST - Lihue Plantation Company, Sugar Mill Building, Haleko Road, Lihue, Kauai County, HI

  3. 19 CFR 19.20 - Withdrawal of products from bonded smelting or refining warehouses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... to another bonded warehouse shall be at the risk and expense of the applicant, and the general... far as applicable. (2) In the case of transportation to another port, the transportation entry shall...

  4. 76 FR 3502 - Time for Payment of Certain Excise Taxes, and Quarterly Excise Tax Payments for Small Alcohol...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-01-20

    ... requirements, Research, Security measures, Surety bonds, Vinegar, Virgin Islands, Warehouses. 27 CFR Part 24..., Research, Scientific equipment, Spices and flavorings, Surety bonds, Vinegar, Warehouses, Wine. 27 CFR Part...

  5. A novel content-based medical image retrieval method based on query topic dependent image features (QTDIF)

    NASA Astrophysics Data System (ADS)

    Xiong, Wei; Qiu, Bo; Tian, Qi; Mueller, Henning; Xu, Changsheng

    2005-04-01

    Medical image retrieval is still mainly a research domain with a large variety of applications and techniques. With the ImageCLEF 2004 benchmark, an evaluation framework has been created that includes a database, query topics and ground truth data. Eleven systems (with a total of more than 50 runs) compared their performance in various configurations. The results show that no single feature performs well on all query tasks. The key to successful retrieval is rather the selection of features and feature weights based on a specific set of input features and thus on the query task. In this paper we propose a novel method based on query topic dependent image features (QTDIF) for content-based medical image retrieval. These feature sets are designed to capture both inter-category and intra-category statistical variations to achieve good retrieval performance in terms of recall and precision. We have used Gaussian Mixture Models (GMM) and blob representation to model medical images and construct the proposed novel QTDIF for CBIR. Finally, trained multi-class support vector machines (SVM) are used for image similarity ranking. The proposed methods have been tested over the Casimage database with around 9000 images, for the 26 image topics used for ImageCLEF 2004. The retrieval performance has been compared with the medGIFT system, which is based on the GNU Image Finding Tool (GIFT). The experimental results show that the proposed QTDIF-based CBIR can provide significantly better performance than systems based on general features only.
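
    An illustrative sketch of the GMM-plus-SVM portion of such a pipeline, assuming scikit-learn: per-image local descriptors are summarized by a Gaussian mixture whose parameters form the feature vector, and a multi-class SVM produces class scores usable for similarity ranking. Descriptors and labels below are synthetic.

        import numpy as np
        from sklearn.mixture import GaussianMixture
        from sklearn.svm import SVC

        rng = np.random.default_rng(2)

        def gmm_features(descriptors, n_components=3):
            gmm = GaussianMixture(n_components=n_components,
                                  random_state=0).fit(descriptors)
            return np.hstack([gmm.means_.ravel(), gmm.weights_])

        # 40 "images", each with 100 eight-dimensional local descriptors,
        # spread over 4 query-topic classes.
        X = np.array([gmm_features(rng.random((100, 8))) for _ in range(40)])
        y = np.repeat(np.arange(4), 10)

        svm = SVC(probability=True).fit(X, y)
        print(svm.predict_proba(X[:1]).round(2))   # class scores for ranking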

  6. Improving average ranking precision in user searches for biomedical research datasets

    PubMed Central

    Gobeill, Julien; Gaudinat, Arnaud; Vachon, Thérèse; Ruch, Patrick

    2017-01-01

    Availability of research datasets is a keystone of health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorization method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries, and provided competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP, +22.3% higher than the median infAP of the participants' best submissions. Overall, it ranks in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed positive impact on the system's performance, increasing our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. The similarity measure algorithm showed robust performance in different training conditions, with small performance variations compared to the Divergence from Randomness framework. Finally, the result categorization did not have significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. The use of data-driven expansion methods, such as those based on word embeddings, could be an alternative to the complexity of biomedical terminologies. Nevertheless, due to the limited size of the assessment set, further experiments need to be performed to draw conclusive results. Database URL: https://biocaddie.org/benchmark-data PMID:29220475
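
    A hedged sketch of word-embedding query expansion in the spirit of that pipeline, assuming gensim; the toy corpus is far too small to learn meaningful vectors and exists only to make the snippet runnable.

        from gensim.models import Word2Vec

        corpus = [
            ["rna", "seq", "expression", "dataset"],
            ["rna", "sequencing", "transcriptome", "dataset"],
            ["proteomics", "mass", "spectrometry", "dataset"],
        ] * 50
        model = Word2Vec(corpus, vector_size=32, min_count=1, seed=0)

        def expand(query_terms, k=2):
            # Append the k nearest embedding neighbors of each query term.
            expanded = list(query_terms)
            for term in query_terms:
                expanded += [w for w, _ in model.wv.most_similar(term, topn=k)]
            return expanded

        print(expand(["rna", "dataset"]))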

  7. A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.

    PubMed

    Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-10-08

    Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the following three considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, a clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data were then employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (i.e., "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution times. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.
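
    Not the GLIF3.5 engine itself, but a minimal sketch of the flowchart-execution idea: decision nodes test criteria against a patient record and route to the next node until a terminal answer is reached. Node contents and thresholds are invented.

        def run_flowchart(patient, nodes, start):
            node = nodes[start]
            while "criterion" in node:               # decision step
                branch = "yes" if node["criterion"](patient) else "no"
                node = nodes[node[branch]]
            return node["label"]                     # terminal step

        nodes = {
            "damage": {"criterion": lambda p: p["bilirubin"] > 2.0,
                       "yes": "severe", "no": "mild"},
            "severe": {"label": "degree of liver damage: severe"},
            "mild":   {"label": "degree of liver damage: mild"},
        }
        print(run_flowchart({"bilirubin": 3.1}, nodes, "damage"))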

  8. Column compression strength of tubular packaging forms made from paper

    Treesearch

    Thomas J. Urbanik; Sung K. Lee; Charles G. Johnson

    2006-01-01

    Tubular packaging forms fabricated and shaped from rolled paper are used as reinforcing corner posts for major appliances packaged in corrugated containers. Tests of column compression strength simulate the expected performance loads from appliances stacked in warehouses. Column strength depends on tube geometry, paper properties, basis weight, and number of...

  9. SysBioCube: A Data Warehouse and Integrative Data Analysis Platform Facilitating Systems Biology Studies of Disorders of Military Relevance

    DTIC Science & Technology

    2013-12-18

    include interactive gene and methylation profiles, interactive heatmaps, Cytoscape network views, Integrative Genomics Viewer (IGV), and protein-protein...single chart. The website also provides an option to include multiple genes. Integrative Genomics Viewer (IGV) is a high-performance desktop tool for

  10. Using business intelligence to monitor clinical quality metrics.

    PubMed

    Resetar, Ervina; Noirot, Laura A; Reichley, Richard M; Storey, Patricia; Skiles, Ann M; Traynor, Patrick; Dunagan, W Claiborne; Bailey, Thomas C

    2007-10-11

    BJC HealthCare (BJC) uses a number of industry standard indicators to monitor the quality of services provided by each of its hospitals. By establishing an enterprise data warehouse as a central repository of clinical quality information, BJC is able to monitor clinical quality performance in a timely manner and improve clinical outcomes.

  11. Real-Time Data Warehousing and On-Line Analytical Processing at Aberdeen Test Center’s Distributed Center

    DTIC Science & Technology

    2005-12-01

    data collected via on-board instrumentation (VxWorks-based computer). Each instrument produces a continuous time history record of up to 250... data in multidimensional hierarchies and views. ... Institute a high-performance data warehouse: PostgreSQL 7.4 installed on a dedicated filesystem

  12. Profiling Students Who Take Online Courses Using Data Mining Methods

    ERIC Educational Resources Information Center

    Yu, Chong Ho; Digangi, Samuel; Jannasch-Pennell, Angel Kay; Kaprolet, Charles

    2008-01-01

    The efficacy of online learning programs is tied to the suitability of the program in relation to the target audience. Based on the dataset that provides information on student enrollment, academic performance, and demographics extracted from a data warehouse of a large Southwest institution, this study explored the factors that could distinguish…

  13. Context-Aware Online Commercial Intention Detection

    NASA Astrophysics Data System (ADS)

    Hu, Derek Hao; Shen, Dou; Sun, Jian-Tao; Yang, Qiang; Chen, Zheng

    With more and more commercial activities moving onto the Internet, people tend to purchase what they need through the Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like common Web search queries, the queries with commercial intention are usually very short. Recognizing the queries with commercial intention against the common queries will help search engines provide proper search results and advertisements, help Web users obtain the right information they desire, and help the advertisers benefit from the potential transactions. However, the intentions behind a query vary a lot for users with different backgrounds and interests. The intentions can even be different for the same user when the query is issued in different contexts. In this paper, we present a new algorithm framework based on skip-chain conditional random fields (SCCRF) for automatically classifying Web queries according to context-based online commercial intention. We analyze our algorithm's performance both theoretically and empirically. Extensive experiments on several real search engine log datasets show that our algorithm can improve the F1 score by more than 10% over previous algorithms for commercial intention detection.
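
    As a simplified stand-in for the skip-chain CRF (sklearn-crfsuite offers only linear-chain CRFs), the sketch below labels each query in a session as commercial intention (CI) or not, using the neighboring query as context; the sessions, labels, and features are toy examples, not the paper's.

        import sklearn_crfsuite

        def feats(session, i):
            f = {"q": session[i], "has_buy": "buy" in session[i]}
            if i > 0:
                f["prev_q"] = session[i - 1]     # context from the prior query
            return f

        sessions = [["nikon d750 review", "buy nikon d750", "d750 price"],
                    ["python list sort", "python lambda"]]
        labels = [["CI", "CI", "CI"], ["NON", "NON"]]

        X = [[feats(s, i) for i in range(len(s))] for s in sessions]
        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
        crf.fit(X, labels)
        print(crf.predict(X))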

  14. Under EPA Settlement, Chicopee, Mass. Cold Storage Warehouse Company Improves Public Protections

    EPA Pesticide Factsheets

    A Chicopee, Mass., company that operates a cold storage warehouse is spending more than half a million dollars, primarily on public safety enhancements, to resolve claims it violated the federal Clean Air Act's chemical release prevention requirements...

  15. 76 FR 57719 - Procurement List; Additions

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-09-16

    ... Creek Recreation Area, 3211 Reservoir Road, Walla Walla, WA. NPA: Lillie Rice Center, Walla Walla, WA...: Warehouse Staffing Services, Warehouse Section--Building Branch--NOAA's Logistics Div., Building 22, 325... Desk (Call Center) Service, Defense Logistics Agency, Fort Belvoir, VA. (Offsite: 2511 Martin Luther...

  16. 119. NORTH PLANT GB WAREHOUSE (BUILDING 1607), WITH DISCHARGED TON ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    119. NORTH PLANT GB WAREHOUSE (BUILDING 1607), WITH DISCHARGED TON CONTAINERS IN FOREGROUND. VIEW TO SOUTHEAST. - Rocky Mountain Arsenal, Bounded by Ninety-sixth Avenue & Fifty-sixth Avenue, Buckley Road, Quebec Street & Colorado Highway 2, Commerce City, Adams County, CO

  17. 29. 4TH STREET FROM NEAR ITS INTERSECTION WITH J STREET, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    29. 4TH STREET FROM NEAR ITS INTERSECTION WITH J STREET, LOOKING NORTH, WITH WAREHOUSE 333 AT LEFT AND WAREHOUSES 433, 432 & 431 AT RIGHT. - Oakland Naval Supply Center, Maritime Street at Seventh Street, Oakland, Alameda County, CA

  18. PRMS Data Warehousing Prototype

    NASA Technical Reports Server (NTRS)

    Guruvadoo, Eranna K.

    2001-01-01

    Project and Resource Management System (PRMS) is a web-based, mid-level management tool developed at KSC to provide a unified enterprise framework for Project and Mission management. The addition of a data warehouse as a strategic component to the PRMS is investigated through the analysis, design, and implementation processes of a data warehouse prototype. As a proof of concept, a demonstration of the prototype with its OLAP technology for multidimensional data analysis is made. The results of the data analysis and the design constraints are discussed. The prototype can be used to motivate interest and support for an operational data warehouse.
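
    As a small, hedged illustration of the kind of multidimensional (OLAP-style) analysis such a prototype demonstrates, the snippet below builds a pivot-table "cube" over invented project records with pandas.

        import pandas as pd

        records = pd.DataFrame({
            "project":  ["A", "A", "B", "B", "B"],
            "quarter":  ["Q1", "Q2", "Q1", "Q1", "Q2"],
            "resource": ["eng", "eng", "eng", "ops", "ops"],
            "hours":    [120, 90, 200, 60, 75],
        })

        # Hours rolled up by project and quarter, with grand totals.
        cube = records.pivot_table(index="project", columns="quarter",
                                   values="hours", aggfunc="sum", margins=True)
        print(cube)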

  19. PRMS Data Warehousing Prototype

    NASA Technical Reports Server (NTRS)

    Guruvadoo, Eranna K.

    2002-01-01

    Project and Resource Management System (PRMS) is a web-based, mid-level management tool developed at KSC to provide a unified enterprise framework for Project and Mission management. The addition of a data warehouse as a strategic component to the PRMS is investigated through the analysis, design, and implementation processes of a data warehouse prototype. As a proof of concept, a demonstration of the prototype with its OLAP technology for multidimensional data analysis is made. The results of the data analysis and the design constraints are discussed. The prototype can be used to motivate interest and support for an operational data warehouse.

  20. Data warehousing as a healthcare business solution.

    PubMed

    Scheese, R

    1998-02-01

    Because of the trend toward consolidation in the healthcare field, many organizations have massive amounts of data stored in various information systems organizationwide, but access to the data by end users may be difficult. Healthcare organizations are being pressured to provide managers easy access to the data needed for critical decision making. One solution many organizations are turning to is implementing decision-support data warehouses. A data warehouse instantly delivers information directly to end users, freeing healthcare information systems staff for strategic operations. If designed appropriately, data warehouses can be a cost-effective tool for business analysis and decision support.
