The Research on Automatic Construction of Domain Model Based on Deep Web Query Interfaces
NASA Astrophysics Data System (ADS)
JianPing, Gu
The integration of services is transparent, meaning that users no longer face millions of Web services, need not care where the required data are stored, and need not learn how to obtain these data. In this paper, we analyze the uncertainty of schema matching and then propose a series of similarity measures. To reduce the cost of execution, we propose a type-based optimization method and a schema matching pruning method for numeric data. Based on the above analysis, we propose an uncertain schema matching method. The experiments demonstrate the effectiveness and efficiency of our method.
Heterogeneous database integration in biomedicine.
Sujansky, W
2001-08-01
The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
Hybrid Schema Matching for Deep Web
NASA Astrophysics Data System (ADS)
Chen, Kerui; Zuo, Wanli; He, Fengling; Chen, Yongheng
Schema matching is the process of identifying semantic mappings, or correspondences, between two or more schemas. Schema matching is a first step and a critical part of data integration. For schema matching on the deep web, most research focuses only on query interfaces and rarely pays attention to the abundant schema information contained in query result pages. This paper proposes a hybrid schema matching technique, which combines the attributes that appear in the query structures and query results of different data sources and mines the matched schemas from them. Experimental results demonstrate that this method improves the accuracy of schema matching.
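The idea of combining evidence from query interfaces and query result pages can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the equal weighting of the two signals, the threshold, and the sample data are all assumptions chosen for illustration. Label similarity uses `difflib.SequenceMatcher` from the standard library, and result-page evidence is approximated by the Jaccard overlap of observed values.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """String similarity of two attribute labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(xs: set, ys: set) -> float:
    """Jaccard overlap of the values seen in query result pages."""
    if not xs or not ys:
        return 0.0
    return len(xs & ys) / len(xs | ys)


def match_attributes(src, dst, w_label=0.5, threshold=0.4):
    """Pair each source attribute with its best-scoring target attribute.

    src and dst map an attribute label to the set of values observed for it.
    The weight and threshold are illustrative assumptions.
    """
    matches = {}
    for a, a_vals in src.items():
        best, best_score = None, threshold
        for b, b_vals in dst.items():
            score = (w_label * label_similarity(a, b)
                     + (1 - w_label) * value_overlap(a_vals, b_vals))
            if score > best_score:
                best, best_score = b, score
        if best is not None:
            matches[a] = best
    return matches


# Hypothetical attributes from two book-search sites.
site_a = {"Author": {"Knuth", "Date"}, "Title": {"TAOCP", "SQL Basics"}}
site_b = {"Writer": {"Knuth", "Date", "Codd"}, "Book title": {"TAOCP"}}
result = match_attributes(site_a, site_b)
```

In this toy data the labels "Author" and "Writer" share few characters, so the pair is matched mainly because of the shared result values, which is the kind of case where result-page information helps.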
Gstruct: a system for extracting schemas from GML documents
NASA Astrophysics Data System (ADS)
Chen, Hui; Zhu, Fubao; Guan, Jihong; Zhou, Shuigeng
2008-10-01
Geography Markup Language (GML) has become the de facto standard for geographic information representation on the internet. GML schema provides a way to define the structure, content, and semantics of GML documents. It contains useful structural information about GML documents and plays an important role in storing, querying and analyzing GML data. However, GML schema is not mandatory, and it is common for a GML document to contain no schema. In this paper, we present Gstruct, a tool for GML schema extraction. Gstruct finds the features in the input GML documents, identifies geometry datatypes as well as simple datatypes, then integrates all these features and eliminates improper components to output the optimal schema. Experiments demonstrate that Gstruct is effective in extracting semantically meaningful schemas from GML documents.
2013-01-01
Background: Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results: A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions: Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556
Dynamic Querying of Mass-Storage RDF Data with Rule-Based Entailment Regimes
NASA Astrophysics Data System (ADS)
Ianni, Giovambattista; Krennwallner, Thomas; Martello, Alessandra; Polleres, Axel
RDF Schema (RDFS) as a lightweight ontology language is gaining popularity and, consequently, tools for scalable RDFS inference and querying are needed. SPARQL has recently become a W3C standard for querying RDF data, but it mostly provides means for querying simple RDF graphs only, whereas querying with respect to RDFS or other entailment regimes is left outside the current specification. In this paper, we show that SPARQL faces certain unwanted ramifications when querying ontologies in conjunction with RDF datasets that comprise multiple named graphs, and we provide an extension for SPARQL that remedies these effects. Moreover, since RDFS inference has a close relationship with logic rules, we generalize our approach to select a custom ruleset for specifying inferences to be taken into account in a SPARQL query. We show that our extensions are technically feasible by providing benchmark results for RDFS querying in our prototype system GiaBATA, which uses Datalog coupled with a persistent relational database as a back-end for implementing SPARQL with dynamic rule-based inference. By employing different optimization techniques, such as magic set rewriting, our system remains competitive with state-of-the-art RDFS querying systems.
García-Remesal, M; Maojo, V; Billhardt, H; Crespo, J
2010-01-01
Bringing together structured and text-based sources is an exciting challenge for biomedical informaticians, since most relevant biomedical sources belong to one of these categories. In this paper we evaluate the feasibility of integrating relational and text-based biomedical sources using: i) an original logical schema acquisition method for textual databases developed by the authors, and ii) OntoFusion, a system originally designed by the authors for the integration of relational sources. We conducted an integration experiment involving a test set of seven differently structured sources covering the domain of genetic diseases. We used our logical schema acquisition method to generate schemas for all textual sources. The sources were integrated using the methods and tools provided by OntoFusion. The integration was validated using a test set of 500 queries. A panel of experts answered a questionnaire to evaluate i) the quality of the extracted schemas, ii) the query processing performance of the integrated set of sources, and iii) the relevance of the retrieved results. The results of the survey show that our method extracts coherent and representative logical schemas. Experts' feedback on the performance of the integrated system and the relevance of the retrieved results was also positive. Regarding the validation of the integration, the system successfully provided correct results for all queries in the test set. The results of the experiment suggest that text-based sources including a logical schema can be regarded as equivalent to structured databases. Using our method, previous research and existing tools designed for the integration of structured databases can be reused - possibly subject to minor modifications - to integrate differently structured sources.
Informatics in radiology: use of CouchDB for document-based storage of DICOM objects.
Rascovsky, Simón J; Delgado, Jorge A; Sanz, Alexander; Calvo, Víctor D; Castrillón, Gabriel
2012-01-01
Picture archiving and communication systems traditionally have depended on schema-based Structured Query Language (SQL) databases for imaging data management. To optimize database size and performance, many such systems store a reduced set of Digital Imaging and Communications in Medicine (DICOM) metadata, discarding informational content that might be needed in the future. As an alternative to traditional database systems, document-based key-value stores recently have gained popularity. These systems store documents containing key-value pairs that facilitate data searches without predefined schemas. Document-based key-value stores are especially suited to archive DICOM objects because DICOM metadata are highly heterogeneous collections of tag-value pairs conveying specific information about imaging modalities, acquisition protocols, and vendor-supported postprocessing options. The authors used an open-source document-based database management system (Apache CouchDB) to create and test two such databases; CouchDB was selected for its overall ease of use, capability for managing attachments, and reliance on HTTP and Representational State Transfer standards for accessing and retrieving data. A large database was created first in which the DICOM metadata from 5880 anonymized magnetic resonance imaging studies (1,949,753 images) were loaded by using a Ruby script. To provide the usual DICOM query functionality, several predefined "views" (standard queries) were created by using JavaScript. For performance comparison, the same queries were executed in both the CouchDB database and a SQL-based DICOM archive. The capabilities of CouchDB for attachment management and database replication were separately assessed in tests of a similar, smaller database. 
Results showed that CouchDB allowed efficient storage and interrogation of all DICOM objects; with the use of information retrieval algorithms such as map-reduce, all the DICOM metadata stored in the large database were searchable with only a minimal increase in retrieval time over that with the traditional database management system. Results also indicated possible uses for document-based databases in data mining applications such as dose monitoring, quality assurance, and protocol optimization. RSNA, 2012
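The "view" mechanism described above (a map step that emits key-value pairs from schema-free documents, followed by a reduce step per key) can be mimicked in a few lines of Python. This is only an in-memory analogy: the abstract states that the real views were written in JavaScript inside CouchDB, and the documents, tag selection, and function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents: each holds DICOM-like tag-value pairs, and the
# set of tags differs from document to document (no predefined schema).
docs = [
    {"_id": "img1", "Modality": "MR", "StudyInstanceUID": "1.2.3", "EchoTime": 30},
    {"_id": "img2", "Modality": "MR", "StudyInstanceUID": "1.2.3"},
    {"_id": "img3", "Modality": "MR", "StudyInstanceUID": "1.2.4", "FlipAngle": 90},
]


def map_by_study(doc):
    """Map step: emit one (key, value) pair per document that has the tag."""
    if "StudyInstanceUID" in doc:
        yield doc["StudyInstanceUID"], 1


def run_view(documents, map_fn, reduce_fn):
    """Group the emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(groups.items())}


# Number of images per study, computed without any table definition.
images_per_study = run_view(docs, map_by_study, sum)
```

Because the map function simply skips documents that lack a tag, heterogeneous metadata can be indexed without first agreeing on a fixed set of columns.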
A natural language interface plug-in for cooperative query answering in biological databases.
Jamil, Hasan M
2012-06-11
One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. 
The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
In-context query reformulation for failing SPARQL queries
NASA Astrophysics Data System (ADS)
Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James
2017-05-01
Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.
HodDB: Design and Analysis of a Query Processor for Brick.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fierro, Gabriel; Culler, David
Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response and model-predictive control, require fast queries, conventionally less than 100 ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet this performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, an RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100 ms threshold, enabling the portability of important latency-sensitive building applications.
Zhang, Yinsheng; Zhang, Guoming; Shang, Qian
2017-01-01
Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping). PMID:29065644
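The translation chain in this abstract (recruitment rule, then a query-oriented intermediate form, then native SQL) can be sketched in Python with the standard `sqlite3` module. This is a simplified stand-in, not the paper's implementation: the paper names LINQ and an ORM, whereas here the intermediate form is a plain list of conditions, and the table, column names, thresholds, and patient rows are invented for illustration and carry no clinical meaning.

```python
import sqlite3

# A hypothetical recruitment rule: every condition must hold (logical AND).
# Field names refer directly to columns of the (made-up) patient table.
rule = [
    ("gestational_age_weeks", "<", 32),
    ("birth_weight_g", "<", 2000),
]

ALLOWED_OPS = {"<", "<=", "=", ">=", ">"}
ALLOWED_COLUMNS = {"gestational_age_weeks", "birth_weight_g"}


def rule_to_sql(conditions, table="patient"):
    """Translate rule conditions into a parameterized SELECT statement."""
    clauses, params = [], []
    for column, op, value in conditions:
        # Only whitelisted identifiers are interpolated into the SQL text.
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported condition: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    return f"SELECT id FROM {table} WHERE " + " AND ".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (id INTEGER PRIMARY KEY, "
    "gestational_age_weeks INTEGER, birth_weight_g INTEGER)"
)
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, 30, 1500), (2, 36, 2800), (3, 31, 2100)],
)
sql, params = rule_to_sql(rule)
eligible = [row[0] for row in conn.execute(sql, params)]
```

Because the rule variables are taken from the database schema itself, no separate ontology-to-schema mapping step is needed in this sketch, which mirrors the design choice the abstract describes.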
Parsing GML data based on integrative GML syntactic and semantic schemas database
NASA Astrophysics Data System (ADS)
Miao, Lizhi; Zhang, Shuliang; Lu, Guonian; Gao, Xiaoli; Jiao, Donglai; Gan, Jiayan
2007-06-01
This paper proposes a new method for parsing the various application schemas of Geography Markup Language (GML) in order to understand the syntax and semantics of their elements and types, so that diverse users interpret the same GML instance data uniformly. The proposed method generates an Integrative GML Syntactic and Semantic Schemas Database (IGSSSDB) from the GML 3.1 core schemas and the corresponding application schema. GML data are then parsed based on the IGSSSDB, which is composed of syntactic and semantic information, nesting information and mapping rules of the GML core schemas and application schemas. Three kinds of relational tables are designed for storing information from schemas when constructing the IGSSSDB: info tables for the schemas included and namespaces imported in application schemas, tables for information related to schemas, and catalog tables of core schemas. In the relational tables, we propose using homologous regular expressions to describe the models of elements and complex types in schemas, which keeps the models complete and readable. Based on the IGSSSDB, we design and develop many APIs that implement GML data parsing and can process the syntactic and semantic information of GML data from diverse fields and users. In the latter part of this paper, a test study shows that the proposed method is feasible and appropriate for parsing GML data. It also lays a good foundation for future GML data studies such as storage, indexing and querying.
NASA Astrophysics Data System (ADS)
Curland, Matthew; Halpin, Terry; Stirewalt, Kurt
A conceptual schema of an information system specifies the fact structures of interest as well as related business rules that are either constraints or derivation rules. Constraints restrict the possible or permitted states or state transitions, while derivation rules enable some facts to be derived from others. Graphical languages are commonly used to specify conceptual schemas, but often need to be supplemented by more expressive textual languages to capture additional business rules, as well as conceptual queries that enable conceptual models to be queried directly. This paper describes research to provide a role calculus to underpin textual languages for Object-Role Modeling (ORM), to enable business rules and queries to be formulated in a language intelligible to business users. The role-based nature of this calculus, which exploits the attribute-free nature of ORM, appears to offer significant advantages over other proposed approaches, especially in the area of semantic stability.
A Split-Path Schema-Based RFID Data Storage Model in Supply Chain Management
Fan, Hua; Wu, Quanyuan; Lin, Yisong; Zhang, Jianfeng
2013-01-01
In modern supply chain management systems, Radio Frequency IDentification (RFID) technology has become an indispensable sensor technology and massive RFID data sets are expected to become commonplace. More and more space and time are needed to store and process such huge amounts of RFID data, and there is an increasing realization that the existing approaches cannot satisfy the requirements of RFID data management. In this paper, we present a split-path schema-based RFID data storage model. With a data separation mechanism, the massive RFID data produced in supply chain management systems can be stored and processed more efficiently. Then a tree structure-based path splitting approach is proposed to intelligently and automatically split the movement paths of products. Furthermore, based on the proposed new storage model, we design the relational schema to store the path information and time information of tags, and some typical query templates and SQL statements are defined. Finally, we conduct various experiments to measure the effect and performance of our model and demonstrate that it performs significantly better than the baseline approach in both the data expression and path-oriented RFID data query performance. PMID:23645112
Yan, Xu; Zhou, Minxiong; Ying, Lingfang; Yin, Dazhi; Fan, Mingxia; Yang, Guang; Zhou, Yongdi; Song, Fan; Xu, Dongrong
2013-01-01
Diffusion kurtosis imaging (DKI) is a new method of magnetic resonance imaging (MRI) that provides non-Gaussian information that is not available in conventional diffusion tensor imaging (DTI). DKI requires data acquisition at multiple b-values for parameter estimation; this process is usually time-consuming. Therefore, fewer b-values are preferable to expedite acquisition. In this study, we carefully evaluated various acquisition schemas using different numbers and combinations of b-values. Acquisition schemas that sampled b-values that were distributed to two ends were optimized. Compared to conventional schemas using equally spaced b-values (ESB), optimized schemas require fewer b-values to minimize fitting errors in parameter estimation and may thus significantly reduce scanning time. Following a ranked list of optimized schemas resulting from the evaluation, we recommend the 3b schema based on its estimation accuracy and time efficiency, which needs data from only 3 b-values at 0, around 1000 and around 2500 s/mm², respectively. Analyses using voxel-based analysis (VBA) and region-of-interest (ROI) analysis with human DKI datasets support the use of the optimized 3b (0, 1000, 2500 s/mm²) DKI schema in practical clinical applications. PMID:23735303
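To see why three b-values can be enough, consider the commonly used single-direction DKI signal model, ln S(b) = ln S0 - b·D + (1/6)·b²·D²·K. Whether the paper fits exactly this form is an assumption here. With one measurement at b = 0 and two at nonzero b-values, the model is linear in D and A = D²K/6, so both can be solved in closed form. The sketch below uses noise-free synthetic signals with illustrative parameter values; with real, noisy data the choice of b-values matters, which is the question the study addresses.

```python
import math


def dki_signal(b, s0, d, k):
    """Assumed DKI signal model: S(b) = S0 * exp(-b*D + (b*D)^2 * K / 6)."""
    return s0 * math.exp(-b * d + (b * d) ** 2 * k / 6.0)


def fit_dki_3b(b1, b2, s0, s1, s2):
    """Solve for D and K exactly from signals at b = 0, b1 and b2.

    With y_i = ln(S0 / S_i) the model reads y_i = b_i * D - b_i^2 * A,
    where A = D^2 * K / 6, i.e. two linear equations in D and A.
    """
    y1 = math.log(s0 / s1)
    y2 = math.log(s0 / s2)
    det = b1 * b2 * (b1 - b2)            # determinant of the 2x2 system
    d = (b1 ** 2 * y2 - b2 ** 2 * y1) / det
    a = (b1 * y2 - b2 * y1) / det
    k = 6.0 * a / d ** 2
    return d, k


# Illustrative ground truth (b in s/mm^2, D in mm^2/s); not study data.
true_d, true_k, s0 = 1.0e-3, 1.0, 1000.0
b1, b2 = 1000.0, 2500.0
s1 = dki_signal(b1, s0, true_d, true_k)
s2 = dki_signal(b2, s0, true_d, true_k)
est_d, est_k = fit_dki_3b(b1, b2, s0, s1, s2)
```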
The Star Schema Benchmark and Augmented Fact Table Indexing
NASA Astrophysics Data System (ADS)
O'Neil, Patrick; O'Neil, Elizabeth; Chen, Xuedong; Revilak, Stephen
We provide a benchmark measuring star schema queries retrieving data from a fact table with Where clause column restrictions on dimension tables. Clustering is crucial to performance with modern disk technology, since retrievals with filter factors down to 0.0005 are now performed most efficiently by sequential table search rather than by indexed access. DB2’s Multi-Dimensional Clustering (MDC) provides methods to "dice" the fact table along a number of orthogonal "dimensions", but only when these dimensions are columns in the fact table. The diced cells cluster fact rows on several of these "dimensions" at once so queries restricting several such columns can access crucially localized data, with much faster query response. Unfortunately, columns of dimension tables of a star schema are not usually represented in the fact table. In this paper, we show a simple way to adjoin physical copies of dimension columns to the fact table, dicing data to effectively cluster query retrieval, and explain how such dicing can be achieved on database products other than DB2. We provide benchmark measurements to show successful use of this methodology on three commercial database products.
Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L
2000-01-01
The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
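The difference between attribute-centered queries in the two designs can be made concrete with a small `sqlite3` sketch. It uses a plain entity-attribute-value table rather than the full EAV/CR model (classes and relationships are omitted), and the table names, columns, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conventional design: one column per attribute.
    CREATE TABLE culture (id INTEGER PRIMARY KEY, organism TEXT, specimen TEXT);
    -- Simplified EAV design: one row per (entity, attribute, value) fact.
    CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT);
""")
rows = [(1, "E. coli", "urine"), (2, "S. aureus", "blood"), (3, "E. coli", "blood")]
conn.executemany("INSERT INTO culture VALUES (?, ?, ?)", rows)
for entity, organism, specimen in rows:
    conn.executemany(
        "INSERT INTO eav VALUES (?, ?, ?)",
        [(entity, "organism", organism), (entity, "specimen", specimen)],
    )

# Attribute-centered query: which cultures grew E. coli from blood?
conventional = conn.execute(
    "SELECT id FROM culture WHERE organism = ? AND specimen = ? ORDER BY id",
    ("E. coli", "blood"),
).fetchall()

# The EAV version needs one self-join per additional attribute tested.
eav = conn.execute(
    """
    SELECT a.entity
    FROM eav AS a JOIN eav AS b ON a.entity = b.entity
    WHERE a.attribute = 'organism' AND a.value = ?
      AND b.attribute = 'specimen' AND b.value = ?
    ORDER BY a.entity
    """,
    ("E. coli", "blood"),
).fetchall()
```

Both queries return the same entity, but the EAV form grows by one self-join for every extra attribute in the condition, which is consistent with the slowdown the abstract reports for attribute-centered queries. An entity-centered query (all facts about one entity) is a single-key lookup in either design.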
ESTminer: a Web interface for mining EST contig and cluster databases.
Huang, Yecheng; Pumphrey, Janie; Gingle, Alan R
2005-03-01
ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp. Contact: agingle@uga.edu.
An XML-Based Knowledge Management System of Port Information for U.S. Coast Guard Cutters
2003-03-01
using DTDs was not chosen. XML Schema performs many of the same functions as SQL type schemas, but differs by the unique structure of XML documents...to access data from content files within the developed system. XPath is not equivalent to SQL. While XPath is very powerful at reaching into an XML...document and finding nodes or node sets, it is not a complete query language. For operations like joins, unions, intersections, etc., SQL is far
An advanced web query interface for biological databases
Latendresse, Mario; Karp, Peter D.
2010-01-01
Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
Choi, J.; Seong, J.C.; Kim, B.; Usery, E.L.
2008-01-01
A feature relies on three dimensions (space, theme, and time) for its representation. Even though spatiotemporal models have been proposed, they have principally focused on the spatial changes of a feature. In this paper, a feature-based temporal model is proposed to represent the changes of both space and theme independently. The proposed model modifies the ISO's temporal schema and adds a new explicit temporal relationship structure that stores the temporal topological relationships among the ISO's temporal primitives of a feature in order to keep track of feature history. The explicit temporal relationship can enhance query performance on feature history by removing topological comparison during query processing. Further, a prototype system has been developed to test the proposed feature-based temporal model by querying land parcel history in Athens, Georgia. The result of temporal queries on individual feature history shows the efficiency of the explicit temporal relationship structure. © Springer Science+Business Media, LLC 2007.
EasyKSORD: A Platform of Keyword Search Over Relational Databases
NASA Astrophysics Data System (ADS)
Peng, Zhaohui; Li, Jing; Wang, Shan
Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.
The Schema.org Datasets Schema: Experiences at the National Snow and Ice Data Center
NASA Astrophysics Data System (ADS)
Duerr, R.; Billingsley, B. W.; Harper, D.; Kovarik, J.
2014-12-01
Data discovery is still a major challenge for many users. Relevant data may be located anywhere, and there are currently no universal data registries. Often users start with a simple query through their web browser. But how do you get your data to actually show up near the top of the results? One relatively new way to accomplish this is to use schema.org dataset markup in your data pages. In theory, this gives web crawlers the additional information they need so that a query for data will preferentially return those pages that were marked up accordingly. The National Snow and Ice Data Center recently implemented an initial set of markup in the data set pages returned by its catalog. The Datasets data model, our process, the challenges encountered, and the results will be described.
Corwin, John; Silberschatz, Avi; Miller, Perry L; Marenco, Luis
2007-01-01
Data sparsity and schema evolution issues affecting clinical informatics and bioinformatics communities have led to the adoption of vertical or object-attribute-value-based database schemas to overcome limitations posed when using conventional relational database technology. This paper explores these issues and discusses why biomedical data are difficult to model using conventional relational techniques. The authors propose a solution to these obstacles based on a relational database engine using a sparse, column-store architecture. The authors provide benchmarks comparing the performance of queries and schema-modification operations using three different strategies: (1) the standard conventional relational design; (2) past approaches used by biomedical informatics researchers; and (3) their sparse, column-store architecture. The performance results show that their architecture is a promising technique for storing and processing many types of data that are not handled well by the other two semantic data models.
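The query cost of the vertical (object-attribute-value) layout discussed above can be illustrated with a minimal sketch. It assumes a toy SQLite table; the table name `eav`, the attribute names, and the helper `pivot` are hypothetical and are not taken from the paper, which benchmarks a sparse column-store engine rather than this approach.

```python
import sqlite3

# Hypothetical toy data: table and attribute names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        (1, "heart_rate", "72"),
        (1, "glucose", "5.4"),
        (2, "heart_rate", "88"),
        # Entity 2 has no glucose row: missing values take no storage.
    ],
)


def pivot(conn, attributes):
    """Rebuild one conventional row per entity from attribute-value rows."""
    # One conditional aggregate per requested attribute; the attribute names
    # are passed as bound parameters rather than spliced into the SQL text.
    cols = ", ".join(
        "MAX(CASE WHEN attribute = ? THEN value END)" for _ in attributes
    )
    sql = f"SELECT entity, {cols} FROM eav GROUP BY entity ORDER BY entity"
    return conn.execute(sql, attributes).fetchall()


rows = pivot(conn, ["heart_rate", "glucose"])
print(rows)
```

Adding a new attribute needs no schema change in this layout, but every conventional-looking row has to be reassembled with a pivot like the one above, which is the trade-off the abstract describes.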
NASA Astrophysics Data System (ADS)
Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee
2010-04-01
The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We will describe the design of the query system, provide details of the language components, and give an overview of how this component fits into the overall data discovery system architecture.
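The idea of discovering join conditions from a graph representation of the schema can be sketched as follows. This is only an illustration under stated assumptions: the tables, keys, and the functions `join_path` and `build_sql` are hypothetical, the search is a plain breadth-first search, and none of it reproduces the actual DBS schema or the DBS Query Language implementation.

```python
from collections import deque

# Hypothetical schema graph: each edge between two tables carries the
# join condition that links them.
SCHEMA = {
    ("dataset", "block"): "dataset.id = block.dataset_id",
    ("block", "file"): "block.id = file.block_id",
    ("file", "run"): "file.run_id = run.id",
}


def neighbours(table):
    """Yield (adjacent table, join condition) pairs for a table."""
    for (left, right), condition in SCHEMA.items():
        if left == table:
            yield right, condition
        elif right == table:
            yield left, condition


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, condition in neighbours(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, condition)]))
    raise ValueError(f"no join path from {start} to {goal}")


def build_sql(select, start, goal, where):
    """Assemble a SQL string; the user never names the intermediate tables."""
    joins = "".join(f" JOIN {t} ON {c}" for t, c in join_path(start, goal))
    return f"SELECT {select} FROM {start}{joins} WHERE {where}"


sql = build_sql("file.name", "dataset", "file", "dataset.name = :name")
print(sql)
```

The user states only what to select and what to filter on; the intermediate `block` table and both join conditions are found by the graph search.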
Computer systems and methods for the query and visualization of multidimensional databases
Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick
2006-08-08
A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.
Computer systems and methods for the query and visualization of multidimensional database
Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick
2010-05-11
A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.
A JEE RESTful service to access Conditions Data in ATLAS
NASA Astrophysics Data System (ADS)
Formica, Andrea; Gallas, E. J.
2015-12-01
Usage of condition data in ATLAS is extensive for offline reconstruction and analysis (e.g. alignment, calibration, data quality). The system is based on the LCG Conditions Database infrastructure, with read and write access via an ad hoc C++ API (COOL), a system which was developed before Run 1 data taking began. The infrastructure dictates that the data is organized into separate schemas (assigned to subsystems/groups storing distinct and independent sets of conditions), making it difficult to access information from several schemas at the same time. We have thus created PL/SQL functions containing queries to provide content extraction at multi-schema level. The PL/SQL API has been exposed to external clients by means of a Java application providing DB access via REST services, deployed inside an application server (JBoss WildFly). The services allow navigation over multiple schemas via simple URLs. The data can be retrieved either in XML or JSON formats, via simple clients (like curl or Web browsers).
Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling
2005-01-01
Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.
SchemaOnRead: A Package for Schema-on-Read in R
DOE Office of Scientific and Technical Information (OSTI.GOV)
North, Michael J.
Schema-on-read is an agile approach to data storage and retrieval that defers investments in data organization until production queries need to be run by working with data directly in native form. Schema-on-read functions have been implemented in a wide range of analytical systems, most notably Hadoop. SchemaOnRead is a CRAN package that uses R’s flexible data representations to provide transparent and convenient support for the schema-on-read paradigm in R. The schema-on-read tools within the package include a single function call that recursively reads folders with text, comma separated value, raster image, R data, HDF5, NetCDF, spreadsheet, Weka, Epi Info, Pajek network, R network, HTML, SPSS, Systat, and Stata files. The provided tools can be used as-is or easily adapted to implement customized schema-on-read tool chains in R. This paper’s contribution is that it introduces and describes SchemaOnRead, the first R package specifically focused on providing explicit schema-on-read support in R.
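The "single function call that recursively reads folders" can be sketched in a few lines. The sketch below is written in Python rather than R, handles only three file types, and uses a hypothetical function name `schema_on_read`; it illustrates the paradigm and is not the SchemaOnRead package's API.

```python
import csv
import json
from pathlib import Path


def read_text(path):
    return path.read_text(encoding="utf-8")


def read_csv(path):
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def read_json(path):
    return json.loads(path.read_text(encoding="utf-8"))


# Dispatch table from file extension to reader; extend it to add formats.
READERS = {".txt": read_text, ".csv": read_csv, ".json": read_json}


def schema_on_read(root):
    """Recursively load a folder into nested dicts, one entry per file.

    No schema is declared up front: each file is parsed in its native
    form only at the moment it is read. Unknown extensions are skipped.
    """
    result = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():
            result[entry.name] = schema_on_read(entry)
        elif entry.suffix.lower() in READERS:
            result[entry.name] = READERS[entry.suffix.lower()](entry)
    return result
```

Calling `schema_on_read("some_folder")` on a directory of mixed files returns a nested dictionary keyed by file and folder names, so structure is imposed only when the data is actually read.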
'Big Data' Collaboration: Exploring, Recording and Sharing Enterprise Knowledge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R; Ferrell, Regina Kay
2013-01-01
As data sources and data size proliferate, knowledge discovery from "Big Data" is starting to pose several challenges. In this paper, we address a specific challenge in the practice of enterprise knowledge management while extracting actionable nuggets from diverse data sources of seemingly-related information. In particular, we address the challenge of archiving knowledge gained through collaboration, dissemination and visualization as part of the data analysis, inference and decision-making lifecycle. We motivate the implementation of an enterprise data-discovery and knowledge recorder tool, called SEEKER, based on a real-world case study. We demonstrate SEEKER capturing schema and data-element relationships, tracking the data elements of value based on the queries and the analytical artifacts that are being created by analysts as they use the data. We show how the tool serves as a digital record of institutional domain knowledge and as documentation for the evolution of data elements, queries and schemas over time. As a knowledge management service, a tool like SEEKER saves enterprise resources and time by avoiding analytic silos, expediting the process of multi-source data integration and intelligently documenting discoveries from fellow analysts.
NASA Astrophysics Data System (ADS)
Auer, M.; Agugiaro, G.; Billen, N.; Loos, L.; Zipf, A.
2014-05-01
Many important Cultural Heritage sites have been studied over long periods of time by different researchers using different technical equipment and methods and with different intentions. This has led to huge amounts of heterogeneous "traditional" datasets and formats. The rising popularity of 3D models in the field of Cultural Heritage in recent years has brought additional data formats and makes it even more necessary to find solutions to manage, publish and study these data in an integrated way. The MayaArch3D project aims to realize such an integrative approach by establishing a web-based research platform bringing spatial and non-spatial databases together and providing visualization and analysis tools. The 3D components of the platform in particular use hierarchical segmentation concepts to structure the data and to perform queries on semantic entities. This paper presents a database schema to organize not only segmented models but also different Levels-of-Detail and other representations of the same entity. It is further implemented in a spatial database which allows the storing of georeferenced 3D data. This enables organization and queries by semantic, geometric and spatial properties. As a service for the delivery of the segmented models, a standardization candidate of the Open Geospatial Consortium (OGC), the Web3DService (W3DS), has been extended to cope with the new database schema and deliver a web-friendly format for WebGL rendering. Finally, a generic user interface is presented which uses the segments as a navigation metaphor to browse and query the semantic segmentation levels and retrieve information from an external database of the German Archaeological Institute (DAI).
Integration of Schemas on the Pre-Design Level Using the KCPM-Approach
NASA Astrophysics Data System (ADS)
Vöhringer, Jürgen; Mayr, Heinrich C.
Integration is a central research and operational issue in information system design and development. It can be conducted on the system, schema, view, or data level. On the system level, integration deals with the progressive linking and testing of system components to merge their functional and technical characteristics and behavior into a comprehensive, interoperable system. Schema integration comprises the comparison and merging of two or more schemas, usually conceptual database schemas. The integration of data deals with merging the contents of multiple sources of related data. View integration is similar to schema integration, but focuses on views and the queries on them instead of schemas. All these types of integration have in common that two or more sources are first compared and then merged, in order to identify matches and mismatches as well as conflicts and inconsistencies. The sources may stem from heterogeneous companies, organizational units or projects. Integration enables the reuse and combined use of source components.
A Dimensional Bus model for integrating clinical and research data.
Wade, Ted D; Hum, Richard C; Murphy, James R
2011-12-01
Many clinical research data integration platforms rely on the Entity-Attribute-Value model because of its flexibility, even though it presents problems in query formulation and execution time. The authors sought more balance in these traits. Borrowing concepts from Entity-Attribute-Value and from enterprise data warehousing, the authors designed an alternative called the Dimensional Bus model and used it to integrate electronic medical record, sponsored study, and biorepository data. Each type of observational collection has its own table, and the structure of these tables varies to suit the source data. The observational tables are linked to the Bus, which holds provenance information and links to various classificatory dimensions that amplify the meaning of the data or facilitate its query and exposure management. The authors implemented a Bus-based clinical research data repository with a query system that flexibly manages data access and confidentiality, facilitates catalog search, and readily formulates and compiles complex queries. The design provides a workable way to manage and query mixed schemas in a data warehouse.
NASA Astrophysics Data System (ADS)
Willmes, C.
2017-12-01
Within the Collaborative Research Centre 806 (CRC 806), an interdisciplinary research project that needs to manage data, information and knowledge from heterogeneous domains such as archeology, cultural sciences, and the geosciences, a collaborative internal knowledge base system was developed. The system is based on the open source MediaWiki software, well known as the software that powers Wikipedia, which provides a web-based platform for collaborative knowledge and information management. This software is enhanced with the Semantic MediaWiki (SMW) extension, which allows structured data to be stored and managed within the wiki platform and provides complex query and API interfaces to the structured data stored in the SMW database. Using an additional open source tool called mobo, it is possible to improve the data model development process and to automate data imports, from small spreadsheets to large relational databases. Mobo is a command line tool that helps build and deploy SMW structure in an agile, Schema-Driven Development way, and allows the data model formalizations, which are written in JSON-Schema format, to be managed and developed collaboratively using version control systems like git. The combination of a well-equipped collaborative web platform provided by MediaWiki, the ability to store and query structured data in this collaborative database provided by SMW, and the automated data import and data model development enabled by mobo results in a powerful but flexible system for building and developing a collaborative knowledge base. Furthermore, SMW allows the application of Semantic Web technology: the structured data can be exported to RDF, so it is possible to set up a triple store, including a SPARQL endpoint, on top of the database.
The JSON-Schema based data models can be enhanced into JSON-LD to facilitate and profit from the possibilities of Linked Data technology.
Sun, Xiaobo; Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng; Qin, Zhaohui S
2018-06-01
Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)-based high-performance computing (HPC) implementation, and the popular VCFTools. Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems.
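For orientation, the single-machine multiway merge that the abstract uses as a baseline can be sketched with Python's standard `heapq.merge`. The record layout `(chromosome, position, genotype)`, the chromosome ordering, and the sample data are assumptions made for illustration; real VCF parsing and the Hadoop, HBase, and Spark schemas of the study are not shown.

```python
import heapq

# Assumed chromosome order: 1..22, then X and Y.
CHROM_ORDER = {
    name: rank
    for rank, name in enumerate([str(n) for n in range(1, 23)] + ["X", "Y"])
}


def sort_key(record):
    """Order records by genomic location: chromosome rank, then position."""
    chrom, pos, _genotype = record
    return (CHROM_ORDER[chrom], pos)


def sorted_merge(*streams):
    """Lazily merge any number of location-sorted record streams.

    Each input must already be sorted by sort_key; heapq.merge keeps only
    one pending record per stream in memory.
    """
    return heapq.merge(*streams, key=sort_key)


# Hypothetical per-subject records, each list already sorted by location.
subject_a = [("1", 100, "A:0/1"), ("2", 50, "A:1/1")]
subject_b = [("1", 90, "B:0/0"), ("1", 100, "B:0/1"), ("X", 5, "B:1/1")]

merged = list(sorted_merge(subject_a, subject_b))
for record in merged:
    print(record)
```

Because all streams funnel through one heap on one machine, this approach is bounded by a single node's I/O and CPU, which is the bottleneck the distributed schemas in the study are designed to avoid.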
Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng
2018-01-01
Background: Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. Findings: In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)-based high-performance computing (HPC) implementation, and the popular VCFTools. Conclusions: Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems. PMID:29762754
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK, a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks; aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated.
A query bank is developed; the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith; ...
2017-11-06
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK, a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks; aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated.
A query bank is developed; the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.
Implementation of a Distributed Object-Oriented Database Management System
1989-03-01
and heuristic algorithms. A method for determining ueit allocation by splitting relations in the conceptual schema based on queries and updates is...level frameworks can provide to the user the appearance of many tools being closely integrated. In particular, the KBSA tools use many high level...development process should begin first with conceptual design of the system. Approximately one month should be used to decide how the new projects
A Dimensional Bus model for integrating clinical and research data
Hum, Richard C; Murphy, James R
2011-01-01
Objectives: Many clinical research data integration platforms rely on the Entity–Attribute–Value model because of its flexibility, even though it presents problems in query formulation and execution time. The authors sought more balance in these traits. Materials and Methods: Borrowing concepts from Entity–Attribute–Value and from enterprise data warehousing, the authors designed an alternative called the Dimensional Bus model and used it to integrate electronic medical record, sponsored study, and biorepository data. Each type of observational collection has its own table, and the structure of these tables varies to suit the source data. The observational tables are linked to the Bus, which holds provenance information and links to various classificatory dimensions that amplify the meaning of the data or facilitate its query and exposure management. Results: The authors implemented a Bus-based clinical research data repository with a query system that flexibly manages data access and confidentiality, facilitates catalog search, and readily formulates and compiles complex queries. Conclusion: The design provides a workable way to manage and query mixed schemas in a data warehouse. PMID:21856687
NASA Astrophysics Data System (ADS)
Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros
SPARQL is today the standard access language for Semantic Web data. In recent years, XML databases have also acquired industrial importance due to the widespread applicability of XML on the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries, which are used to access the XML databases. We present the algorithms and the implementation of the SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.
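The role of the mappings can be made concrete with a deliberately small sketch. It handles a single query shape (all values of one property for instances of one class) and emits an XQuery string; the mapping tables, the prefixed names, the document name `library.xml`, and the function `translate` are all hypothetical, and the sketch does not reproduce the SPARQL2XQuery algorithms.

```python
# Hypothetical mappings from ontology constructs to XML Schema paths.
CLASS_TO_XPATH = {"ex:Book": "/library/book"}
PROPERTY_TO_XPATH = {"ex:title": "title", "ex:year": "year"}


def translate(class_name, property_name, document="library.xml"):
    """Translate SELECT ?o WHERE { ?s a <class> ; <property> ?o } to XQuery.

    ?s ranges over the XML nodes mapped to the class, and ?o over the
    child nodes mapped to the property.
    """
    nodes = CLASS_TO_XPATH[class_name]
    child = PROPERTY_TO_XPATH[property_name]
    return (
        f'for $s in doc("{document}"){nodes}\n'
        f"for $o in $s/{child}\n"
        f"return string($o)"
    )


xquery = translate("ex:Book", "ex:title")
print(xquery)
```

Each SPARQL variable becomes an XQuery `for` binding over the path its mapping points to; a full translator would also have to cover filters, optional patterns, and solution modifiers.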
An approach for heterogeneous and loosely coupled geospatial data distributed computing
NASA Astrophysics Data System (ADS)
Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui
2010-07-01
Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic in a distributed computing environment. To use these local resources together to solve larger geospatial information processing problems that concern the overall situation, we propose, with the support of peer-to-peer computing technologies, a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and a concept called the Equivalent Distributed Program of global geospatial queries, in order to solve geospatial distributed computing problems in heterogeneous GIS environments. First, we present a geospatial query process schema for distributed computing, together with a method for the equivalent transformation of a global geospatial query into distributed local queries at the SQL (Structured Query Language) level, which solves the coordination problem among heterogeneous resources. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment that consists of autonomous geospatial information resources, so as to achieve decentralized and consistent synchronization among global geospatial resource directories and to carry out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.
Computer systems and methods for the query and visualization of multidimensional databases
Stolte, Chris; Tang, Diane L; Hanrahan, Patrick
2015-03-03
A computer displays a graphical user interface on its display. The graphical user interface includes a schema information region and a data visualization region. The schema information region includes multiple operand names, each operand corresponding to one or more fields of a multi-dimensional database that includes at least one data hierarchy. The data visualization region includes a columns shelf and a rows shelf. The computer detects user actions to associate one or more first operands with the columns shelf and to associate one or more second operands with the rows shelf. The computer generates a visual table in the data visualization region in accordance with the user actions. The visual table includes one or more panes. Each pane has an x-axis defined based on data for the one or more first operands, and each pane has a y-axis defined based on data for the one or more second operands.
Computer systems and methods for the query and visualization of multidimensional databases
Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick
2015-11-10
A computer displays a graphical user interface on its display. The graphical user interface includes a schema information region and a data visualization region. The schema information region includes a plurality of fields of a multi-dimensional database that includes at least one data hierarchy. The data visualization region includes a columns shelf and a rows shelf. The computer detects user actions to associate one or more first fields with the columns shelf and to associate one or more second fields with the rows shelf. The computer generates a visual table in the data visualization region in accordance with the user actions. The visual table includes one or more panes. Each pane has an x-axis defined based on data for the one or more first fields, and each pane has a y-axis defined based on data for the one or more second fields.
Development of Korean Rare Disease Knowledge Base
Seo, Heewon; Kim, Dokyoon; Chae, Jong-Hee; Kang, Hee Gyung; Lim, Byung Chan; Cheong, Hae Il
2012-01-01
Objectives: Rare disease research requires a broad range of disease-related information for the discovery of causes of genetic disorders, which are maladies caused by abnormalities in genes or chromosomes. The rarity of cases makes it difficult for researchers to elucidate a definite origin. This knowledge base will be a major resource not only for clinicians, but also for the general public, who are unable to find consistent information on rare diseases in a single location. Methods: We designed a compact database schema for faster querying; its structure is optimized to store heterogeneous data sources. Clinicians at Seoul National University Hospital (SNUH) then reviewed and revised those resources. Additionally, we integrated other sources to capture genomic resources and clinical trials in detail in the Korean Rare Disease Knowledge base (KRDK). Results: We have developed a Web-based knowledge base, KRDK, suitable for the study of Mendelian diseases that commonly occur among Koreans. This knowledge base comprises disease summaries and reviews, a causal gene list, a laboratory and clinic directory, a patient registry, and so on. Furthermore, a database for analyzing and giving access to human biological information and a clinical trial management system are integrated into KRDK. Conclusions: We expect that KRDK, the first rare disease knowledge base in Korea, may contribute to collaborative research and be a reliable reference for application to clinical trials. Additionally, this knowledge base is ready for querying of drug information so that visitors can search for a list of rare diseases related to specific drugs. Visitors can access KRDK via http://www.snubi.org/software/raredisease/. PMID:23346478
Heterogenous database integration in a physician workstation.
Annevelink, J; Young, C Y; Tang, P C
1991-01-01
We discuss the integration of a variety of data and information sources in a Physician Workstation (PWS), focusing on the integration of data from DHCP, the Veteran Administration's Distributed Hospital Computer Program. We designed a logically centralized, object-oriented data-schema, used by end users and applications to explore the data accessible through an object-oriented database using a declarative query language. We emphasize the use of procedural abstraction to transparently integrate a variety of information sources into the data schema.
Heterogenous database integration in a physician workstation.
Annevelink, J.; Young, C. Y.; Tang, P. C.
1991-01-01
We discuss the integration of a variety of data and information sources in a Physician Workstation (PWS), focusing on the integration of data from DHCP, the Veteran Administration's Distributed Hospital Computer Program. We designed a logically centralized, object-oriented data-schema, used by end users and applications to explore the data accessible through an object-oriented database using a declarative query language. We emphasize the use of procedural abstraction to transparently integrate a variety of information sources into the data schema. PMID:1807624
NASA Astrophysics Data System (ADS)
Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG
2018-01-01
Mobile applications allow many users to access data without being limited by space and time. Over time, the volume of data in such an application increases. Data access time becomes a problem once the data reach tens of thousands to millions of records. The objective of this research is to maintain query execution performance for large numbers of records. One way to maintain data access time is to apply query optimization; the method used in this research is heuristic query optimization. The application built is a mobile financial application using a MySQL database with stored procedures. It is used by more than one business entity in one database, which leads to rapid data growth. The stored procedure contains a query optimized using the heuristic method. Query optimization is performed on a “Select” query that involves more than one table and multiple clauses. Evaluation is done by calculating the average access time using optimized and unoptimized queries. Access time is also measured as the amount of data in the database increases. The evaluation results show that execution with heuristic query optimization is relatively faster than execution without query optimization.
Mynodbcsv: lightweight zero-config database solution for handling very large CSV files.
Adaszewski, Stanisław
2014-01-01
Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. Lack of standardized ways of exploring different data layouts requires an effort each time to solve the problem from scratch. Possibility to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL) would offer expressiveness and user-friendliness. Comma-separated values (CSV) are one of the most common data storage formats. Despite its simplicity, with growing file size handling it becomes non-trivial. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if its horizontal dimension reaches thousands of columns. Most databases are optimized for handling large number of rows rather than columns, therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It's characterized by: "no copy" approach--data stay mostly in the CSV files; "zero configuration"--no need to specify database schema; written in C++, with boost [1], SQLite [2] and Qt [3], doesn't require installation and has very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files ensure efficient plan execution; effortless support for millions of columns; due to per-value typing, using mixed text/numbers data is easy; very simple network protocol provides efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It doesn't need any prerequisites to run, as all of the libraries are included in the distribution package. 
I test it against existing database solutions using a battery of benchmarks and discuss the results.
Mynodbcsv: Lightweight Zero-Config Database Solution for Handling Very Large CSV Files
Adaszewski, Stanisław
2014-01-01
Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. Lack of standardized ways of exploring different data layouts requires an effort each time to solve the problem from scratch. Possibility to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL) would offer expressiveness and user-friendliness. Comma-separated values (CSV) are one of the most common data storage formats. Despite its simplicity, with growing file size handling it becomes non-trivial. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if its horizontal dimension reaches thousands of columns. Most databases are optimized for handling large number of rows rather than columns, therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It's characterized by: “no copy” approach – data stay mostly in the CSV files; “zero configuration” – no need to specify database schema; written in C++, with boost [1], SQLite [2] and Qt [3], doesn't require installation and has very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files ensure efficient plan execution; effortless support for millions of columns; due to per-value typing, using mixed text/numbers data is easy; very simple network protocol provides efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It doesn't need any prerequisites to run, as all of the libraries are included in the distribution package. 
I test it against existing database solutions using a battery of benchmarks and discuss the results. PMID:25068261
samiDB: A Prototype Data Archive for Big Science Exploration
NASA Astrophysics Data System (ADS)
Konstantopoulos, I. S.; Green, A. W.; Cortese, L.; Foster, C.; Scott, N.
2015-04-01
samiDB is an archive, database, and query engine to serve the spectra, spectral hypercubes, and high-level science products that make up the SAMI Galaxy Survey. Based on the versatile Hierarchical Data Format (HDF5), samiDB does not depend on relational database structures and hence lightens the setup and maintenance load imposed on science teams by metadata tables. The code, written in Python, covers the ingestion, querying, and exporting of data as well as the automatic setup of an HTML schema browser. samiDB serves as a maintenance-light data archive for Big Science and can be adopted and adapted by science teams that lack the means to hire professional archivists to set up the data back end for their projects.
Multi-field query expansion is effective for biomedical dataset retrieval.
Bouadjenek, Mohamed Reda; Verspoor, Karin
2017-01-01
In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors.
Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. © The Author(s) 2017. Published by Oxford University Press.
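The Rocchio-style expansion mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rocchio_expand`, the weights `alpha` and `beta`, the `top_k` cutoff and the toy documents are all assumptions made for illustration, and the negative-feedback term of the full Rocchio formula is omitted.

```python
from collections import Counter


def rocchio_expand(query_terms, relevant_docs, alpha=1.0, beta=0.75, top_k=3):
    """Return the original query terms plus the top_k strongest new terms.

    Hypothetical helper: relevant_docs is a list of token lists standing in
    for a (pseudo-)relevance feedback set.
    """
    weights = Counter()
    # Original query vector, scaled by alpha.
    for term in query_terms:
        weights[term] += alpha
    # Centroid of the feedback documents, scaled by beta.
    for doc in relevant_docs:
        for term, tf in Counter(doc).items():
            weights[term] += beta * tf / len(relevant_docs)
    # Rank terms that are not already in the query; ties break alphabetically.
    new_terms = [t for t in weights if t not in query_terms]
    new_terms.sort(key=lambda t: (-weights[t], t))
    return list(query_terms) + new_terms[:top_k]


if __name__ == "__main__":
    docs = [["cancer", "tumor", "gene"], ["tumor", "expression"]]
    print(rocchio_expand(["cancer"], docs, top_k=2))
```

Because the feedback set has to be retrieved before expansion, a sketch like this needs an initial execution of the query, which is the resource cost the abstract contrasts with the lexicon-based strategy.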
Multi-field query expansion is effective for biomedical dataset retrieval
2017-01-01
In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors.
Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. PMID:29220457
The StarView intelligent query mechanism
NASA Technical Reports Server (NTRS)
Semmel, R. D.; Silberberg, D. P.
1993-01-01
The StarView interface is being developed to facilitate the retrieval of scientific and engineering data produced by the Hubble Space Telescope. While predefined screens in the interface can be used to specify many common requests, ad hoc requests require a dynamic query formulation capability. Unfortunately, logical level knowledge is too sparse to support this capability. In particular, essential formulation knowledge is lost when the domain of interest is mapped to a set of database relation schemas. Thus, a system known as QUICK has been developed that uses conceptual design knowledge to facilitate query formulation. By heuristically determining strongly associated objects at the conceptual level, QUICK is able to formulate semantically reasonable queries in response to high-level requests that specify only attributes of interest. Moreover, by exploiting constraint knowledge in the conceptual design, QUICK assures that queries are formulated quickly and will execute efficiently.
Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences
Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi
2006-01-01
Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384
Family Functioning and Maladaptive Schemas: The Moderating Effects of Optimism
ERIC Educational Resources Information Center
Buri, John R.; Gunty, Amy L.
2008-01-01
Authoritarian parenting is often shown to be associated with negative outcomes for children, including the development of maladaptive schemas. However, this is not the case for all children who experience Authoritarian parenting. Optimism is examined as a moderator in the relationship between Authoritarian parenting and maladaptive schemas that…
Ontology based heterogeneous materials database integration and semantic query
NASA Astrophysics Data System (ADS)
Zhao, Shuai; Qian, Quan
2017-10-01
Materials digital data, high-throughput experiments and high-throughput computations are regarded as the three key pillars of materials genome initiatives. With the fast growth of materials data, integrating and sharing data has become urgent and has gradually become a hot topic in materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply at the semantic level when adopting conventional heterogeneous database integration approaches such as federated databases or data warehouses. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be run using SPARQL. In the experiments, two well-known first-principles computational databases, OQMD and Materials Project, are used as the integration targets, and the results show the feasibility and effectiveness of our method.
Medical and Transmission Vector Vocabulary Alignment with Schema.org
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, William P.; Chappell, Alan R.; Corley, Courtney D.
Available biomedical ontologies and knowledge bases currently lack formal and standards-based interconnections between disease, disease vector, and drug treatment vocabularies. The PNNL Medical Linked Dataset (PNNL-MLD) addresses this gap. This paper describes the PNNL-MLD, which provides a unified vocabulary and dataset of drug, disease, side effect, and vector transmission background information. Currently, the PNNL-MLD combines and curates data from the following research projects: DrugBank, DailyMed, Diseasome, DisGeNet, Wikipedia Infobox, Sider, and PharmGKB. The main outcomes of this effort are a dataset aligned to Schema.org, including a parsing framework, and extensible hooks ready for integration with selected medical ontologies. The PNNL-MLD enables researchers to query distinct datasets more quickly and easily. Future extensions to the PNNL-MLD will include Traditional Chinese Medicine, broader interlinks across genetic structures, a larger thesaurus of synonyms and hypernyms, explicit coding of diseases and drugs across research systems, and incorporating vector-borne transmission vocabularies.
Astronomical Data Integration Beyond the Virtual Observatory
NASA Astrophysics Data System (ADS)
Lemson, G.; Laurino, O.
2015-09-01
"Data integration" generally refers to the process of combining data from different source data bases into a unified view. Much work has been devoted in this area by the International Virtual Observatory Alliance (IVOA), allowing users to discover and access databases through standard protocols. However, different archives present their data through their own schemas and users must still select, filter, and combine data for each archive individually. An important reason for this is that the creation of common data models that satisfy all sub-disciplines is fraught with difficulties. Furthermore it requires a substantial amount of work for data providers to present their data according to some standard representation. We will argue that existing standards allow us to build a data integration framework that works around these problems. The particular framework requires the implementation of the IVOA Table Access Protocol (TAP) only. It uses the newly developed VO data modelling language (VO-DML) specification, which allows one to define extensible object-oriented data models using a subset of UML concepts through a simple XML serialization language. A rich mapping language allows one to describe how instances of VO-DML data models are represented by the TAP service, bridging the possible mismatch between a local archive's schema and some agreed-upon representation of the astronomical domain. In this so called local-as-view approach to data integration, “mediators" use the mapping prescriptions to translate queries phrased in terms of the common schema to the underlying TAP service. This mapping language has a graphical representation, which we expose through a web based graphical “drag-and-drop-and-connect" interface. This service allows any user to map the holdings of any TAP service to the data model(s) of choice. 
The mappings are defined and stored outside of the data sources themselves, which allows the interface to be used in a kind of crowd-sourcing effort to annotate any remote database of interest. This reduces the burden of publishing one's data and allows great flexibility in the definition of the views through which particular communities might wish to access remote archives. At the same time, the framework eases the user's effort to select, filter, and combine data from many different archives, so as to build knowledge bases for their analysis. We will present the framework and demonstrate a prototype implementation. We will discuss ideas for producing the missing elements, in particular the query language and the implementation of mediator tools to translate object queries to ADQL.
Information Network Model Query Processing
NASA Astrophysics Data System (ADS)
Song, Xiaopu
The Information Networking Model (INM) [31] is a novel database model for managing real-world objects and relationships. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. The INM Query Language (INM-QL) [30] is designed to explore such information networks, retrieve information about schemas, instances, their attributes, relationships, and context-dependent information, and process query results in the user-specified form. The INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis focuses mainly on the implementation of the subsystem that effectively and efficiently processes INM-QL. The subsystem provides a lexical and syntactic analyzer for INM-QL, and it is able to choose appropriate evaluation strategies and index mechanisms to process queries in INM-QL without the user's intervention. It also uses an intermediate result structure to hold intermediate query results, and other helper structures to reduce the complexity of query processing.
Distributed query plan generation using multiobjective genetic algorithm.
Panicker, Shina; Kumar, T V Vijay
2014-01-01
A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.
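The biobjective formulation above rests on Pareto dominance between candidate plans, which is also the comparison at the core of NSGA-II's non-dominated sorting. The following is a minimal sketch of that comparison only, not of NSGA-II itself or of the authors' algorithm; the function names and the (LPC, CC) cost pairs are hypothetical.

```python
def dominates(a, b):
    """True if cost tuple a is no worse than b everywhere and better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs):
    """Return the indices of cost tuples that no other tuple dominates."""
    return [
        i
        for i, a in enumerate(costs)
        if not any(dominates(b, a) for j, b in enumerate(costs) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical (local processing cost, communication cost) per query plan.
    plans = [(10, 2), (8, 7), (12, 6), (6, 9)]
    print(pareto_front(plans))  # indices of the non-dominated plans
```

In this toy input the plan with costs (12, 6) is dominated by (10, 2), while the other three represent different trade-offs between the two objectives, so a multiobjective method would keep all three as candidates.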
Distributed Query Plan Generation Using Multiobjective Genetic Algorithm
Panicker, Shina; Vijay Kumar, T. V.
2014-01-01
A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513
Graph Mining Meets the Semantic Web
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Sangkeun; Sukumar, Sreenivas R; Lim, Seung-Hwan
The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through implementation of three popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluate the performance of our implementation on 6 real world data sets and show graph mining algorithms (that have a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.
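Of the three algorithms named above, triangle counting is the simplest to illustrate. The sketch below counts triangles over an in-memory edge list in plain Python; it is not the SPARQL-based implementation the abstract describes, and the function name and sample edges are assumptions. Node labels are assumed to be mutually comparable (for example, all strings).

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot form triangles
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    count = 0
    for u in adjacency:
        for v in adjacency[u]:
            if u < v:
                # Count each triangle once by requiring u < v < w.
                for w in adjacency[u] & adjacency[v]:
                    if v < w:
                        count += 1
    return count


if __name__ == "__main__":
    sample = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    print(count_triangles(sample))
```

The ordering constraint `u < v < w` plays the same role a filter on node identifiers would play in a query-based formulation: it keeps each triangle from being counted once per permutation of its vertices.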
Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta
2017-01-01
Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks usually follow some simple schemas, such as bi-typed networks or star network schema, and they can only cluster one type of object in the network each time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets have demonstrated that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous network which is network schema agnostic. PMID:28245222
Wu, Jibing; Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta
2017-01-01
Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks usually follow some simple schemas, such as bi-typed networks or star network schema, and they can only cluster one type of object in the network each time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets have demonstrated that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous network which is network schema agnostic.
Manchester visual query language
NASA Astrophysics Data System (ADS)
Oakley, John P.; Davis, Darryl N.; Shann, Richard T.
1993-04-01
We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database, in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.
Performance Prediction of a MongoDB-Based Traceability System in Smart Factory Supply Chains
Kang, Yong-Shin; Park, Il-Ha; Youm, Sekyoung
2016-01-01
In the future, with the advent of the smart factory era, manufacturing and logistics processes will become more complex, and the complexity and criticality of traceability will further increase. This research aims at developing a performance assessment method to verify scalability when implementing traceability systems based on key technologies for smart factories, such as Internet of Things (IoT) and Big Data. To this end, based on existing research, we analyzed traceability requirements and an event schema for storing traceability data in MongoDB, a document-based Not Only SQL (NoSQL) database. Next, we analyzed the algorithm of the most representative traceability query and defined a query-level performance model, which is composed of response times for the components of the traceability query algorithm. This performance model was then formalized as a linear regression model, because a benchmark test showed that the response times increase linearly. Finally, for a case analysis, we applied the performance model to a virtual automobile parts logistics scenario. As a result of the case study, we verified the scalability of a MongoDB-based traceability system and predicted the point when data node servers should be expanded in this case. The traceability system performance assessment method proposed in this research can be used as a decision-making tool for hardware capacity planning during the initial stage of construction of traceability systems and during their operational phase. PMID:27983654
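The idea of a linear performance model used for capacity planning can be illustrated with a minimal sketch. It is not the authors' model: the benchmark figures, the function names `fit_line` and `events_at_limit`, and the 2-second response-time limit are all hypothetical and chosen only to show the arithmetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def events_at_limit(slope, intercept, limit_seconds):
    """Stored-event count at which the predicted response time reaches the limit."""
    return (limit_seconds - intercept) / slope


# Hypothetical benchmark: millions of stored events vs. query response time (s).
events = [1.0, 2.0, 3.0, 4.0]
seconds = [0.3, 0.5, 0.7, 0.9]

slope, intercept = fit_line(events, seconds)
# Point (in millions of events) where a hypothetical 2 s limit would be reached,
# i.e. where adding data node servers might be considered.
capacity = events_at_limit(slope, intercept, limit_seconds=2.0)
```

A fitted line of this kind only extrapolates safely while the measured linear trend holds, which is why the abstract ties the model to a benchmark test.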
Performance Prediction of a MongoDB-Based Traceability System in Smart Factory Supply Chains.
Kang, Yong-Shin; Park, Il-Ha; Youm, Sekyoung
2016-12-14
In the future, with the advent of the smart factory era, manufacturing and logistics processes will become more complex, and the complexity and criticality of traceability will further increase. This research aims at developing a performance assessment method to verify scalability when implementing traceability systems based on key technologies for smart factories, such as Internet of Things (IoT) and Big Data. To this end, based on existing research, we analyzed traceability requirements and an event schema for storing traceability data in MongoDB, a document-based Not Only SQL (NoSQL) database. Next, we analyzed the algorithm of the most representative traceability query and defined a query-level performance model, which is composed of response times for the components of the traceability query algorithm. This performance model was then formalized as a linear regression model, because a benchmark test showed that the response times increase linearly. Finally, for a case analysis, we applied the performance model to a virtual automobile parts logistics scenario. As a result of the case study, we verified the scalability of a MongoDB-based traceability system and predicted the point when data node servers should be expanded in this case. The traceability system performance assessment method proposed in this research can be used as a decision-making tool for hardware capacity planning during the initial stage of construction of traceability systems and during their operational phase.
2006-06-01
SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient. These algorithms...memory as a treelike structure in order for the data to be queried. XML Query (XQuery) is the standard language used when querying XML
SCHeMA web-based observation data information system
NASA Astrophysics Data System (ADS)
Novellino, Antonio; Benedetti, Giacomo; D'Angelo, Paolo; Confalonieri, Fabio; Massa, Francesco; Povero, Paolo; Tercier-Waeber, Marie-Louise
2016-04-01
It is well recognized that the need to share ocean data among non-specialized users is constantly increasing. Initiatives that are built upon international standards will help simplify data processing and dissemination, improve user accessibility (including through web browsers), facilitate the sharing of information across the integrated network of ocean observing systems, and ultimately provide a better understanding of ocean functioning. The SCHeMA (Integrated in Situ Chemical MApping probe) Project is developing an open and modular sensing solution for autonomous in situ high resolution mapping of a wide range of anthropogenic and natural chemical compounds coupled to master bio-physicochemical parameters (www.schema-ocean.eu). The SCHeMA web system is designed to ensure user-friendly data discovery, access and download as well as interoperability with other projects through a dedicated interface that implements the Global Earth Observation System of Systems - Common Infrastructure (GCI) recommendations and the international Open Geospatial Consortium - Sensor Web Enablement (OGC-SWE) standards. This approach will ensure data accessibility in compliance with major European Directives and recommendations. Being modular, the system allows the plug-and-play of commercially available probes as well as new sensor probes under development within the project. Access to the network of monitoring probes is provided via a web-based system interface that, being implemented as an SOS (Sensor Observation Service), provides standard interoperability and access to sensor observation systems through the O&M standard, as well as sensor descriptions encoded in Sensor Model Language (SensorML). The use of common vocabularies in all metadatabases and data formats, to describe data in an already harmonized and common standard, is a prerequisite for consistency and interoperability. Therefore, the SCHeMA SOS has adopted the SeaVox common vocabularies populated by the SeaDataNet network of National Oceanographic Data Centres. The SCHeMA presentation layer, a fundamental part of the software architecture, offers the user bidirectional interaction with the integrated system, making it possible to manage and configure the sensor probes, view the stored observations and metadata, and handle alarms. The overall structure of the web portal developed within the SCHeMA initiative (Sensor Configuration, development of Core Profile interface for data access via OGC standard, external services such as web services, WMS, WFS; and Data download and query manager) will be presented and illustrated with examples of ongoing tests in coastal and open sea.
Insertion algorithms for network model database management systems
NASA Astrophysics Data System (ADS)
Mamadolimov, Abdurashid; Khikmat, Saburov
2017-12-01
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, forms a partial order. When a database is large and a query comparison is expensive, the efficiency requirement for management algorithms is to minimize the number of query comparisons. We consider the update operation for network model database management systems. We develop a new sequential algorithm for the update operation and also suggest a distributed version of the algorithm.
A proposal of fuzzy connective with learning function and its application to fuzzy retrieval system
NASA Technical Reports Server (NTRS)
Hayashi, Isao; Naito, Eiichi; Ozawa, Jun; Wakami, Noboru
1993-01-01
A new fuzzy connective and a network structure built from fuzzy connectives are proposed to overcome a drawback of conventional fuzzy retrieval systems. The network represents a retrieval query, and its fuzzy connectives have a learning function that adjusts their parameters using data from a database and outputs of a user. Fuzzy retrieval systems employing this network are also constructed. Users can retrieve results even with a query whose attributes do not exist in the database schema and, through the learning function, can obtain satisfactory results for a variety of ways of thinking.
Creating Access to Data of Worldwide Volcanic Unrest
NASA Astrophysics Data System (ADS)
Venezky, D. Y.; Newhall, C. G.; Malone, S. D.
2003-12-01
We are creating a pilot database (WOVOdat - the World Organization of Volcano Observatories database) using an open source database and content generation software, allowing web access to data of worldwide volcanic seismicity, ground deformation, fumarolic activity, and other changes within or adjacent to a volcanic system. After three years of discussions with volcano observatories of the WOVO community and institutional databases such as IRIS, UNAVCO, and the Smithsonian's Global Volcanism Program about how to link global data of volcanic unrest for use during crisis situations and for research, we are now developing the pilot database. We have already created the core tables and have written simple queries that access some of the available data using pull-down menus on a website. Over the next year, we plan to complete schema realization, expand querying capabilities, and then open the pilot database for a multi-year data-loading process. Many of the challenges we are encountering are common to multidisciplinary projects and include determining standard data formats, choosing levels of data detail (raw vs. minimally processed data, summary intervals vs. continuous data, etc.), and organizing the extant but variable data into a usable schema. Additionally, we are working on how best to enter the varied data into the database (scripts for digital data and web-entry tools for non-digital data) and what standard sets of queries are most important. An essential query during an evolving volcanic crisis would be: 'Has any volcano shown the behavior being observed here, and what happened?' We believe that with a systematic aggregation of all datasets on volcanic unrest, we should be able to find patterns that were previously inaccessible or unrecognized. The second WOVOdat workshop in 2002 provided a recent forum for discussion of data formats, database access, and schemas. The formats and units for the discussed parameters can be viewed at http://www.wovo.org/WOVOdat/parameters.htm. Comments, suggestions, and participation in all aspects of the WOVOdat project are welcome and appreciated.
Neurobiology of Schemas and Schema-Mediated Memory.
Gilboa, Asaf; Marlatte, Hannah
2017-08-01
Schemas are superordinate knowledge structures that reflect abstracted commonalities across multiple experiences, exerting powerful influences over how events are perceived, interpreted, and remembered. Activated schema templates modulate early perceptual processing, as they get populated with specific informational instances (schema instantiation). Instantiated schemas, in turn, can enhance or distort mnemonic processing from the outset (at encoding), impact offline memory transformation and accelerate neocortical integration. Recent studies demonstrate distinctive neurobiological processes underlying schema-related learning. Interactions between the ventromedial prefrontal cortex (vmPFC), hippocampus, angular gyrus (AG), and unimodal associative cortices support context-relevant schema instantiation and schema mnemonic effects. The vmPFC and hippocampus may compete (as suggested by some models) or synchronize (as suggested by others) to optimize schema-related learning depending on the specific operationalization of schema memory. This highlights the need for more precise definitions of memory schemas. Copyright © 2017 Elsevier Ltd. All rights reserved.
Enabling Incremental Query Re-Optimization.
Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau
2016-01-01
As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.
Enabling Incremental Query Re-Optimization
Liu, Mengmeng; Ives, Zachary G.; Loo, Boon Thau
2017-01-01
As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations. PMID:28659658
Database architectures for Space Telescope Science Institute
NASA Astrophysics Data System (ADS)
Lubow, Stephen
1993-08-01
At STScI nearly all large applications require database support. A general purpose architecture has been developed and is in use that relies upon an extended client-server paradigm. Processing is in general distributed across three processes, each of which generally resides on its own processor. Database queries are evaluated on one such process, called the DBMS server. The DBMS server software is provided by a database vendor. The application issues database queries and is called the application client. This client uses a set of generic DBMS application programming calls through our STDB/NET programming interface. Intermediate between the application client and the DBMS server is the STDB/NET server. This server accepts generic query requests from the application and converts them into the specific requirements of the DBMS server. In addition, it accepts query results from the DBMS server and passes them back to the application. Typically the STDB/NET server is local to the DBMS server, while the application client may be remote. The STDB/NET server provides additional capabilities such as database deadlock restart and performance monitoring. This architecture is currently in use for some major STScI applications, including the ground support system. We are currently investigating means of providing ad hoc query support to users through the above architecture. Such support is critical for providing flexible user interface capabilities. The Universal Relation advocated by Ullman, Kernighan, and others appears to be promising. In this approach, the user sees the entire database as a single table, thereby freeing the user from needing to understand the detailed schema. A software layer provides the translation between the user and detailed schema views of the database. However, many subtle issues arise in making this transformation. We are currently exploring this scheme for use in the Hubble Space Telescope user interface to the data archive system (DADS).
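The single-table view described at the end of this abstract can be approximated, in its simplest form, by a database view that hides the joins. The sketch below uses Python's standard `sqlite3` module; the tables, columns, data, and the helper `ask` are hypothetical and are not taken from the STScI systems. Universal Relation systems proper infer the joins from the attributes a user names, whereas this sketch simply fixes one join in advance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proposal (proposal_id INTEGER PRIMARY KEY, pi_name TEXT);
    CREATE TABLE observation (
        obs_id INTEGER PRIMARY KEY,
        proposal_id INTEGER REFERENCES proposal(proposal_id),
        target TEXT
    );
    INSERT INTO proposal VALUES (1, 'Smith');
    INSERT INTO proposal VALUES (2, 'Jones');
    INSERT INTO observation VALUES (10, 1, 'M31');
    INSERT INTO observation VALUES (11, 1, 'M87');
    INSERT INTO observation VALUES (12, 2, 'M31');

    -- One wide relation; the user never sees the underlying join.
    CREATE VIEW universal AS
        SELECT p.proposal_id AS proposal_id, p.pi_name AS pi_name,
               o.obs_id AS obs_id, o.target AS target
        FROM proposal AS p
        JOIN observation AS o ON p.proposal_id = o.proposal_id;
""")


def ask(columns, where, params=()):
    """Query the wide view only; callers name attributes, never base tables.

    Column names and the WHERE text are interpolated directly, so this
    helper is for trusted, illustrative input only.
    """
    sql = f"SELECT {', '.join(columns)} FROM universal WHERE {where} ORDER BY 1"
    return conn.execute(sql, params).fetchall()


# "Which investigators observed M31?" asked without knowing the schema.
rows = ask(["pi_name"], "target = ?", ("M31",))
```

The subtle issues the abstract mentions appear as soon as more than one join path connects the same attributes, which a fixed view like this one cannot resolve.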
XSemantic: An Extension of LCA Based XML Semantic Search
NASA Astrophysics Data System (ADS)
Supasitthimethee, Umaporn; Shimizu, Toshiyuki; Yoshikawa, Masatoshi; Porkaew, Kriengkrai
One of the most convenient ways to query XML data is a keyword search because it does not require any knowledge of XML structure or learning a new user interface. However, the keyword search is ambiguous. The users may use different terms to search for the same information. Furthermore, it is difficult for a system to decide which node is likely to be chosen as a return node and how much information should be included in the result. To address these challenges, we propose an XML semantic search based on keywords called XSemantic. On the one hand, we give three definitions to complete in terms of semantics. Firstly, the semantic term expansion, our system is robust from the ambiguous keywords by using the domain ontology. Secondly, to return semantic meaningful answers, we automatically infer the return information from the user queries and take advantage of the shortest path to return meaningful connections between keywords. Thirdly, we present the semantic ranking that reflects the degree of similarity as well as the semantic relationship so that the search results with the higher relevance are presented to the users first. On the other hand, in the LCA and the proximity search approaches, we investigated the problem of information included in the search results. Therefore, we introduce the notion of the Lowest Common Element Ancestor (LCEA) and define our simple rule without any requirement on the schema information such as the DTD or XML Schema. The first experiment indicated that XSemantic not only properly infers the return information but also generates compact meaningful results. Additionally, the benefits of our proposed semantics are demonstrated by the second experiment.
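For orientation, the plain lowest-common-ancestor (LCA) idea that XSemantic extends can be sketched with the standard `xml.etree.ElementTree` module. This is the baseline LCA search only, not the LCEA rule or the ranking proposed in the paper; the sample document and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def paths_to_matches(root, keyword):
    """Root-to-node paths for every element whose text contains the keyword."""
    found = []

    def walk(node, path):
        path = path + [node]
        if node.text and keyword.lower() in node.text.lower():
            found.append(path)
        for child in node:
            walk(child, path)

    walk(root, [])
    return found


def common_prefix(path_a, path_b):
    """Elements shared by two root-to-node paths, from the root downward."""
    shared = []
    for a, b in zip(path_a, path_b):
        if a is not b:
            break
        shared.append(a)
    return shared


def deepest_lca(root, kw1, kw2):
    """Deepest common ancestor over all pairs of keyword matches, or None."""
    best = []
    for pa in paths_to_matches(root, kw1):
        for pb in paths_to_matches(root, kw2):
            shared = common_prefix(pa, pb)
            if len(shared) > len(best):
                best = shared
    return best[-1] if best else None


doc = ET.fromstring(
    "<bib>"
    "<book><title>XML Search</title><author>Lee</author></book>"
    "<book><title>Databases</title><author>Kim</author></book>"
    "</bib>"
)

same_book = deepest_lca(doc, "xml", "lee")     # both keywords in one <book>
across_books = deepest_lca(doc, "xml", "kim")  # keywords in different books
```

The second query shows the problem the abstract raises: when keywords fall in different subtrees, the LCA climbs to the document root and the result is too large to be a useful answer.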
Component Prioritization Schema for Achieving Maximum Time and Cost Benefits from Software Testing
NASA Astrophysics Data System (ADS)
Srivastava, Praveen Ranjan; Pareek, Deepak
Software testing is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results. Defining the end of software testing represents crucial features of any software development project. A premature release will involve risks like undetected bugs, cost of fixing faults later, and discontented customers. Any software organization would want to achieve maximum possible benefits from software testing with minimum resources. Testing time and cost need to be optimized for achieving a competitive edge in the market. In this paper, we propose a schema, called the Component Prioritization Schema (CPS), to achieve an effective and uniform prioritization of the software components. This schema serves as an extension to the Non Homogenous Poisson Process based Cumulative Priority Model. We also introduce an approach for handling time-intensive versus cost-intensive projects.
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
NASA Astrophysics Data System (ADS)
Farhan Husain, Mohammad; Doshi, Pankil; Khan, Latifur; Thuraisingham, Bhavani
Handling huge amounts of data scalably has long been a matter of concern. The same is true for semantic web data, and current semantic web frameworks lack this ability. In this paper, we describe a framework that we built using Hadoop to store and retrieve large numbers of RDF triples. We describe our schema for storing RDF data in the Hadoop Distributed File System. We also present our algorithms for answering a SPARQL query, which use Hadoop's MapReduce framework. Our results reveal that we can store huge amounts of semantic web data in Hadoop clusters built mostly from cheap commodity-class hardware and still answer queries fast enough. We conclude that ours is a scalable framework, able to handle large amounts of RDF data efficiently.
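How a two-pattern SPARQL-style query can be answered with a map, shuffle, and reduce sequence is sketched below in plain Python, with no Hadoop involved. It illustrates a generic reduce-side join and is not the storage schema or the algorithms of the paper; the triples, predicate names, and function names are hypothetical. The query answered is, informally, "?x type Student . ?x advisor ?y".

```python
from collections import defaultdict

# Hypothetical RDF triples as (subject, predicate, object).
triples = [
    ("alice", "type", "Student"),
    ("bob", "type", "Student"),
    ("carol", "type", "Professor"),
    ("alice", "advisor", "carol"),
    ("bob", "advisor", "dave"),
    ("carol", "advisor", "erin"),
]


def map_phase(triple):
    """Emit (join key, tagged value) for triples matching either pattern."""
    s, p, o = triple
    if p == "type" and o == "Student":
        yield s, ("is_student", None)
    elif p == "advisor":
        yield s, ("advisor", o)


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Join step: keep advisors only for subjects that matched the type pattern."""
    if any(tag == "is_student" for tag, _ in values):
        for tag, obj in values:
            if tag == "advisor":
                yield key, obj


pairs = [kv for t in triples for kv in map_phase(t)]
groups = shuffle(pairs)
answers = sorted(r for k, vs in groups.items() for r in reduce_phase(k, vs))
```

Because the shared variable `?x` is the shuffle key, each reducer sees every candidate binding for one subject and can complete the join locally.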
An RDF version of the VO Registry Version 1.00
NASA Astrophysics Data System (ADS)
Gray, Norman
2007-09-01
We describe the initial implementation of an RDF version of the IVOA Resource Registry, serving the registry data via a SPARQL query endpoint, including the creation of the ontology analogues of an important subset of the relevant XML Schemas, and the mechanics of the conversion process. The result is an experimental service, and this is an interim document.
Sahoo, Satya S.; Bodenreider, Olivier; Rutter, Joni L.; Skinner, Karen J.; Sheth, Amit P.
2008-01-01
Objectives: This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. Methods: We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Results: Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Conclusion: Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. Resource page: http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/ PMID:18395495
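The notion of a hub gene, a gene whose products participate in many pathways, reduces to a grouped count over gene-to-pathway links. The sketch below shows that count in plain Python rather than SPARQL; the gene and pathway identifiers, the threshold, and the function name `hub_genes` are placeholders and not data from the study.

```python
from collections import defaultdict

# Placeholder (gene, pathway) participation links; one link is repeated on purpose.
participation = [
    ("geneA", "pathway1"),
    ("geneA", "pathway2"),
    ("geneA", "pathway3"),
    ("geneB", "pathway1"),
    ("geneC", "pathway2"),
    ("geneC", "pathway3"),
    ("geneA", "pathway1"),
]


def hub_genes(edges, min_pathways):
    """Genes that participate in at least `min_pathways` distinct pathways."""
    pathways_by_gene = defaultdict(set)
    for gene, pathway in edges:
        pathways_by_gene[gene].add(pathway)  # a set ignores repeated links
    return sorted(g for g, ps in pathways_by_gene.items() if len(ps) >= min_pathways)


hubs = hub_genes(participation, min_pathways=3)
```

In a triple store the same grouping would typically be expressed with an aggregate over the participation predicate; the threshold for "many" is a modelling choice.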
Sahoo, Satya S; Bodenreider, Olivier; Rutter, Joni L; Skinner, Karen J; Sheth, Amit P
2008-10-01
This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. RESOURCE PAGE: http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/
Query Auto-Completion Based on Word2vec Semantic Similarity
NASA Astrophysics Data System (ADS)
Shao, Taihua; Chen, Honghui; Chen, Wanyu
2018-04-01
Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
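A hybrid ranking that mixes query frequency with embedding similarity, together with the reciprocal-rank measure behind MRR, can be sketched as follows. The linear weighting, the two-dimensional vectors standing in for word2vec embeddings, the candidate queries, and the function names are all assumptions made for illustration; the abstract does not specify how FS-QAC combines its two features.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_completions(prefix, candidates, context_vec, weight):
    """Rank completions of `prefix` by weight*frequency + (1-weight)*similarity.

    candidates maps each query to (frequency, embedding); context_vec stands
    for the embedding of the user's previously submitted query.
    """
    matching = {q: fe for q, fe in candidates.items() if q.startswith(prefix)}
    if not matching:
        return []
    max_freq = max(freq for freq, _ in matching.values())

    def score(query):
        freq, vec = matching[query]
        return weight * freq / max_freq + (1 - weight) * cosine(vec, context_vec)

    return sorted(matching, key=score, reverse=True)


def reciprocal_rank(ranking, target):
    """1/rank of the intended query, or 0.0 if it was not suggested."""
    return 1.0 / (ranking.index(target) + 1) if target in ranking else 0.0


# Hypothetical query log entries: query -> (frequency, toy embedding).
candidates = {
    "python snake": (100, [1.0, 0.0]),
    "python tutorial": (60, [0.0, 1.0]),
    "java tutorial": (80, [0.0, 1.0]),
}
context = [0.0, 1.0]  # previous query was about programming

hybrid = rank_completions("py", candidates, context, weight=0.5)
frequency_only = rank_completions("py", candidates, context, weight=1.0)

# MRR is the mean of reciprocal ranks over the evaluated prefixes (two here).
mrr = (reciprocal_rank(hybrid, "python tutorial")
       + reciprocal_rank(frequency_only, "python tutorial")) / 2
```

With the similarity term switched on, the less frequent but contextually closer completion moves to the top, which is the effect the abstract attributes to semantic relevance.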
KA-SB: from data integration to large scale reasoning
Roldán-García, María del Mar; Navas-Delgado, Ismael; Kerzazi, Amine; Chniber, Othmane; Molina-Castro, Joaquín; Aldana-Montes, José F
2009-01-01
Background: The analysis of information in the biological domain is usually focused on the analysis of data from single on-line data sources. Unfortunately, studying a biological process requires having access to dispersed, heterogeneous, autonomous data sources. In this context, an analysis of the information is not possible without the integration of such data. Methods: KA-SB is a querying and analysis system for end users based on combining a data integration solution with a reasoner. Thus, the tool has been created with a process divided into two steps: 1) KOMF, the Khaos Ontology-based Mediator Framework, is used to retrieve information from heterogeneous and distributed databases; 2) the integrated information is crystallized in a (persistent and high performance) reasoner (DBOWL). This information could be further analyzed later (by means of querying and reasoning). Results: In this paper we present a novel system that combines the use of a mediation system with the reasoning capabilities of a large scale reasoner to provide a way of finding new knowledge and of analyzing the integrated information from different databases, which is retrieved as a set of ontology instances. This tool uses a graphical query interface to build user queries easily, which shows a graphical representation of the ontology and allows users to build queries by clicking on the ontology concepts. Conclusion: These kinds of systems (based on KOMF) will provide users with very large amounts of information (interpreted as ontology instances once retrieved), which cannot be managed using traditional main memory-based reasoners. We propose a process for creating persistent and scalable knowledgebases from sets of OWL instances obtained by integrating heterogeneous data sources with KOMF. This process has been applied to develop a demo tool, which uses the BioPax Level 3 ontology as the integration schema, and integrates UNIPROT, KEGG, CHEBI, BRENDA and SABIORK databases. PMID:19796402
A Scalable Data Integration and Analysis Architecture for Sensor Data of Pediatric Asthma.
Stripelis, Dimitris; Ambite, José Luis; Chiang, Yao-Yi; Eckel, Sandrah P; Habre, Rima
2017-04-01
According to the Centers for Disease Control, in the United States there are 6.8 million children living with asthma. Despite the importance of the disease, the available prognostic tools are not sufficient for biomedical researchers to thoroughly investigate the potential risks of the disease at scale. To overcome these challenges we present a big data integration and analysis infrastructure developed by our Data and Software Coordination and Integration Center (DSCIC) of the NIBIB-funded Pediatric Research using Integrated Sensor Monitoring Systems (PRISMS) program. Our goal is to help biomedical researchers to efficiently predict and prevent asthma attacks. The PRISMS-DSCIC is responsible for collecting, integrating, storing, and analyzing real-time environmental, physiological and behavioral data obtained from heterogeneous sensor and traditional data sources. Our architecture is based on the Apache Kafka, Spark and Hadoop frameworks and PostgreSQL DBMS. A main contribution of this work is extending the Spark framework with a mediation layer, based on logical schema mappings and query rewriting, to facilitate data analysis over a consistent harmonized schema. The system provides both batch and stream analytic capabilities over the massive data generated by wearable and fixed sensors.
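The role of schema mappings in presenting one harmonized schema over differently shaped sources can be shown with a small sketch. It applies per-source field mappings at read time in plain Python and does not reproduce the Spark mediation layer or the query rewriting of the paper; the source names, field names, readings, and the function `query` are hypothetical.

```python
# Harmonized field -> local field, one mapping per (hypothetical) source.
MAPPINGS = {
    "wearable": {"subject_id": "uid", "timestamp": "ts", "pm25": "pm2_5_ugm3"},
    "fixed_station": {"subject_id": "participant", "timestamp": "time", "pm25": "pm25"},
}

# Hypothetical raw records, each in its source's own schema.
SOURCES = {
    "wearable": [
        {"uid": "s1", "ts": 1, "pm2_5_ugm3": 12.0},
        {"uid": "s2", "ts": 2, "pm2_5_ugm3": 40.0},
    ],
    "fixed_station": [
        {"participant": "s1", "time": 3, "pm25": 55.0},
    ],
}


def query(columns, predicate):
    """Answer a query posed on the harmonized schema across all sources."""
    results = []
    for source, rows in SOURCES.items():
        mapping = MAPPINGS[source]
        for row in rows:
            # Translate the local record into harmonized field names.
            harmonized = {field: row[local] for field, local in mapping.items()}
            if predicate(harmonized):
                results.append(tuple(harmonized[c] for c in columns))
    return results


# Analysts write against harmonized names only, never the source-specific ones.
high = query(["subject_id", "pm25"], lambda r: r["pm25"] > 30)
```

A rewriting-based mediator would instead translate the query into each source's terms and push it down, avoiding the per-record translation done here.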
New concepts for building vocabulary for cell image ontologies.
Plant, Anne L; Elliott, John T; Bhat, Talapady N
2011-12-21
There are significant challenges associated with the building of ontologies for cell biology experiments including the large numbers of terms and their synonyms. These challenges make it difficult to simultaneously query data from multiple experiments or ontologies. If vocabulary terms were consistently used and reused across and within ontologies, queries would be possible through shared terms. One approach to achieving this is to strictly control the terms used in ontologies in the form of a pre-defined schema, but this approach limits the individual researcher's ability to create new terms when needed to describe new experiments. Here, we propose the use of a limited number of highly reusable common root terms, and rules for an experimentalist to locally expand terms by adding more specific terms under more general root terms to form specific new vocabulary hierarchies that can be used to build ontologies. We illustrate the application of the method to build vocabularies and a prototype database for cell images that uses a visual data-tree of terms to facilitate sophisticated queries based on experimental parameters. We demonstrate how the terminology might be extended by adding new vocabulary terms into the hierarchy of terms in an evolving process. In this approach, image data and metadata are handled separately, so we also describe a robust file-naming scheme to unambiguously identify image and other files associated with each metadata value. The prototype database http://sbd.nist.gov/ consists of more than 2000 images of cells and benchmark materials, and 163 metadata terms that describe experimental details, including many details about cell culture and handling. Image files of interest can be retrieved, and their data can be compared, by choosing one or more relevant metadata values as search terms. Metadata values for any dataset can be compared with corresponding values of another dataset through logical operations. Organizing metadata for cell imaging experiments under a framework of rules that include highly reused root terms will facilitate the addition of new terms into a vocabulary hierarchy and encourage the reuse of terms. These vocabulary hierarchies can be converted into XML schema or RDF graphs for displaying and querying, but this is not necessary for using them to annotate cell images. Vocabulary data trees from multiple experiments or laboratories can be aligned at the root terms to facilitate query development. This approach of developing vocabularies is compatible with the major advances in database technology and could be used for building the Semantic Web.
New concepts for building vocabulary for cell image ontologies
2011-01-01
Background There are significant challenges associated with the building of ontologies for cell biology experiments, including the large numbers of terms and their synonyms. These challenges make it difficult to simultaneously query data from multiple experiments or ontologies. If vocabulary terms were consistently used and reused across and within ontologies, queries would be possible through shared terms. One approach to achieving this is to strictly control the terms used in ontologies in the form of a pre-defined schema, but this approach limits the individual researcher's ability to create new terms when needed to describe new experiments. Results Here, we propose the use of a limited number of highly reusable common root terms, and rules for an experimentalist to locally expand terms by adding more specific terms under more general root terms to form specific new vocabulary hierarchies that can be used to build ontologies. We illustrate the application of the method to build vocabularies and a prototype database for cell images that uses a visual data-tree of terms to facilitate sophisticated queries based on experimental parameters. We demonstrate how the terminology might be extended by adding new vocabulary terms into the hierarchy of terms in an evolving process. In this approach, image data and metadata are handled separately, so we also describe a robust file-naming scheme to unambiguously identify image and other files associated with each metadata value. The prototype database http://sbd.nist.gov/ consists of more than 2000 images of cells and benchmark materials, and 163 metadata terms that describe experimental details, including many details about cell culture and handling. Image files of interest can be retrieved, and their data can be compared, by choosing one or more relevant metadata values as search terms. Metadata values for any dataset can be compared with corresponding values of another dataset through logical operations.
Conclusions Organizing metadata for cell imaging experiments under a framework of rules that include highly reused root terms will facilitate the addition of new terms into a vocabulary hierarchy and encourage the reuse of terms. These vocabulary hierarchies can be converted into XML schema or RDF graphs for displaying and querying, but this is not necessary for using them to annotate cell images. Vocabulary data trees from multiple experiments or laboratories can be aligned at the root terms to facilitate query development. This approach of developing vocabularies is compatible with the major advances in database technology and could be used for building the Semantic Web. PMID:22188658
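The idea of locally expanding terms under a fixed set of root terms, and then aligning trees from different laboratories at those roots, can be illustrated with a minimal Python sketch. The root terms, term names, and the functions `new_vocabulary`, `add_term`, and `shared_paths` are hypothetical and chosen only for illustration; they are not the vocabulary or the rules used in the prototype database.

```python
# Hypothetical root terms; the record above does not list the actual roots.
ROOTS = ("cell", "instrument")


def new_vocabulary():
    """Start every vocabulary tree from the same reusable root terms."""
    return {root: {} for root in ROOTS}


def add_term(tree, path):
    """Add a more specific term under a more general one.

    `path` runs from a root term down to the new term. Paths that do not
    start at a known root are rejected, so roots stay shared across trees.
    """
    if path[0] not in tree:
        raise ValueError(f"{path[0]!r} is not a root term")
    node = tree
    for term in path:
        node = node.setdefault(term, {})


def shared_paths(tree_a, tree_b, prefix=()):
    """List term paths present in both trees, walking down from the roots."""
    shared = []
    for term in sorted(tree_a.keys() & tree_b.keys()):
        path = prefix + (term,)
        shared.append(path)
        shared.extend(shared_paths(tree_a[term], tree_b[term], path))
    return shared


# Two laboratories extend the same roots independently (toy terms).
lab_a = new_vocabulary()
add_term(lab_a, ["cell", "line", "line_x"])
lab_b = new_vocabulary()
add_term(lab_b, ["cell", "line", "line_y"])
add_term(lab_b, ["instrument", "microscope"])
```

Here `shared_paths(lab_a, lab_b)` reports the terms a query could use across both trees; terms that only one laboratory added are left out.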
EAGLE: 'EAGLE 'Is an' Algorithmic Graph Library for Exploration'
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-01-16
The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. Today there are no tools to conduct "graph mining" on RDF standard data sets. We address that need through implementation of popular iterative graph mining algorithms (triangle count, connected component analysis, degree distribution, diversity degree, PageRank, etc.). We implement these algorithms as SPARQL queries wrapped within Python scripts, and call our software tool EAGLE. In RDF style, EAGLE stands for "EAGLE 'Is an' algorithmic graph library for exploration." EAGLE is like 'MATLAB' for 'Linked Data.'
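To make two of the listed algorithms concrete, the following is a minimal Python sketch of out-degree counting and triangle counting over a small set of triples. It is not EAGLE's code: the triples are toy data, the function names are hypothetical, and the SPARQL string is only an illustrative guess at how a degree query might be phrased. The computation itself is done in plain Python so the example runs without a triple store.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy triples (subject, predicate, object); not from EAGLE.
TRIPLES = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "knows", "a"),
    ("c", "knows", "d"),
]

# Illustrative SPARQL 1.1 query for out-degree; shown for comparison only
# and never executed in this sketch.
OUT_DEGREE_SPARQL = """
SELECT ?node (COUNT(?o) AS ?degree)
WHERE { ?node ?p ?o }
GROUP BY ?node
"""


def out_degree(triples):
    """Count outgoing edges per subject, mirroring the GROUP BY query."""
    return Counter(s for s, _, _ in triples)


def triangle_count(triples):
    """Count triangles in the undirected graph induced by the triples."""
    neighbours = {}
    for s, _, o in triples:
        if s != o:  # ignore self-loops
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)
    corners = 0
    for adj in neighbours.values():
        # A pair of neighbours that are themselves linked closes a triangle.
        for u, v in combinations(sorted(adj), 2):
            if v in neighbours[u]:
                corners += 1
    return corners // 3  # each triangle is found once from each corner
```

With the toy triples, nodes a, b and c form the only triangle, and node c has two outgoing edges.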
An effective XML based name mapping mechanism within StoRM
NASA Astrophysics Data System (ADS)
Corso, E.; Forti, A.; Ghiselli, A.; Magnoni, L.; Zappi, R.
2008-07-01
In a Grid environment the naming capability allows users to refer to specific data resources in a physical storage system using a high level logical identifier. This logical identifier is typically organized in a file system like structure, a hierarchical tree of names. Storage Resource Manager (SRM) services map the logical identifier to the physical location of data by evaluating a set of parameters such as the desired quality of service and the VOMS attributes specified in the requests. StoRM is an SRM service developed by INFN and ICTP-EGRID to manage files and space on standard POSIX and high performing parallel and cluster file systems. An upcoming requirement in the Grid data scenario is the orthogonality of the logical name and the physical location of data, in order to refer, with the same identifier, to different copies of data archived in various storage areas with different quality of service. The mapping mechanism proposed in StoRM is based on an XML document that represents the different storage components managed by the service, the storage areas defined by the site administrator, the quality of service they provide and the Virtual Organizations that want to use the storage area. An appropriate directory tree is realized in each storage component reflecting the XML schema. In this scenario StoRM is able to identify the physical location of requested data by evaluating the logical identifier and the specified attributes following the XML schema, without querying any database service. This paper presents the namespace schema defined, the different entities represented and the technical details of the StoRM implementation.
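A minimal Python sketch of such a mapping is given below, assuming a much simplified namespace document. The element name `storage-area`, its attributes, the paths, and the function `resolve` are all hypothetical and do not reproduce the actual StoRM namespace schema. The point is only that a logical name plus a Virtual Organization attribute can be resolved to a physical path from the XML alone, with no database lookup.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical namespace document; element and attribute names are invented
# for illustration and are not the real StoRM schema.
NAMESPACE_XML = """
<namespace>
  <storage-area vo="atlas" logical-root="/atlas"
                physical-root="/gpfs/atlas" quality="disk"/>
  <storage-area vo="cms" logical-root="/cms"
                physical-root="/lustre/cms" quality="tape"/>
</namespace>
"""


def resolve(logical_path: str, vo: str,
            xml_text: str = NAMESPACE_XML) -> Optional[str]:
    """Map a logical name to a physical path for the given VO, or None."""
    root = ET.fromstring(xml_text)
    for area in root.findall("storage-area"):
        logical_root = area.get("logical-root")
        under_root = (logical_path == logical_root
                      or logical_path.startswith(logical_root + "/"))
        if area.get("vo") == vo and under_root:
            # Swap the logical prefix for the physical one.
            return area.get("physical-root") + logical_path[len(logical_root):]
    return None  # no storage area matches this name for this VO
```

For example, `resolve("/atlas/run1/file.dat", "atlas")` yields a path under `/gpfs/atlas`, while the same logical name requested with the `cms` attribute matches no storage area.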
Federated Web-accessible Clinical Data Management within an Extensible NeuroImaging Database
Keator, David B.; Wei, Dingying; Fennema-Notestine, Christine; Pease, Karen R.; Bockholt, Jeremy; Grethe, Jeffrey S.
2010-01-01
Managing vast datasets collected throughout multiple clinical imaging communities has become critical with the ever increasing and diverse nature of datasets. Development of data management infrastructure is further complicated by technical and experimental advances that drive modifications to existing protocols and acquisition of new types of research data to be incorporated into existing data management systems. In this paper, an extensible data management system for clinical neuroimaging studies is introduced: The Human Clinical Imaging Database (HID) and Toolkit. The database schema is constructed to support the storage of new data types without changes to the underlying schema. The complex infrastructure allows management of experiment data, such as image protocol and behavioral task parameters, as well as subject-specific data, including demographics, clinical assessments, and behavioral task performance metrics. Of significant interest, embedded clinical data entry and management tools enhance both consistency of data reporting and automatic entry of data into the database. The Clinical Assessment Layout Manager (CALM) allows users to create on-line data entry forms for use within and across sites, through which data is pulled into the underlying database via the generic clinical assessment management engine (GAME). Importantly, the system is designed to operate in a distributed environment, serving both human users and client applications in a service-oriented manner. Querying capabilities use a built-in multi-database parallel query builder/result combiner, allowing web-accessible queries within and across multiple federated databases. The system along with its documentation is open-source and available from the Neuroimaging Informatics Tools and Resource Clearinghouse (NITRC) site. PMID:20567938
Federated web-accessible clinical data management within an extensible neuroimaging database.
Ozyurt, I Burak; Keator, David B; Wei, Dingying; Fennema-Notestine, Christine; Pease, Karen R; Bockholt, Jeremy; Grethe, Jeffrey S
2010-12-01
Managing vast datasets collected throughout multiple clinical imaging communities has become critical with the ever increasing and diverse nature of datasets. Development of data management infrastructure is further complicated by technical and experimental advances that drive modifications to existing protocols and acquisition of new types of research data to be incorporated into existing data management systems. In this paper, an extensible data management system for clinical neuroimaging studies is introduced: The Human Clinical Imaging Database (HID) and Toolkit. The database schema is constructed to support the storage of new data types without changes to the underlying schema. The complex infrastructure allows management of experiment data, such as image protocol and behavioral task parameters, as well as subject-specific data, including demographics, clinical assessments, and behavioral task performance metrics. Of significant interest, embedded clinical data entry and management tools enhance both consistency of data reporting and automatic entry of data into the database. The Clinical Assessment Layout Manager (CALM) allows users to create on-line data entry forms for use within and across sites, through which data is pulled into the underlying database via the generic clinical assessment management engine (GAME). Importantly, the system is designed to operate in a distributed environment, serving both human users and client applications in a service-oriented manner. Querying capabilities use a built-in multi-database parallel query builder/result combiner, allowing web-accessible queries within and across multiple federated databases. The system along with its documentation is open-source and available from the Neuroimaging Informatics Tools and Resource Clearinghouse (NITRC) site.
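The parallel query builder and result combiner mentioned above can be pictured with a short Python sketch: the same SQL is sent to each site's database in parallel and the rows are merged, tagged with their site of origin. The site names, the `subject` table, and the functions `query_site` and `federated_query` are hypothetical; in-memory SQLite databases stand in for the federated HID instances, whose real schema and transport are not described in this record.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site rows: (subject_id, age). Not HID's actual schema.
SITES = {
    "site_a": [("s1", 34), ("s2", 61)],
    "site_b": [("s3", 47)],
}


def query_site(site, rows, min_age):
    """Run one SQL query against a single site and tag rows with the site."""
    # The connection is opened inside the worker because sqlite3 connections
    # are, by default, usable only in the thread that created them.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE subject (subject_id TEXT, age INTEGER)")
        conn.executemany("INSERT INTO subject VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT subject_id, age FROM subject WHERE age >= ?", (min_age,)
        )
        return [(site, sid, age) for sid, age in cur.fetchall()]
    finally:
        conn.close()


def federated_query(min_age):
    """Send the query to every site in parallel and combine the results."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(query_site, site, rows, min_age)
            for site, rows in SITES.items()
        ]
        combined = [row for future in futures for row in future.result()]
    return sorted(combined)
```

Tagging each row with its site keeps the combined result traceable to the database it came from, which matters when the same subject identifier could appear at more than one site.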
NASA Astrophysics Data System (ADS)
Jones, A. S.; Horsburgh, J. S.; Matos, M.; Caraballo, J.
2015-12-01
Networks conducting long term monitoring using in situ sensors need the functionality to track physical equipment as well as deployments, calibrations, and other actions related to site and equipment maintenance. The observational data being generated by sensors are enhanced if direct linkages to equipment details and actions can be made. This type of information is typically recorded in field notebooks or in static files, which are rarely linked to observations in a way that could be used to interpret results. However, the record of field activities is often relevant to analysis or post-processing of the observational data. We have developed an underlying database schema and deployed a web interface for recording and retrieving information on physical infrastructure and related actions for observational networks. The database schema for equipment was designed as an extension to the Observations Data Model 2 (ODM2), a community-developed information model for spatially discrete, feature based earth observations. The core entities of ODM2 describe location, observed variable, and timing of observations, and the equipment extension contains entities to provide additional metadata specific to the inventory of physical infrastructure and associated actions. The schema is implemented in a relational database system for storage and management with an associated web interface. We designed the web-based tools for technicians to enter and query information on the physical equipment and actions such as site visits, equipment deployments, maintenance, and calibrations. These tools were implemented for the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) ecohydrologic observatory, and we anticipate that they will be useful for similar large-scale monitoring networks desiring to link observing infrastructure to observational data to increase the quality of sensor-based data products.
Agent-Based Framework for Discrete Entity Simulations
2006-11-01
Postgres database server for environment queries of neighbors and continuum data. As expected for raw database queries (no database optimizations in...form. Eventually the code was ported to GNU C++ on the same single Intel Pentium 4 CPU running RedHat Linux 9.0 and Postgres database server...Again Postgres was used for environmental queries, and the tool remained relatively slow because of the immense number of queries necessary to assess
Hierarchical content-based image retrieval by dynamic indexing and guided search
NASA Astrophysics Data System (ADS)
You, Jane; Cheung, King H.; Liu, James; Guo, Linong
2003-12-01
This paper presents a new approach to content-based image retrieval by using dynamic indexing and guided search in a hierarchical structure, and extending data mining and data warehousing techniques. The proposed algorithms include: a wavelet-based scheme for multiple image feature extraction, the extension of a conventional data warehouse and an image database to an image data warehouse for dynamic image indexing, an image data schema for hierarchical image representation and dynamic image indexing, a statistically based feature selection scheme to achieve flexible similarity measures, and a feature component code to facilitate query processing and guide the search for the best matching. A series of case studies are reported, which include a wavelet-based image color hierarchy, classification of satellite images, tropical cyclone pattern recognition, and personal identification using multi-level palmprint and face features.
A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805
ERIC Educational Resources Information Center
Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.
2011-01-01
Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually consider queries as individuals and only return raw results for each query. Moreover they…
An adaptable XML based approach for scientific data management and integration
NASA Astrophysics Data System (ADS)
Wang, Fusheng; Thiel, Florian; Furrer, Daniel; Vergara-Niedermayr, Cristobal; Qin, Chen; Hackenberg, Georg; Bourgue, Pierre-Emmanuel; Kaltschmidt, David; Wang, Mo
2008-03-01
Increased complexity of scientific research poses new challenges to scientific data management. Meanwhile, scientific collaboration is becoming increasingly important, which relies on integrating and sharing data from distributed institutions. We develop SciPort, a Web-based platform supporting scientific data management and integration based on a central-server-based distributed architecture, where researchers can easily collect, publish, and share their complex scientific data across multiple institutions. SciPort provides an XML based general approach to model complex scientific data by representing them as XML documents. The documents capture not only hierarchical structured data, but also images and raw data through references. In addition, SciPort provides an XML based hierarchical organization of the overall data space to make it convenient for quick browsing. To provide generalization, schemas and hierarchies are customizable with XML-based definitions, so it is possible to quickly adapt the system to different applications. While each institution can manage documents on a Local SciPort Server independently, selected documents can be published to a Central Server to form a global view of shared data across all sites. By storing documents in a native XML database, SciPort provides high schema extensibility and supports comprehensive queries through XQuery. By providing a unified and effective means for data modeling, data access and customization with XML, SciPort provides a flexible and powerful platform for sharing scientific data for scientific research communities, and has been successfully used in both biomedical research and clinical trials.
An Adaptable XML Based Approach for Scientific Data Management and Integration.
Wang, Fusheng; Thiel, Florian; Furrer, Daniel; Vergara-Niedermayr, Cristobal; Qin, Chen; Hackenberg, Georg; Bourgue, Pierre-Emmanuel; Kaltschmidt, David; Wang, Mo
2008-02-20
Increased complexity of scientific research poses new challenges to scientific data management. Meanwhile, scientific collaboration is becoming increasingly important, which relies on integrating and sharing data from distributed institutions. We develop SciPort, a Web-based platform supporting scientific data management and integration based on a central-server-based distributed architecture, where researchers can easily collect, publish, and share their complex scientific data across multiple institutions. SciPort provides an XML based general approach to model complex scientific data by representing them as XML documents. The documents capture not only hierarchical structured data, but also images and raw data through references. In addition, SciPort provides an XML based hierarchical organization of the overall data space to make it convenient for quick browsing. To provide generalization, schemas and hierarchies are customizable with XML-based definitions, so it is possible to quickly adapt the system to different applications. While each institution can manage documents on a Local SciPort Server independently, selected documents can be published to a Central Server to form a global view of shared data across all sites. By storing documents in a native XML database, SciPort provides high schema extensibility and supports comprehensive queries through XQuery. By providing a unified and effective means for data modeling, data access and customization with XML, SciPort provides a flexible and powerful platform for sharing scientific data for scientific research communities, and has been successfully used in both biomedical research and clinical trials.
Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-07-04
As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. 
In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-01-01
Background As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. 
In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323
Gojani, Parvin Jamali; Masjedi, Mohsen; Khaleghipour, Shahnaz; Behzadi, Ehsan
2017-01-01
Background: This study aimed to compare the effects of the schema along with mindfulness-based therapies in the psoriasis patients. Materials and Methods: This semi-experimental study with post- and pre-tests was conducted on the psoriasis patients in the Dermatology Clinic of the Isfahan Alzahra Hospital, Iran using the convenience sampling in 2014. The patients had a low general health score. The experimental groups included two treatment groups of schema-based (n = 8) and mindfulness (n = 8). Both groups received eight 90-min sessions therapy once a week; they were compared with 8 patients in the control group. To evaluate the psoriasis patients’ maladaptive schema, Young schema questionnaire was used. Data were analyzed through the covariance analysis test. Results: There was a significant difference between the schema-based therapy and mindfulness groups with the control group. There was also a significant difference between the schema-based therapy groups consisting of the defeated schema, dependence/incompetence schema, devotion schema, stubbornly criteria schema, merit schema, restraint/inadequate self-discipline schema, and the control group. Moreover, a significant difference existed between the maladaptive schema of mindfulness therapy group and the controls. There was a significant difference concerning the improvement of the psychopathologic symptoms between the mindfulness therapy group and the control group. Conclusions: This study showed similar effects of both the schema and mindfulness-based therapies on the maladaptive schemas in improving the psoriasis patients with the psychopathologic symptoms. PMID:28217649
BioWarehouse: a bioinformatics database warehouse toolkit
Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D
2006-01-01
Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. 
Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315
BioWarehouse: a bioinformatics database warehouse toolkit.
Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D
2006-03-23
This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
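The application example above, finding the fraction of enzyme activities with no sequence, is the kind of question a single SQL join can answer once the sources sit in one schema. The Python sketch below shows the shape of such a query. The table names, column names, identifiers, and the function `fraction_without_sequence` are hypothetical and do not reproduce the BioWarehouse schema; an in-memory SQLite database stands in for MySQL or Oracle.

```python
import sqlite3


def fraction_without_sequence(ec_numbers, sequenced):
    """Fraction of enzyme activities with no associated sequence.

    `ec_numbers` lists every known activity; `sequenced` lists the activity
    of each stored sequence (repeats allowed). Both are toy inputs.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical two-table schema standing in for the warehouse.
        conn.execute("CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY)")
        conn.execute(
            "CREATE TABLE protein_sequence "
            "(seq_id INTEGER PRIMARY KEY, ec_number TEXT)"
        )
        conn.executemany(
            "INSERT INTO enzyme_activity VALUES (?)",
            [(ec,) for ec in ec_numbers],
        )
        conn.executemany(
            "INSERT INTO protein_sequence (ec_number) VALUES (?)",
            [(ec,) for ec in sequenced],
        )
        # LEFT JOIN keeps activities with no sequence; those rows have NULL
        # on the sequence side and are counted as missing.
        total, missing = conn.execute(
            """
            SELECT COUNT(*),
                   SUM(CASE WHEN s.ec_number IS NULL THEN 1 ELSE 0 END)
            FROM enzyme_activity AS a
            LEFT JOIN (SELECT DISTINCT ec_number FROM protein_sequence) AS s
              ON s.ec_number = a.ec_number
            """
        ).fetchone()
    finally:
        conn.close()
    if not total:
        return 0.0  # no activities at all, so nothing is missing
    return missing / total
```

With four toy activities of which one has no sequence, the function returns 0.25. The 36% figure in the abstract comes from the real warehouse contents, not from a query like this toy one.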
Wollbrett, Julien; Larmande, Pierre; de Lamotte, Frédéric; Ruiz, Manuel
2013-04-15
In recent years, a large amount of "-omics" data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.
2013-01-01
Background In recent years, a large amount of “-omics” data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. Results We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. Conclusions BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic. PMID:23586394
Evolution of Query Optimization Methods
NASA Astrophysics Data System (ADS)
Hameurlain, Abdelkader; Morvan, Franck
Query optimization is the most critical phase in query processing. In this paper, we try to describe synthetically the evolution of query optimization methods from uniprocessor relational database systems to data Grid systems through parallel, distributed and data integration systems. We point out a set of parameters to characterize and compare query optimization methods, mainly: (i) size of the search space, (ii) type of method (static or dynamic), (iii) modification types of execution plans (re-optimization or re-scheduling), (iv) level of modification (intra-operator and/or inter-operator), (v) type of event (estimation errors, delay, user preferences), and (vi) nature of decision-making (centralized or decentralized control).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, M; Robertson, S; Moore, J
Purpose: Advancement in Radiation Oncology (RO) practice develops through evidence-based medicine and clinical trials. Knowledge usable for treatment planning, decision support and research is contained in our clinical data, stored in an Oncospace database. This data store and the tools for populating and analyzing it are compatible with standard RO practice and are shared with collaborating institutions. The question is: what protocol should govern system development and data sharing within an Oncospace Consortium? We focus our example on the technology and data meaning necessary to share across the Consortium. Methods: Oncospace consists of a database schema, planning and outcome data import, and web-based analysis tools. 1) Database: The Consortium implements a federated data store; each member collects and maintains its own data within an Oncospace schema. For privacy, PHI is contained within a single table, accessible to the database owner. 2) Import: Spatial dose data from treatment plans (Pinnacle or DICOM) is imported via Oncolink. Treatment outcomes are imported from an OIS (MOSAIQ). 3) Analysis: JHU has built a number of webpages to answer analysis questions. Oncospace data can also be analyzed via MATLAB or SAS queries. These materials are available to Consortium members, who contribute enhancements and improvements. Results: 1) The Oncospace Consortium now consists of RO centers at JHU, UVA, UW and the University of Toronto. These members have successfully installed and populated Oncospace databases with over 1000 patients collectively. 2) Members contribute code and receive updates via an SVN repository. Errors are reported and tracked via Redmine. Teleconferences include strategizing design and code reviews. 3) Federated databases were successfully queried remotely to combine multiple institutions’ DVH data for dose-toxicity analysis (see below; data combined from JHU and UW Oncospace).
Conclusion: RO data sharing can be and has been effected according to the Oncospace Consortium model: http://oncospace.radonc.jhmi.edu/ . John Wong - SRA from Elekta; Todd McNutt - SRA from Elekta; Michael Bowers - funded by Elekta.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, K; Phillips, M; Fishburn, M
Purpose: To implement a common database structure and user-friendly web-browser based data collection tools across several medical institutions to better support evidence-based clinical decision making and comparative effectiveness research through shared outcomes data. Methods: A consortium of four academic medical centers agreed to implement a federated database, known as Oncospace. Initial implementation has addressed issues of differences between institutions in workflow and types and breadth of structured information captured. This requires coordination of data collection from departmental oncology information systems (OIS), treatment planning systems, and hospital electronic medical records in order to include as much as possible the multi-disciplinary clinical data associated with a patient's care. Results: The original database schema was well-designed and required only minor changes to meet institution-specific data requirements. Mobile browser interfaces for data entry and review for both the OIS and the Oncospace database were tailored for the workflow of individual institutions. Federation of database queries--the ultimate goal of the project--was tested using artificial patient data. The tests serve as proof-of-principle that the system as a whole--from data collection and entry to providing responses to research queries of the federated database--was viable. The resolution of inter-institutional use of patient data for research is still not completed. Conclusions: The migration from unstructured data mainly in the form of notes and documents to searchable, structured data is difficult. Making the transition requires cooperation of many groups within the department and can be greatly facilitated by using the structured data to improve clinical processes and workflow.
The original database schema design is critical to providing enough flexibility for multi-institutional use to improve each institution's ability to study outcomes, determine best practices, and support research. The project has demonstrated the feasibility of deploying a federated database environment for research purposes to multiple institutions.
Multidimensional indexing structure for use with linear optimization queries
NASA Technical Reports Server (NTRS)
Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)
2002-01-01
Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve the top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers, and each layer represents a geometric structure called a convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
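The layered idea in the abstract above can be illustrated with a minimal two-dimensional sketch. It is not the patented apparatus: the function names (`convex_hull`, `build_layers`, `top_n`) are hypothetical, records are assumed to be distinct 2-D points, and the query conservatively scans the first N layers rather than stopping as early as possible. It relies on the property that the k-th best record for any linear criterion lies within the first k convex-hull layers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def build_layers(points: Sequence[Point]) -> List[List[Point]]:
    """Peel convex hulls repeatedly: layer 0 is the outermost hull."""
    remaining = set(points)
    layers: List[List[Point]] = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers


def top_n(layers: List[List[Point]], weights: Tuple[float, float], n: int) -> List[Point]:
    """Return n records maximizing weights[0]*x + weights[1]*y.

    Only the first n layers are examined, because the k-th best record
    always lies within the first k layers.
    """
    def score(p: Point) -> float:
        return weights[0] * p[0] + weights[1] * p[1]

    candidates = [p for layer in layers[:n] for p in layer]
    return sorted(candidates, key=score, reverse=True)[:n]
```

The layers are built once, while the weights are supplied per query, which mirrors the abstract's point that coefficients are only known at query time. To minimize instead of maximize, negate the weights.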
The Role of Ontologies in Schema-based Program Synthesis
NASA Technical Reports Server (NTRS)
Bures, Tomas; Denney, Ewen; Fischer, Bernd; Nistor, Eugen C.
2004-01-01
Program synthesis is the process of automatically deriving executable code from (non-executable) high-level specifications. It is more flexible and powerful than conventional code generation techniques that simply translate algorithmic specifications into lower-level code or only create code skeletons from structural specifications (such as UML class diagrams). Key to building a successful synthesis system is specializing to an appropriate application domain. The AUTOBAYES and AUTOFILTER systems, under development at NASA Ames, operate in the two domains of data analysis and state estimation, respectively. The central concept of both systems is the schema, a representation of reusable computational knowledge. This can take various forms, including high-level algorithm templates, code optimizations, datatype refinements, or architectural information. A schema also contains applicability conditions that are used to determine when it can be applied safely. These conditions can refer to the initial specification, to intermediate results, or to elements of the partially-instantiated code. Schema-based synthesis uses AI technology to recursively apply schemas to gradually refine a specification into executable code. This process proceeds in two main phases. A front-end gradually transforms the problem specification into a program represented in an abstract intermediate code. A backend then compiles this further down into a concrete target programming language of choice. A core engine applies schemas on the initial problem specification, then uses the output of those schemas as the input for other schemas, until the full implementation is generated. Since there might be different schemas that implement different solutions to the same problem this process can generate an entire solution tree. AUTOBAYES and AUTOFILTER have reached the level of maturity where they enable users to solve interesting application problems, e.g., the analysis of Hubble Space Telescope images. 
They are large (in total around 100kLoC Prolog), knowledge intensive systems that employ complex symbolic reasoning to generate a wide range of non-trivial programs for complex application domains. Their schemas can have complex interactions, which make it hard to change them in isolation or even understand what an existing schema actually does. Adding more capabilities by increasing the number of schemas will only worsen this situation, ultimately leading to the entropy death of the synthesis system. The root cause of this problem is that the domain knowledge is scattered throughout the entire system and only represented implicitly in the schema implementations. In our current work, we are addressing this problem by making explicit the knowledge from different parts of the synthesis system. Here, we discuss how Gruber's definition of an ontology as an explicit specification of a conceptualization matches our efforts in identifying and explicating the domain-specific concepts. We outline the dual role ontologies play in schema-based synthesis and argue that they address different audiences and serve different purposes. Their first role is descriptive: they serve as explicit documentation, and help to understand the internal structure of the system. Their second role is prescriptive: they provide the formal basis against which the other parts of the system (e.g., schemas) can be checked. Their final role is referential: ontologies also provide semantically meaningful "hooks" which allow schemas and tools to access the internal state of the program derivation process (e.g., fragments of the generated code) in domain-specific rather than language-specific terms, and thus to modify it in a controlled fashion. For discussion purposes we use AUTOLINEAR, a small synthesis system we are currently experimenting with, which can generate code for solving a system of linear equations, Az = b.
NASA Astrophysics Data System (ADS)
Zheng, Yan
2015-03-01
The Internet of Things (IoT), which focuses on providing users with information exchange and intelligent control, has attracted considerable attention from researchers worldwide since the beginning of this century. The IoT consists of a large number of sensor nodes and data processing units, and its most important features are energy constraints, efficient communication and high redundancy. As the number of sensor nodes grows, communication efficiency and available communication bandwidth become bottlenecks. Much existing research assumes queries with only a few joins, which does not suit the growing number of multi-join queries across the whole IoT. To improve the communication efficiency between parallel units in a distributed sensor network, this paper proposes a parallel query optimization algorithm based on a distribution-attribute cost graph. The algorithm takes into account the relations among stored information and the network communication cost, and establishes an optimized information exchange rule. The experimental results show that the algorithm performs well and makes effective use of the resources of each node in the distributed sensor network. The execution efficiency of multi-join queries between different nodes can therefore be improved.
Optimizing a Query by Transformation and Expansion.
Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank
2017-01-01
In the biomedical sector not only the amount of information produced and uploaded into the web is enormous, but also the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time on trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and receive the best possible results. In this paper we introduce the concept of a workflow for an optimization of queries in the medical and biological sector by using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantic co-occurring terms to the query. Additionally, the query is enlarged by an inclusion of synonyms. The translation into database specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.
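The expansion and normalization steps described above can be sketched roughly as follows. This is an illustrative assumption, not the authors' workflow: the function names, the co-occurrence threshold `min_count`, the synonym table, and min-max scaling as the normalization method are all hypothetical choices, and the translation into database-specific ontologies is omitted.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def cooccurring_terms(query: Sequence[str],
                      past_queries: Sequence[Sequence[str]],
                      min_count: int = 2) -> List[str]:
    """Terms that appeared at least min_count times in past queries sharing a term with this one."""
    current = set(query)
    counts: Counter = Counter()
    for past in past_queries:
        past_terms = set(past)
        if current & past_terms:
            counts.update(past_terms - current)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, count in ranked if count >= min_count]


def expand_query(query: Sequence[str],
                 synonyms: Dict[str, List[str]],
                 past_queries: Sequence[Sequence[str]]) -> List[str]:
    """Original terms, then their synonyms, then co-occurring terms, without duplicates."""
    expanded = list(query)
    for term in query:
        expanded.extend(synonyms.get(term, []))
    expanded.extend(cooccurring_terms(query, past_queries))
    seen = set()
    unique = []
    for term in expanded:
        if term not in seen:
            seen.add(term)
            unique.append(term)
    return unique


def normalize_scores(results: Sequence[Tuple[str, float]]) -> Dict[str, float]:
    """Min-max scale one database's raw scores to [0, 1] so lists from different sources are comparable."""
    if not results:
        return {}
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        return {doc: 1.0 for doc, _ in results}
    return {doc: (score - low) / (high - low) for doc, score in results}
```

Normalizing per database before merging keeps a source with a wide raw score range from dominating the combined ranking, which is the concern the abstract raises when several databases are queried at once.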
Tagare, Hemant D.; Jaffe, C. Carl; Duncan, James
1997-01-01
Information contained in medical images differs considerably from that residing in alphanumeric format. The difference can be attributed to four characteristics: (1) the semantics of medical knowledge extractable from images is imprecise; (2) image information contains form and spatial data, which are not expressible in conventional language; (3) a large part of image information is geometric; (4) diagnostic inferences derived from images rest on an incomplete, continuously evolving model of normality. This paper explores the differentiating characteristics of text versus images and their impact on design of a medical image database intended to allow content-based indexing and retrieval. One strategy for implementing medical image databases is presented, which employs object-oriented iconic queries, semantics by association with prototypes, and a generic schema. PMID:9147338
Use of standard vocabulary services in validation of water resources data
NASA Astrophysics Data System (ADS)
Yu, Jonathan; Cox, Simon; Ratcliffe, David
2010-05-01
Ontology repositories are increasingly being exposed through vocabulary and concept services. Primarily this is in support of resource discovery. Thesaurus functionality and even more sophisticated reasoning offer the possibility of overcoming the limitations of simple text-matching and tagging, which is the basis of most search. However, controlled vocabularies have other important roles in distributed systems: in particular in constraining content validity. A national water information system established by the Australian Bureau of Meteorology ('the Bureau') has deployed a system for ingestion of data from multiple providers. This uses an HTTP interface onto separately maintained vocabulary services as part of the quality assurance chain. With over 200 data providers potentially transferring data to the Bureau, a standard XML-based Water Data Transfer Format (WDTF) was developed for receipt of data into an integrated national water information system. The WDTF schema was built upon standards from the Open Geospatial Consortium (OGC). The structure and syntax specified by a W3C XML Schema is complemented by additional constraints described using Schematron. These implement important content requirements and business rules including:
• Restricted cardinality: where optional elements and attributes inherited from the base standards become mandatory in the application, or repeatable elements or attributes are limited to one or omitted. For example, the sampledFeature element from O&M is optional but is mandatory for a samplingPoint element in WDTF.
• Vocabulary checking: WDTF data use seventeen vocabularies or code lists derived from Regulations under the Commonwealth Water Act 2007. Examples of code lists are the Australian Water Regulations list, observed property vocabulary, and units of measure.
• Contextual constraints: in many places, the permissible value is dependent on the value of another field. For example, within observations the unit of measure must be commensurate with the observed property type.
Validation of data submitted in WDTF uses a two-pass approach. First, syntax and structural validation is performed by standard XML Schema validation tools. Second, validation of contextual constraints and code list checking is performed using a hybrid method combining context-sensitive rule-based validation (allowing the rules to be expressed within a given context) and semantic vocabulary services. Schematron allows rules to incorporate assertions of XPath expressions to access and constrain element content, therefore enabling contextual constraints. Schematron is also used to perform element cardinality checking. The vocabularies or code lists are formalized in SKOS (Simple Knowledge Organization System), an RDF-based language. SKOS provides mechanisms to define concepts, associate them with (multi-lingual) labels or terms, and record thesaurus-like relationships between them. The vocabularies are managed in an RDF database or semantic triple store. Querying is implemented as a semantic vocabulary service, with an HTTP-based API that allows queries to be issued from rules written in Schematron. WDTF has required development and deployment of some ontologies whose scope is much more general than this application, in particular covering 'observed properties' and 'units of measure', which also have to be related to each other and consistent with the dimensional analysis. Separation of the two validation passes reflects the separate governance and stability of the structural and content rules, and allows an organisation's business rules to be moved out of the XML schema definition and the XML schema to be reused by other businesses with their own specific rules. With the general approach proven, harmonization opportunities with more generic services are being explored, such as the GEMET API for SKOS, developed by the European Environment Agency.
Acknowledgements: The authors would like to thank the AUSCOPE team for their development and support provided of the vocabulary services.
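The two-pass validation described in this abstract can be sketched in miniature as follows. Everything specific here is an assumption made for illustration: the element names are not the real WDTF schema, the two dictionaries stand in for the remote SKOS vocabulary services, and plain Python checks stand in for XML Schema and Schematron.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical stand-ins for the vocabulary services: each maps a code
# to its physical dimension so that commensurability can be checked.
OBSERVED_PROPERTIES = {"WaterLevel": "length", "Discharge": "volume_per_time"}
UNITS = {"m": "length", "mm": "length", "m3/s": "volume_per_time"}

# Hypothetical structural rule: children every observation must contain.
REQUIRED_CHILDREN = ("observedProperty", "unit", "value")


def validate_structure(obs: ET.Element) -> List[str]:
    """Pass 1: structural check (the role XML Schema validation plays)."""
    return [f"missing element: {name}"
            for name in REQUIRED_CHILDREN if obs.find(name) is None]


def validate_content(obs: ET.Element) -> List[str]:
    """Pass 2: code-list checks plus one contextual constraint."""
    errors: List[str] = []
    prop = obs.findtext("observedProperty")
    unit = obs.findtext("unit")
    if prop not in OBSERVED_PROPERTIES:
        errors.append(f"unknown observed property: {prop}")
    if unit not in UNITS:
        errors.append(f"unknown unit: {unit}")
    # Contextual constraint: the unit must match the property's dimension.
    if not errors and OBSERVED_PROPERTIES[prop] != UNITS[unit]:
        errors.append(f"unit '{unit}' is not commensurate with '{prop}'")
    return errors


def validate(xml_text: str) -> List[str]:
    """Run pass 2 only when pass 1 succeeds, as in a two-pass pipeline."""
    obs = ET.fromstring(xml_text)
    errors = validate_structure(obs)
    return errors if errors else validate_content(obs)
```

Keeping the vocabularies outside the structural rules means a code list can change without touching the structure check, which is the governance argument the abstract makes for separating the two passes.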
A novel adaptive Cuckoo search for optimal query plan generation.
Gomathi, Ramalingam; Sharmila, Dhandapani
2014-01-01
The daily growth in the number of web pages has driven the development of semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the Resource Description Framework (RDF). To reduce the execution time of queries over large RDF graphs, evolving metaheuristic algorithms have become an alternative to traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating optimal query plans for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach provides significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.
Standards opportunities around data-bearing Web pages.
Karger, David
2013-03-28
The evolving Web has seen ever-growing use of structured data, thanks to the way it enhances information authoring, querying, visualization and sharing. To date, however, most structured data authoring and management tools have been oriented towards programmers and Web developers. End users have been left behind, unable to leverage structured data for information management and communication as well as professionals. In this paper, I will argue that many of the benefits of structured data management can be provided to end users as well. I will describe an approach and tools that allow end users to define their own schemas (without knowing what a schema is), manage data and author (not program) interactive Web visualizations of that data using the Web tools with which they are already familiar, such as plain Web pages, blogs, wikis and WYSIWYG document editors. I will describe our experience deploying these tools and some lessons relevant to their future evolution.
González-Ferrer, A; Peleg, M; Marcos, M; Maldonado, J A
2016-07-01
Delivering patient-specific decision-support based on computer-interpretable guidelines (CIGs) requires mapping CIG clinical statements (data items, clinical recommendations) into patients' data. This is most effectively done via intermediate data schemas, which enable querying the data according to the semantics of a shared standard intermediate schema. This study aims to evaluate the use of HL7 virtual medical record (vMR) and openEHR archetypes as intermediate schemas for capturing clinical statements from CIGs that are mappable to electronic health records (EHRs) containing patient data and patient-specific recommendations. Using qualitative research methods, we analyzed the encoding of ten representative clinical statements taken from two CIGs used in real decision-support systems into two health information models (openEHR archetypes and HL7 vMR instances) by four experienced informaticians. Discussion among the modelers about each case study example greatly increased our understanding of the capabilities of these standards, which we share in this educational paper. Differing in content and structure, the openEHR archetypes were found to contain a greater level of representational detail and structure while the vMR representations took fewer steps to complete. The use of openEHR in the encoding of CIG clinical statements could potentially facilitate applications other than decision-support, including intelligent data analysis and integration of additional properties of data items from existing EHRs. On the other hand, due to their smaller size and fewer details, the use of vMR potentially supports quicker mapping of EHR data into clinical statements.
Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang
2017-01-01
To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution. PMID:29854239
Developing A Web-based User Interface for Semantic Information Retrieval
NASA Technical Reports Server (NTRS)
Berrios, Daniel C.; Keller, Richard M.
2003-01-01
While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still unclear. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting semantic query generation for SemanticOrganizer, a tool used by scientists and engineers at NASA to construct networks of knowledge and data. Through this interface users can select node types, node attributes and node links to build ad-hoc semantic queries for searching the SemanticOrganizer network.
Securely and Flexibly Sharing a Biomedical Data Management System
Wang, Fusheng; Hussels, Phillip; Liu, Peiya
2011-01-01
Biomedical database systems need not only to address the issues of managing complex data, but also to provide data security and access control to the system. These include not only system level security, but also instance level access control such as access to documents, schemas, or aggregations of information. The latter is becoming more important as multiple users can share a single scientific data management system to conduct their research, while data have to be protected before they are published or IP-protected. This problem is challenging as users’ needs for data security vary dramatically from one application to another, in terms of whom to share with, what resources to share, and at what access level. We develop a comprehensive data access framework for a biomedical data management system, SciPort. SciPort provides fine-grained, multi-level, space-based access control of resources at not only the object level (documents and schemas), but also the space level (resource sets aggregated hierarchically). Furthermore, to simplify the management of users and privileges, a customizable role-based user model is developed. The access control is implemented efficiently by integrating access privileges into the backend XML database, so that efficient queries are supported. The secure access approach we take makes it possible for multiple users to share the same biomedical data management system with flexible access management and high data security. PMID:21625285
Web page sorting algorithm based on query keyword distance relation
NASA Astrophysics Data System (ADS)
Yang, Han; Cui, Hong Gang; Tang, Hao
2017-08-01
To improve page ranking, we propose the idea of query-keyword clustering, based on the characteristics of the relationships among search keywords within a web page. This idea is quantified as the degree of aggregation of the search keywords in the web page. Building on the PageRank algorithm, a clustering-degree factor for the query keywords is added so that it can take part in the quantitative calculation. This paper thus proposes an improved PageRank algorithm based on the distance relation between search keywords. The experimental results show the feasibility and effectiveness of the method.
Optimizing Interactive Development of Data-Intensive Applications
Interlandi, Matteo; Tetali, Sai Deep; Gulzar, Muhammad Ali; Noor, Joseph; Condie, Tyson; Kim, Miryung; Millstein, Todd
2017-01-01
Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications. PMID:28405637
Query optimization for graph analytics on linked data using SPARQL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Seokyong; Lee, Sangkeun; Lim, Seung-Hwan
2015-07-01
Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performance of the SPARQL-based implementation of such popular graph algorithms by reducing the space-overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.
RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms
NASA Astrophysics Data System (ADS)
Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay
The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
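A genetic search over join orders for a chain query might look roughly like the sketch below. It is not RCQ-GA itself: the cost model (sum of intermediate result sizes, with a selectivity applied only between neighbouring patterns in the chain), the prefix-based crossover, the swap mutation, and all parameter values are assumptions chosen to keep the example small.

```python
import random
from typing import List, Sequence


def plan_cost(order: Sequence[int], cards: Sequence[float], sel: Sequence[float]) -> float:
    """Toy cost: sum of intermediate result sizes for a left-deep plan.

    cards[i] is the cardinality of pattern i; sel[i] is the join selectivity
    between patterns i and i+1. Joining a pattern with no neighbour already
    in the plan is a cross product (no selectivity applied).
    """
    joined = {order[0]}
    size = float(cards[order[0]])
    total = 0.0
    for i in order[1:]:
        size *= cards[i]
        if i - 1 in joined:
            size *= sel[i - 1]
        if i + 1 in joined:
            size *= sel[i]
        joined.add(i)
        total += size
    return total


def crossover(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    """Keep a prefix of parent a, then append the missing genes in parent b's order."""
    cut = rng.randrange(1, len(a))
    head = a[:cut]
    return head + [gene for gene in b if gene not in head]


def mutate(order: List[int], rng: random.Random, rate: float = 0.2) -> List[int]:
    """With probability `rate`, swap two positions."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def optimize(cards: Sequence[float], sel: Sequence[float],
             pop_size: int = 30, generations: int = 50, seed: int = 0) -> List[int]:
    """Evolve join orders; the better half survives each generation (elitism)."""
    rng = random.Random(seed)
    n = len(cards)
    population = [list(range(n))] + [rng.sample(range(n), n) for _ in range(pop_size - 1)]

    def cost(order: List[int]) -> float:
        return plan_cost(order, cards, sel)

    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return min(population, key=cost)
```

Because the left-to-right order is seeded into the initial population and the best plans always survive, the returned order is never worse than the naive order under this cost model.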
Optimizing Maintenance of Constraint-Based Database Caches
NASA Astrophysics Data System (ADS)
Klein, Joachim; Braun, Susanne
Caching data reduces user-perceived latency and often enhances availability in case of server crashes or network failures. DB caching aims at local processing of declarative queries in a DBMS-managed cache close to the application. Query evaluation must produce the same results as if done at the remote database backend, which implies that all data records needed to process such a query must be present and controlled by the cache, i.e., to achieve “predicate-specific” loading and unloading of such record sets. Hence, cache maintenance must be based on cache constraints such that “predicate completeness” of the caching units currently present can be guaranteed at any point in time. We explore how cache groups can be maintained to provide the data currently needed. Moreover, we design and optimize loading and unloading algorithms for sets of records keeping the caching units complete, before we empirically identify the costs involved in cache maintenance.
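The notion of predicate completeness can be illustrated with a deliberately small sketch. It is an assumption-laden simplification, not the cache-group mechanism of the paper: the class name `PredicateCache`, the single equality predicate on one column, and the in-memory list standing in for the backend are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


class PredicateCache:
    """Caches rows by equality predicate on one column.

    A value is 'complete' when every backend row with that value is held
    locally, so an equality query on it can be answered from the cache alone.
    """

    def __init__(self, backend: List[Row], column: str) -> None:
        self.backend = backend          # stand-in for the remote database
        self.column = column
        self.rows: List[Row] = []       # locally cached rows
        self.complete: Set[object] = set()

    def load(self, value: object) -> None:
        """Load the whole record set for one predicate value."""
        if value in self.complete:
            return
        self.rows.extend(r for r in self.backend if r[self.column] == value)
        self.complete.add(value)

    def unload(self, value: object) -> None:
        """Remove the whole record set, never a part of it."""
        self.rows = [r for r in self.rows if r[self.column] != value]
        self.complete.discard(value)

    def query(self, value: object) -> Tuple[List[Row], str]:
        """Answer locally only when the predicate is complete."""
        if value in self.complete:
            return [r for r in self.rows if r[self.column] == value], "cache"
        return [r for r in self.backend if r[self.column] == value], "backend"
```

Loading and unloading whole record sets is what keeps local answers identical to backend answers; a partially loaded value would silently return too few rows.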
Solving Word Problems using Schemas: A Review of the Literature
Powell, Sarah R.
2011-01-01
Solving word problems is a difficult task for students at-risk for or with learning disabilities (LD). One instructional approach that has emerged as a valid method for helping students at-risk for or with LD to become more proficient at word-problem solving is using schemas. A schema is a framework for solving a problem. With a schema, students are taught to recognize problems as falling within word-problem types and to apply a problem solution method that matches that problem type. This review highlights two schema approaches for 2nd- and 3rd-grade students at-risk for or with LD: schema-based instruction and schema-broadening instruction. A total of 12 schema studies were reviewed and synthesized. Both types of schema approaches enhanced the word-problem skill of students at-risk for or with LD. Based on the review, suggestions are provided for incorporating word-problem instruction using schemas. PMID:21643477
SQL is Dead; Long-live SQL: Relational Database Technology in Science Contexts
NASA Astrophysics Data System (ADS)
Howe, B.; Halperin, D.
2014-12-01
Relational databases are often perceived as a poor fit in science contexts: Rigid schemas, poor support for complex analytics, unpredictable performance, significant maintenance and tuning requirements --- these idiosyncrasies often make databases unattractive in science contexts characterized by heterogeneous data sources, complex analysis tasks, rapidly changing requirements, and limited IT budgets. In this talk, I'll argue that although the value proposition of typical relational database systems is weak in science, the core ideas that power relational databases have become incredibly prolific in open source science software, and are emerging as a universal abstraction for both big data and small data. In addition, I'll talk about two open source systems we are building to "jailbreak" the core technology of relational databases and adapt them for use in science. The first is SQLShare, a Database-as-a-Service system supporting collaborative data analysis and exchange by reducing database use to an Upload-Query-Share workflow with no installation, schema design, or configuration required. The second is Myria, a service that supports much larger scale data, complex analytics, and multiple back end systems. Finally, I'll describe some of the ways our collaborators in oceanography, astronomy, biology, fisheries science, and more are using these systems to replace script-based workflows for reasons of performance, flexibility, and convenience.
A new relational database structure and online interface for the HITRAN database
NASA Astrophysics Data System (ADS)
Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan
2013-11-01
A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database for ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
Entity Bases: Large-Scale Knowledgebases for Intelligence Data
2009-02-01
declaratively expressed as Datalog rules. The EntityBase supports two query scenarios: • Free-Form Querying: A human analyst or a client program can pose...integration, Prometheus follows the Inverse Rules algorithm (Duschka 1997) with additional optimizations (Thakkar et al. 2005). We use the mediator...Discovery and Data Mining (PAKDD), Sydney, Australia. Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., and Singer, Y. (2006). Online passive
Evaluation methodology for query-based scene understanding systems
NASA Astrophysics Data System (ADS)
Huster, Todd P.; Ross, Timothy D.; Culbertson, Jared L.
2015-05-01
In this paper, we are proposing a method for the principled evaluation of scene understanding systems in a query-based framework. We can think of a query-based scene understanding system as a generalization of typical sensor exploitation systems where instead of performing a narrowly defined task (e.g., detect, track, classify, etc.), the system can perform general user-defined tasks specified in a query language. Examples of this type of system have been developed as part of DARPA's Mathematics of Sensing, Exploitation, and Execution (MSEE) program. There is a body of literature on the evaluation of typical sensor exploitation systems, but the open-ended nature of the query interface introduces new aspects to the evaluation problem that have not been widely considered before. In this paper, we state the evaluation problem and propose an approach to efficiently learn about the quality of the system under test. We consider the objective of the evaluation to be to build a performance model of the system under test, and we rely on the principles of Bayesian experiment design to help construct and select optimal queries for learning about the parameters of that model.
An online analytical processing multi-dimensional data warehouse for malaria data
Madey, Gregory R; Vyushkov, Alexander; Raybaud, Benoit; Burkot, Thomas R; Collins, Frank H
2017-01-01
Malaria is a vector-borne disease that contributes substantially to the global burden of morbidity and mortality. The management of malaria-related data from heterogeneous, autonomous, and distributed data sources poses unique challenges and requirements. Although online data storage systems exist that address specific malaria-related issues, a globally integrated online resource to address different aspects of the disease does not exist. In this article, we describe the design, implementation, and applications of a multi-dimensional, online analytical processing data warehouse, named the VecNet Data Warehouse (VecNet-DW). It is the first online, globally-integrated platform that provides efficient search, retrieval and visualization of historical, predictive, and static malaria-related data, organized in data marts. Historical and static data are modelled using star schemas, while predictive data are modelled using a snowflake schema. The major goals, characteristics, and components of the DW are described along with its data taxonomy and ontology, the external data storage systems and the logical modelling and physical design phases. Results are presented as screenshots of a Dimensional Data browser, a Lookup Tables browser, and a Results Viewer interface. The power of the DW emerges from integrated querying of the different data marts and structuring those queries to the desired dimensions, enabling users to search, view, analyse, and store large volumes of aggregated data, and responding better to the increasing demands of users. Database URL https://dw.vecnet.org/datawarehouse/ PMID:29220463
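A star schema of the kind mentioned above places one fact table at the centre and joins it to small dimension tables. The following is a minimal sketch using Python's built-in sqlite3 module; the table names, column names, and figures are invented for illustration and are not taken from VecNet-DW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table referencing two dimension tables.
conn.executescript("""
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_cases (
    location_id INTEGER REFERENCES dim_location(location_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    cases INTEGER
);
""")

# Made-up rows, used only to exercise the query.
conn.executemany("INSERT INTO dim_location VALUES (?, ?)",
                 [(1, "Country A"), (2, "Country B")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, 2010), (2, 2011)])
conn.executemany("INSERT INTO fact_cases VALUES (?, ?, ?)",
                 [(1, 1, 120), (1, 2, 90), (2, 1, 40), (2, 1, 10)])

# An OLAP-style roll-up: aggregate the facts along the chosen dimensions.
rows = conn.execute("""
SELECT l.country, d.year, SUM(f.cases)
FROM fact_cases AS f
JOIN dim_location AS l ON l.location_id = f.location_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY l.country, d.year
ORDER BY l.country, d.year
""").fetchall()

for row in rows:
    print(row)
```

A snowflake schema would differ only in normalizing the dimension tables further, for example by splitting the country out of `dim_location` into its own table.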
SORTEZ: a relational translator for NCBI's ASN.1 database.
Hart, K W; Searls, D B; Overton, G C
1994-07-01
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases, and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
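The general step of mapping nested data onto relational tables can be sketched as below. This is an illustrative assumption about how such a flattening might look, not the SORTEZ implementation: each nested structure becomes a row in its own table, linked back to its parent by a generated key. The record contents and table names are hypothetical.

```python
from itertools import count


def flatten(record, table, tables, ids, parent=None):
    """Turn one nested record into flat rows, one table per nesting level."""
    row_id = next(ids)
    row = {"id": row_id}
    if parent is not None:
        # Foreign key back to the enclosing record.
        row["parent_table"], row["parent_id"] = parent
    for key, value in record.items():
        if isinstance(value, dict):
            flatten(value, key, tables, ids, (table, row_id))
        elif isinstance(value, list):
            for item in value:
                # Scalars inside a list get a single-column child row.
                child = item if isinstance(item, dict) else {"value": item}
                flatten(child, key, tables, ids, (table, row_id))
        else:
            row[key] = value
    tables.setdefault(table, []).append(row)
    return row_id


# Hypothetical nested entry standing in for a decoded sequence record.
entry = {
    "accession": "X1",
    "title": "example",
    "features": [{"kind": "CDS", "start": 10}, {"kind": "exon", "start": 42}],
    "keywords": ["kinase", "human"],
}

tables = {}
flatten(entry, "entry", tables, count(1))
for name, rows in tables.items():
    print(name, rows)
```

The mechanical part is easy to automate, which matches the abstract's observation; choosing good table names, keys, and column types is where domain expertise comes in.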
NASA Astrophysics Data System (ADS)
Velazquez, Enrique Israel
Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant, and urgent challenge, for the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternate database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field since alternate database architectures are complex to assess and means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not easily available. In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine using patients' clinical and genomic information from The Cancer Genome Atlas (TCGA). The first experiment draws on performance and scalability from biologically meaningful queries with differing complexity and database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due to dynamic data. We have identified two NoSQL approaches, based on Cassandra and Redis, which seem to be the ideal database management systems for our precision medicine queries in terms of performance and scalability. We present NoSQL approaches and show how they can be used to manage clinical and genomic big data.
Our research is relevant to the public health since we are focusing on one of the main challenges to the development of Precision Medicine and, consequently, investigating a potential solution to the progressively increasing demands on health care.
Rapid Deployment of a RESTful Service for Oceanographic Research Cruises
NASA Astrophysics Data System (ADS)
Fu, Linyun; Arko, Robert; Leadbetter, Adam
2014-05-01
The Ocean Data Interoperability Platform (ODIP) seeks to increase data sharing across scientific domains and international boundaries, by providing a forum to harmonize diverse regional data systems. ODIP participants from the US include the Rolling Deck to Repository (R2R) program, whose mission is to capture, catalog, and describe the underway/environmental sensor data from US oceanographic research vessels and submit the data to public long-term archives. R2R publishes information online as Linked Open Data, making it widely available using Semantic Web standards. Each vessel, sensor, cruise, dataset, person, organization, funding award, log, report, etc., has a Uniform Resource Identifier (URI). Complex queries that federate results from other data providers are supported, using the SPARQL query language. To facilitate interoperability, R2R uses controlled vocabularies developed collaboratively by the science community (e.g., SeaDataNet device categories) and published online by the NERC Vocabulary Server (NVS). In response to user feedback, we are developing a standard programming interface (API) and Web portal for R2R's Linked Open Data. The API provides a set of simple REST-type URLs that are translated on-the-fly into SPARQL queries, and supports common output formats (e.g., JSON). We will demonstrate an implementation based on the Epimorphics Linked Data API (ELDA) open-source Java package. Our experience shows that constructing a simple portal with limited schema elements in this way can significantly reduce development time and maintenance complexity.
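The translation of a REST-type URL into a SPARQL query can be illustrated with a small sketch. It does not reproduce ELDA or the R2R vocabulary: the `ex:` namespace, the class and property mappings, and the example URL are all hypothetical and stand in for whatever configuration a real deployment would use.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical mappings from REST resource names and filter parameters to RDF terms.
CLASSES = {"cruises": "ex:Cruise", "vessels": "ex:Vessel"}
PROPERTIES = {"vessel": "ex:vesselName", "year": "ex:year"}


def rest_to_sparql(url):
    """Translate a simple REST-style URL into a SPARQL SELECT query string."""
    parsed = urlparse(url)
    resource = parsed.path.strip("/").split("/")[-1]
    lines = [
        "PREFIX ex: <http://example.org/ns#>",
        "SELECT ?item WHERE {",
        f"  ?item a {CLASSES[resource]} .",
    ]
    for name, values in sorted(parse_qs(parsed.query).items()):
        # Escape backslashes and quotes so the value stays inside the literal.
        literal = values[0].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  ?item {PROPERTIES[name]} "{literal}" .')
    lines.append("}")
    return "\n".join(lines)


query = rest_to_sparql("http://example.org/api/cruises?vessel=Atlantis")
print(query)
```

Restricting the accepted resource names and parameters to a fixed mapping is what keeps such a portal simple: unknown names raise a `KeyError` here instead of producing an arbitrary query.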
Three-dimensional motor schema based navigation
NASA Technical Reports Server (NTRS)
Arkin, Ronald C.
1989-01-01
Reactive schema-based navigation is possible in space domains by extending the methods developed for ground-based navigation found within the Autonomous Robot Architecture (AuRA). Reformulation of two dimensional motor schemas for three dimensional applications is a straightforward process. The manifold advantages of schema-based control persist, including modular development, amenability to distributed processing, and responsiveness to environmental sensing. Simulation results show the feasibility of this methodology for space docking operations in a cluttered work area.
Automatic Generation of Algorithms for the Statistical Analysis of Planetary Nebulae Images
NASA Technical Reports Server (NTRS)
Fischer, Bernd
2004-01-01
Analyzing data sets collected in experiments or by observations is a core scientific activity. Typically, experimental and observational data are fraught with uncertainty, and the analysis is based on a statistical model of the conjectured underlying processes. The large data volumes collected by modern instruments make computer support indispensable for this. Consequently, scientists spend significant amounts of their time on the development and refinement of data analysis programs. AutoBayes [GF+02, FS03] is a fully automatic synthesis system for generating statistical data analysis programs. Externally, it looks like a compiler: it takes an abstract problem specification and translates it into executable code. Its input is a concise description of a data analysis problem in the form of a statistical model as shown in Figure 1; its output is optimized and fully documented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Internally, however, it is quite different: AutoBayes derives a customized algorithm implementing the given model using a schema-based process, and then further refines and optimizes the algorithm into code. A schema is a parameterized code template with associated semantic constraints which define and restrict the template's applicability. The schema parameters are instantiated in a problem-specific way during synthesis as AutoBayes checks the constraints against the original model or, recursively, against emerging sub-problems. The AutoBayes schema library contains problem decomposition operators (which are justified by theorems in a formal logic in the domain of Bayesian networks) as well as machine learning algorithms (e.g., EM, k-Means) and numeric optimization methods (e.g., Nelder-Mead simplex, conjugate gradient). AutoBayes augments this schema-based approach by symbolic computation to derive closed-form solutions whenever possible.
This is a major advantage over other statistical data analysis systems which use numerical approximations even in cases where closed-form solutions exist. AutoBayes is implemented in Prolog and comprises approximately 75,000 lines of code. In this paper, we take one typical scientific data analysis problem, analyzing planetary nebulae images taken by the Hubble Space Telescope, and show how AutoBayes can be used to automate the implementation of the necessary analysis programs. We initially follow the analysis described by Knuth and Hajian [KH02] and use AutoBayes to derive code for the published models. We show the details of the code derivation process, including the symbolic computations and automatic integration of library procedures, and compare the results of the automatically generated and manually implemented code. We then go beyond the original analysis and use AutoBayes to derive code for a simple image segmentation procedure based on a mixture model which can be used to automate a manual preprocessing step. Finally, we combine the original approach with the simple segmentation which yields a more detailed analysis. This also demonstrates that AutoBayes makes it easy to combine different aspects of data analysis.
Use of Schema on Read in Earth Science Data Archives
NASA Astrophysics Data System (ADS)
Petrenko, M.; Hegde, M.; Smit, C.; Pilone, P.; Pham, L.
2017-12-01
Traditionally, NASA Earth Science data archives have file-based storage using proprietary data file formats, such as HDF and HDF-EOS, which are optimized to support fast and efficient storage of spaceborne and model data as they are generated. The use of file-based storage essentially imposes an indexing strategy based on data dimensions. In most cases, NASA Earth Science data uses time as the primary index, leading to poor performance in accessing data in spatial dimensions. For example, producing a time series for a single spatial grid cell involves accessing a large number of data files. With exponential growth in data volume due to the ever-increasing spatial and temporal resolution of the data, using file-based archives poses significant performance and cost barriers to data discovery and access. Storing and disseminating data in proprietary data formats imposes an additional access barrier for users outside the mainstream research community. At the NASA Goddard Earth Sciences Data Information Services Center (GES DISC), we have evaluated applying the "schema-on-read" principle to data access and distribution. We used Apache Parquet to store geospatial data, and have exposed data through Amazon Web Services (AWS) Athena, AWS Simple Storage Service (S3), and Apache Spark. Using the "schema-on-read" approach allows customization of indexing—spatial or temporal—to suit the data access pattern. The storage of data in open formats such as Apache Parquet has widespread support in popular programming languages. A wide range of solutions for handling big data lowers the access barrier for all users. This presentation will discuss formats used for data storage, frameworks with support for "schema-on-read" used for data access, and common use cases covering data usage patterns seen in a geospatial data archive.
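The schema-on-read principle can be illustrated with a small standard-library sketch: the raw records are stored untyped, and both the column types and the index key are chosen only when the data are read. The sample rows, column names, and the `read_with_schema` helper are hypothetical; the actual work described above used Apache Parquet, AWS Athena, S3, and Apache Spark, none of which are modelled here.

```python
import csv
import io
from collections import defaultdict

# Raw, untyped storage: nothing here fixes column types or an index.
RAW = """time,lat,lon,value
2017-01-01,10.0,20.0,1.5
2017-01-02,10.0,20.0,1.7
2017-01-01,11.0,20.0,0.9
"""


def read_with_schema(raw, schema, index_by):
    """Apply types and build an index at read time, as chosen by the caller."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw)):
        typed = {name: cast(row[name]) for name, cast in schema.items()}
        key = tuple(typed[column] for column in index_by)
        index[key].append(typed)
    return index


schema = {"time": str, "lat": float, "lon": float, "value": float}

# Spatial index: a time series for one grid cell is a single lookup.
by_cell = read_with_schema(RAW, schema, index_by=("lat", "lon"))
series = [record["value"] for record in by_cell[(10.0, 20.0)]]

# Temporal index over the same raw data, for map-style access.
by_time = read_with_schema(RAW, schema, index_by=("time",))

print(series, len(by_time[("2017-01-01",)]))
```

The same stored bytes serve both access patterns, which is the point made above about customizing the index to the query.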
Use of Schema on Read in Earth Science Data Archives
NASA Technical Reports Server (NTRS)
Hegde, Mahabaleshwara; Smit, Christine; Pilone, Paul; Petrenko, Maksym; Pham, Long
2017-01-01
Traditionally, NASA Earth Science data archives have file-based storage using proprietary data file formats, such as HDF and HDF-EOS, which are optimized to support fast and efficient storage of spaceborne and model data as they are generated. The use of file-based storage essentially imposes an indexing strategy based on data dimensions. In most cases, NASA Earth Science data uses time as the primary index, leading to poor performance in accessing data in spatial dimensions. For example, producing a time series for a single spatial grid cell involves accessing a large number of data files. With exponential growth in data volume due to the ever-increasing spatial and temporal resolution of the data, using file-based archives poses significant performance and cost barriers to data discovery and access. Storing and disseminating data in proprietary data formats imposes an additional access barrier for users outside the mainstream research community. At the NASA Goddard Earth Sciences Data Information Services Center (GES DISC), we have evaluated applying the schema-on-read principle to data access and distribution. We used Apache Parquet to store geospatial data, and have exposed data through Amazon Web Services (AWS) Athena, AWS Simple Storage Service (S3), and Apache Spark. Using the schema-on-read approach allows customization of indexing spatially or temporally to suit the data access pattern. The storage of data in open formats such as Apache Parquet has widespread support in popular programming languages. A wide range of solutions for handling big data lowers the access barrier for all users. This presentation will discuss formats used for data storage, frameworks with support for schema-on-read used for data access, and common use cases covering data usage patterns seen in a geospatial data archive.
An incremental database access method for autonomous interoperable databases
NASA Technical Reports Server (NTRS)
Roussopoulos, Nicholas; Sellis, Timos
1994-01-01
We investigated a number of design and performance issues of interoperable database management systems (DBMS's). The major results of our investigation were obtained in the areas of client-server database architectures for heterogeneous DBMS's, incremental computation models, buffer management techniques, and query optimization. We finished a prototype of an advanced client-server workstation-based DBMS which allows access to multiple heterogeneous commercial DBMS's. Experiments and simulations were then run to compare its performance with the standard client-server architectures. The focus of this research was on adaptive optimization methods of heterogeneous database systems. Adaptive buffer management accounts for the random and object-oriented access methods for which no known characterization of the access patterns exists. Adaptive query optimization means that value distributions and selectivities, which play the most significant role in query plan evaluation, are continuously refined to reflect the actual values as opposed to static ones that are computed off-line. Query feedback is a concept that was first introduced to the literature by our group. We employed query feedback for both adaptive buffer management and for computing value distributions and selectivities. For adaptive buffer management, we use the page faults of prior executions to achieve more 'informed' management decisions. For the estimation of the distributions of the selectivities, we use curve-fitting techniques, such as least squares and splines, for regressing on these values.
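The use of query feedback with least-squares fitting can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' estimator: it assumes range predicates of the form `column <= bound`, fits a straight line to the observed selectivities, and uses hypothetical class and method names.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


class FeedbackEstimator:
    """Refines selectivity estimates from the results of executed queries."""

    def __init__(self):
        self.observations = []  # (bound, observed selectivity) pairs

    def record(self, bound, matched_rows, total_rows):
        # Feedback step: store what the executed query actually returned.
        self.observations.append((bound, matched_rows / total_rows))

    def estimate(self, bound):
        slope, intercept = fit_line(self.observations)
        # Clamp, since a selectivity is a fraction of the table.
        return min(1.0, max(0.0, slope * bound + intercept))


estimator = FeedbackEstimator()
estimator.record(10, 100, 1000)
estimator.record(20, 200, 1000)
estimator.record(40, 400, 1000)
print(estimator.estimate(30))
```

Each executed query adds one observation, so the fitted curve tracks the actual data distribution over time instead of relying on statistics computed off-line. A spline would replace `fit_line` when a straight line is too coarse.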
Optimization of the Controlled Evaluation of Closed Relational Queries
NASA Astrophysics Data System (ADS)
Biskup, Joachim; Lochner, Jan-Hendrik; Sonntag, Sebastian
For relational databases, controlled query evaluation is an effective inference control mechanism preserving confidentiality regarding a previously declared confidentiality policy. Implementations of controlled query evaluation usually lack efficiency due to costly theorem prover calls. Suitably constrained controlled query evaluation can be implemented efficiently, but is not flexible enough from the perspective of database users and security administrators. In this paper, we propose an optimized framework for controlled query evaluation in relational databases, being efficiently implementable on the one hand and relaxing the constraints of previous approaches on the other hand.
Spatial Query for Planetary Data
NASA Technical Reports Server (NTRS)
Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.
2011-01-01
Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.
An approach for management of geometry data
NASA Technical Reports Server (NTRS)
Dube, R. P.; Herron, G. J.; Schweitzer, J. E.; Warkentine, E. R.
1980-01-01
The strategies for managing Integrated Programs for Aerospace Design (IPAD) computer-based geometry are described. The computer model of geometry is the basis for communication, manipulation, and analysis of shape information. IPAD's data base system makes this information available to all authorized departments in a company. A discussion of the data structures and algorithms required to support geometry in IPIP (IPAD's data base management system) is presented. Through the use of IPIP's data definition language, the structure of the geometry components is defined. The data manipulation language is the vehicle by which a user defines an instance of the geometry. The manipulation language also allows a user to edit, query, and manage the geometry. The selection of canonical forms is a very important part of the IPAD geometry. IPAD has a canonical form for each entity and provides transformations to alternate forms; in particular, IPAD will provide a transformation to the ANSI standard. The DBMS schemas required to support IPAD geometry are explained.
Eddleston, Kimberly A; Veiga, John F; Powell, Gary N
2006-03-01
Using survey data from 400 managers, the authors examined whether gender self-schema would explain sex differences in preferences for status-based and socioemotional career satisfiers. Female gender self-schema, represented by femininity and family role salience, completely mediated the relationship between managers' sex and preferences for socioemotional career satisfiers. However, male gender self-schema, represented by masculinity and career role salience, did not mediate the relationship between managers' sex and preferences for status-based career satisfiers. As expected, male managers regarded status-based career satisfiers as more important and socioemotional career satisfiers as less important than female managers did. The proposed conceptualization of male and female gender self-schemas, which was supported by the data, enhances understanding of adult self-schema and work-related attitudes and behavior.
ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites.
Li, Li; Crabtree, Jonathan; Fischer, Steve; Pinney, Deborah; Stoeckert, Christian J; Sibley, L David; Roos, David S
2004-01-01
ApiEST-DB (http://www.cbil.upenn.edu/paradbs-servlet/) provides integrated access to publicly available EST data from protozoan parasites in the phylum Apicomplexa. The database currently incorporates a total of nearly 100,000 ESTs from several parasite species of clinical and/or veterinary interest, including Eimeria tenella, Neospora caninum, Plasmodium falciparum, Sarcocystis neurona and Toxoplasma gondii. To facilitate analysis of these data, EST sequences were clustered and assembled to form consensus sequences for each organism, and these assemblies were then subjected to automated annotation via similarity searches against protein and domain databases. The underlying relational database infrastructure, Genomics Unified Schema (GUS), enables complex biologically based queries, facilitating validation of gene models, identification of alternative splicing, detection of single nucleotide polymorphisms, identification of stage-specific genes and recognition of phylogenetically conserved and phylogenetically restricted sequences.
Computer systems and methods for the query and visualization of multidimensional databases
Stolte, Chris; Tang, Diane L; Hanrahan, Patrick
2014-04-29
In response to a user request, a computer generates a graphical user interface on a computer display. A schema information region of the graphical user interface includes multiple operand names, each operand name associated with one or more fields of a multi-dimensional database. A data visualization region of the graphical user interface includes multiple shelves. Upon detecting a user selection of the operand names and a user request to associate each user-selected operand name with a respective shelf in the data visualization region, the computer generates a visual table in the data visualization region in accordance with the associations between the operand names and the corresponding shelves. The visual table includes a plurality of panes, each pane having at least one axis defined based on data for the fields associated with a respective operand name.
Computer systems and methods for the query and visualization of multidimensional databases
Stolte, Chris [Palo Alto, CA; Tang, Diane L [Palo Alto, CA; Hanrahan, Patrick [Portola Valley, CA
2011-02-01
In response to a user request, a computer generates a graphical user interface on a computer display. A schema information region of the graphical user interface includes multiple operand names, each operand name associated with one or more fields of a multi-dimensional database. A data visualization region of the graphical user interface includes multiple shelves. Upon detecting a user selection of the operand names and a user request to associate each user-selected operand name with a respective shelf in the data visualization region, the computer generates a visual table in the data visualization region in accordance with the associations between the operand names and the corresponding shelves. The visual table includes a plurality of panes, each pane having at least one axis defined based on data for the fields associated with a respective operand name.
Computer systems and methods for the query and visualization of multidimensional databases
Stolte, Chris [Palo Alto, CA; Tang, Diane L [Palo Alto, CA; Hanrahan, Patrick [Portola Valley, CA
2012-03-20
In response to a user request, a computer generates a graphical user interface on a computer display. A schema information region of the graphical user interface includes multiple operand names, each operand name associated with one or more fields of a multi-dimensional database. A data visualization region of the graphical user interface includes multiple shelves. Upon detecting a user selection of the operand names and a user request to associate each user-selected operand name with a respective shelf in the data visualization region, the computer generates a visual table in the data visualization region in accordance with the associations between the operand names and the corresponding shelves. The visual table includes a plurality of panes, each pane having at least one axis defined based on data for the fields associated with a respective operand name.
An Active RBSE Framework to Generate Optimal Stimulus Sequences in a BCI for Spelling
NASA Astrophysics Data System (ADS)
Moghadamfalahi, Mohammad; Akcakaya, Murat; Nezamfar, Hooman; Sourati, Jamshid; Erdogmus, Deniz
2017-10-01
A class of brain computer interfaces (BCIs) employs noninvasive recordings of electroencephalography (EEG) signals to enable users with severe speech and motor impairments to interact with their environment and social network. For example, EEG-based BCIs for typing commonly use event related potentials (ERPs) for inference. Presentation paradigms in current ERP-based letter-by-letter typing BCIs typically query the user with an arbitrary subset of characters. However, typing accuracy and typing speed can potentially be enhanced with more informed subset selection and flash assignment. In this manuscript, we introduce the active recursive Bayesian state estimation (active-RBSE) framework for inference and sequence optimization. Prior to presentation in each iteration, rather than showing a subset of randomly selected characters, the developed framework optimally selects a subset based on a query function. Selected queries are adaptively specialized for users during each intent detection. Through a simulation-based study, we assess the effect of active-RBSE on the performance of a language-model assisted typing BCI in terms of typing speed and accuracy. To provide a baseline for comparison, we also use standard presentation paradigms, namely the row and column matrix presentation paradigm and random rapid serial visual presentation paradigms. The results show that use of active-RBSE can enhance the online performance of the system, both in terms of typing accuracy and speed.
Evaluating a NoSQL Alternative for Chilean Virtual Observatory Services
NASA Astrophysics Data System (ADS)
Antognini, J.; Araya, M.; Solar, M.; Valenzuela, C.; Lira, F.
2015-09-01
Currently, the standards and protocols for data access in the Virtual Observatory architecture (DAL) are generally implemented with relational databases based on SQL. In particular, the Astronomical Data Query Language (ADQL), the language used by IVOA to represent queries to VO services, was created to satisfy the different data access protocols, such as Simple Cone Search. ADQL is based on SQL92, and has extra functionality implemented using PgSphere. An emerging alternative to SQL is the family of so-called NoSQL databases, which can be classified in several categories such as Column, Document, Key-Value, Graph, Object, etc., each one recommended for different scenarios. Their notable characteristics include: schema-free, easy replication support, simple API, Big Data, etc. The Chilean Virtual Observatory (ChiVO) is developing a functional prototype based on the IVOA architecture, with the following relevant factors: Performance, Scalability, Flexibility, Complexity, and Functionality. Currently, it is very difficult to compare these factors, due to a lack of alternatives. The objective of this paper is to compare NoSQL alternatives with SQL through the implementation of a REST Web API that satisfies ChiVO's needs: a SESAME-style name resolver for the data from ALMA. Therefore, we propose a test scenario by configuring a NoSQL database with data from different sources and evaluating the feasibility of creating a Simple Cone Search service and its performance. This comparison will pave the way for the application of Big Data databases in the Virtual Observatory.
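The core of a Simple Cone Search over schema-free records can be sketched as follows. This is a minimal illustration in plain Python, not ChiVO's implementation: the record layout, the field names `ra` and `dec`, and the function names are assumptions made for the example, and a real service would push the filter into the database rather than scan in memory.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(documents, ra, dec, radius_deg):
    """Return the documents whose position lies within radius_deg of (ra, dec)."""
    return [d for d in documents
            if angular_separation_deg(ra, dec, d["ra"], d["dec"]) <= radius_deg]

# Hypothetical schema-free records: documents may carry different extra fields.
catalog = [
    {"name": "src-a", "ra": 10.0, "dec": -30.0},
    {"name": "src-b", "ra": 10.05, "dec": -30.02},
    {"name": "src-c", "ra": 200.0, "dec": 45.0, "note": "extra field"},
]
hits = cone_search(catalog, ra=10.0, dec=-30.0, radius_deg=0.1)
print([d["name"] for d in hits])
```

A linear scan like this is only suitable for small test sets; the performance comparison the abstract describes would depend on whatever spatial indexing the chosen database offers.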
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2013-11-01
The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
Workplace-based assessment: raters' performance theories and constructs.
Govaerts, M J B; Van de Wiel, M W J; Schuwirth, L W T; Van der Vleuten, C P M; Muijtjens, A M M
2013-08-01
Weaknesses in the nature of rater judgments are generally considered to compromise the utility of workplace-based assessment (WBA). In order to gain insight into the underpinnings of rater behaviours, we investigated how raters form impressions of and make judgments on trainee performance. Using theoretical frameworks of social cognition and person perception, we explored raters' implicit performance theories, use of task-specific performance schemas and the formation of person schemas during WBA. We used think-aloud procedures and verbal protocol analysis to investigate schema-based processing by experienced (N = 18) and inexperienced (N = 16) raters (supervisor-raters in general practice residency training). Qualitative data analysis was used to explore schema content and usage. We quantitatively assessed rater idiosyncrasy in the use of performance schemas and we investigated effects of rater expertise on the use of (task-specific) performance schemas. Raters used different schemas in judging trainee performance. We developed a normative performance theory comprising seventeen inter-related performance dimensions. Levels of rater idiosyncrasy were substantial and unrelated to rater expertise. Experienced raters made significantly more use of task-specific performance schemas compared to inexperienced raters, suggesting more differentiated performance schemas in experienced raters. Most raters started to develop person schemas the moment they began to observe trainee performance. The findings further our understanding of processes underpinning judgment and decision making in WBA. Raters make and justify judgments based on personal theories and performance constructs. Raters' information processing seems to be affected by differences in rater expertise. The results of this study can help to improve rater training, the design of assessment instruments and decision making in WBA.
Advancing the LSST Operations Simulator
NASA Astrophysics Data System (ADS)
Saha, Abhijit; Ridgway, S. T.; Cook, K. H.; Delgado, F.; Chandrasekharan, S.; Petry, C. E.; Operations Simulator Group
2013-01-01
The Operations Simulator for the Large Synoptic Survey Telescope (LSST; http://lsst.org) allows the planning of LSST observations that obey explicit science driven observing specifications, patterns, schema, and priorities, while optimizing against the constraints placed by design-specific opto-mechanical system performance of the telescope facility, site specific conditions (including weather and seeing), as well as additional scheduled and unscheduled downtime. A simulation run records the characteristics of all observations (e.g., epoch, sky position, seeing, sky brightness) in a MySQL database, which can be queried for any desired purpose. Derivative information digests of the observing history database are made with an analysis package called Simulation Survey Tools for Analysis and Reporting (SSTAR). Merit functions and metrics have been designed to examine how suitable a specific simulation run is for several different science applications. This poster reports recent work which has focussed on an architectural restructuring of the code that will allow us to a) use "look-ahead" strategies that avoid cadence sequences that cannot be completed due to observing constraints; and b) examine alternate optimization strategies, so that the most efficient scheduling algorithm(s) can be identified and used: even few-percent efficiency gains will create substantive scientific opportunity. The enhanced simulator will be used to assess the feasibility of desired observing cadences, study the impact of changing science program priorities, and assist with performance margin investigations of the LSST system.
A knowledgebase system to enhance scientific discovery: Telemakus
Fuller, Sherrilynne S; Revere, Debra; Bugni, Paul F; Martin, George M
2004-01-01
Background With the rapid expansion of scientific research, the ability to effectively find or integrate new domain knowledge in the sciences is proving increasingly difficult. Efforts to improve and speed up scientific discovery are being explored on a number of fronts. However, much of this work is based on traditional search and retrieval approaches and the bibliographic citation presentation format remains unchanged. Methods Case study. Results The Telemakus KnowledgeBase System provides flexible new tools for creating knowledgebases to facilitate retrieval and review of scientific research reports. In formalizing the representation of the research methods and results of scientific reports, Telemakus offers a potential strategy to enhance the scientific discovery process. While other research has demonstrated that aggregating and analyzing research findings across domains augments knowledge discovery, the Telemakus system is unique in combining document surrogates with interactive concept maps of linked relationships across groups of research reports. Conclusion Based on how scientists conduct research and read the literature, the Telemakus KnowledgeBase System brings together three innovations in analyzing, displaying and summarizing research reports across a domain: (1) research report schema, a document surrogate of extracted research methods and findings presented in a consistent and structured schema format which mimics the research process itself and provides a high-level surrogate to facilitate searching and rapid review of retrieved documents; (2) research findings, used to index the documents, allowing searchers to request, for example, research studies which have studied the relationship between neoplasms and vitamin E; and (3) visual exploration interface of linked relationships for interactive querying of research findings across the knowledgebase and graphical displays of what is known as well as, through gaps in the map, what is yet to be tested. 
The rationale and system architecture are described and plans for the future are discussed. PMID:15507158
A multi-site cognitive task analysis for biomedical query mediation.
Hruby, Gregory W; Rasmussen, Luke V; Hanauer, David; Patel, Vimla L; Cimino, James J; Weng, Chunhua
2016-09-01
To apply cognitive task analyses of the Biomedical query mediation (BQM) processes for EHR data retrieval at multiple sites towards the development of a generic BQM process model. We conducted semi-structured interviews with eleven data analysts from five academic institutions and one government agency, and performed cognitive task analyses on their BQM processes. A coding schema was developed through iterative refinement and used to annotate the interview transcripts. The annotated dataset was used to reconstruct and verify each BQM process and to develop a harmonized BQM process model. A survey was conducted to evaluate the face and content validity of this harmonized model. The harmonized process model is hierarchical, encompassing tasks, activities, and steps. The face validity evaluation concluded the model to be representative of the BQM process. In the content validity evaluation, out of the 27 tasks for BQM, 19 meet the threshold for semi-valid, including 3 fully valid: "Identify potential index phenotype," "If needed, request EHR database access rights," and "Perform query and present output to medical researcher", and 8 are invalid. We aligned the goals of the tasks within the BQM model with the five components of the reference interview. The similarity between the process of BQM and the reference interview is promising and suggests the BQM tasks are powerful for eliciting implicit information needs. We contribute a BQM process model based on a multi-site study. This model promises to inform the standardization of the BQM process towards improved communication efficiency and accuracy. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A Multi-Site Cognitive Task Analysis for Biomedical Query Mediation
Hruby, Gregory W.; Rasmussen, Luke V.; Hanauer, David; Patel, Vimla; Cimino, James J.; Weng, Chunhua
2016-01-01
Objective To apply cognitive task analyses of the Biomedical query mediation (BQM) processes for EHR data retrieval at multiple sites towards the development of a generic BQM process model. Materials and Methods We conducted semi-structured interviews with eleven data analysts from five academic institutions and one government agency, and performed cognitive task analyses on their BQM processes. A coding schema was developed through iterative refinement and used to annotate the interview transcripts. The annotated dataset was used to reconstruct and verify each BQM process and to develop a harmonized BQM process model. A survey was conducted to evaluate the face and content validity of this harmonized model. Results The harmonized process model is hierarchical, encompassing tasks, activities, and steps. The face validity evaluation concluded the model to be representative of the BQM process. In the content validity evaluation, out of the 27 tasks for BQM, 19 meet the threshold for semi-valid, including 3 fully valid: “Identify potential index phenotype,” “If needed, request EHR database access rights,” and “Perform query and present output to medical researcher”, and 8 are invalid. Discussion We aligned the goals of the tasks within the BQM model with the five components of the reference interview. The similarity between the process of BQM and the reference interview is promising and suggests the BQM tasks are powerful for eliciting implicit information needs. Conclusions We contribute a BQM process model based on a multi-site study. This model promises to inform the standardization of the BQM process towards improved communication efficiency and accuracy. PMID:27435950
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Yubin; Shankar, Mallikarjun; Park, Byung H.
Designing a database system for both efficient data management and data services has been one of the enduring challenges in the healthcare domain. In many healthcare systems, data services and data management are often viewed as two orthogonal tasks; data services refer to retrieval and analytic queries such as search, joins, statistical data extraction, and simple data mining algorithms, while data management refers to building error-tolerant and non-redundant database systems. The gap between service and management has resulted in rigid database systems and schemas that do not support effective analytics. We compose a rich graph structure from an abstracted healthcare RDBMS to illustrate how we can fill this gap in practice. We show how a healthcare graph can be automatically constructed from a normalized relational database using the proposed 3NF Equivalent Graph (3EG) transformation. We discuss a set of real world graph queries such as finding self-referrals, shared providers, and collaborative filtering, and evaluate their performance over a relational database and its 3EG-transformed graph. Experimental results show that the graph representation serves as multiple de-normalized tables, thus reducing complexity in a database and enhancing data accessibility of users. Based on this finding, we propose an ensemble framework of databases for healthcare applications.
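The general idea of turning normalized tables into a graph can be sketched as below. This is not the 3EG transformation itself, whose rules the abstract does not give; it is a generic mapping, assumed for illustration, in which each row becomes a node and each foreign key becomes an edge. All table names, column names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical normalized tables; names and columns are illustrative only.
patients = [{"id": 1}, {"id": 2}, {"id": 3}]
providers = [{"id": 10}, {"id": 11}]
visits = [  # each visit row holds foreign keys to a patient and a provider
    {"id": 100, "patient_id": 1, "provider_id": 10},
    {"id": 101, "patient_id": 2, "provider_id": 10},
    {"id": 102, "patient_id": 3, "provider_id": 11},
]

def build_graph(tables, foreign_keys):
    """Rows become nodes keyed by (table, id); foreign keys become undirected edges."""
    adj = defaultdict(set)
    for table, rows in tables.items():
        for row in rows:
            node = (table, row["id"])
            adj[node]  # register the node even if it ends up with no edges
            for column, target in foreign_keys.get(table, {}).items():
                other = (target, row[column])
                adj[node].add(other)
                adj[other].add(node)
    return adj

def shared_provider_patients(adj, patient_id):
    """Patients reachable via patient -> visit -> provider -> visit -> patient."""
    found = set()
    for visit in adj[("patient", patient_id)]:
        for provider in (n for n in adj[visit] if n[0] == "provider"):
            for other_visit in adj[provider]:
                for n in adj[other_visit]:
                    if n[0] == "patient" and n[1] != patient_id:
                        found.add(n[1])
    return found

graph = build_graph(
    {"patient": patients, "provider": providers, "visit": visits},
    {"visit": {"patient_id": "patient", "provider_id": "provider"}},
)
print(shared_provider_patients(graph, 1))
```

The shared-provider lookup becomes a short graph walk here, whereas the relational form would need a self-join on the visit table; this is the kind of query the abstract reports evaluating.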
Mining Longitudinal Web Queries: Trends and Patterns.
ERIC Educational Resources Information Center
Wang, Peiling; Berry, Michael W.; Yang, Yiheng
2003-01-01
Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…
A two-level cache for distributed information retrieval in search engines.
Zhang, Weizhe; He, Hui; Ye, Jianwei
2013-01-01
To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache.
A Two-Level Cache for Distributed Information Retrieval in Search Engines
Zhang, Weizhe; He, Hui; Ye, Jianwei
2013-01-01
To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache. PMID:24363621
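A two-level cache of this kind can be sketched as follows. The abstract does not state the replacement policy of the dynamic level or how results are distributed across nodes, so the sketch assumes a single node, a static level preloaded with the most frequent queries in a log, and a least-recently-used dynamic level. The class name, parameters, and `backend` callable are all hypothetical.

```python
from collections import Counter, OrderedDict

class TwoLevelCache:
    """Static cache of popular logged queries plus an LRU dynamic cache (assumed policy)."""

    def __init__(self, query_log, static_size, dynamic_size, backend):
        # Static level: results for the most frequent queries in the log, never evicted.
        popular = [q for q, _ in Counter(query_log).most_common(static_size)]
        self.static = {q: backend(q) for q in popular}
        # Dynamic level: bounded, least-recently-used entry evicted first.
        self.dynamic = OrderedDict()
        self.dynamic_size = dynamic_size
        self.backend = backend
        self.hits = 0
        self.misses = 0

    def get(self, query):
        if query in self.static:
            self.hits += 1
            return self.static[query]
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)  # mark as most recently used
            return self.dynamic[query]
        self.misses += 1
        result = self.backend(query)
        self.dynamic[query] = result
        if len(self.dynamic) > self.dynamic_size:
            self.dynamic.popitem(last=False)  # evict the least recently used entry
        return result

# str.upper stands in for an expensive search backend.
log = ["a", "a", "a", "b", "b", "c"]
cache = TwoLevelCache(log, static_size=1, dynamic_size=2, backend=str.upper)
for q in ["a", "d", "d", "e", "f", "d"]:
    cache.get(q)
print(cache.hits, cache.misses)
```

Keeping the popular queries in a level that is never evicted protects them from bursts of rare queries, which is one plausible reason for separating the two levels.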
Sundvall, Erik; Wei-Kleiner, Fang; Freire, Sergio M; Lambrix, Patrick
2017-01-01
Archetype-based Electronic Health Record (EHR) systems using generic reference models from e.g. openEHR, ISO 13606 or CIMI should be easy to update and reconfigure with new types (or versions) of data models or entries, ideally with very limited programming or manual database tweaking. Exploratory research (e.g. epidemiology) leading to ad-hoc querying on a population-wide scale can be a challenge in such environments. This publication describes implementation and test of an archetype-aware Dewey encoding optimization that can be used to produce such systems in environments supporting relational operations, e.g. RDBMSs and distributed map-reduce frameworks like Hadoop. Initial testing was done using a nine-node 2.2 GHz quad-core Hadoop cluster querying a dataset consisting of targeted extracts from 4+ million real patient EHRs; query results with sub-minute response time were obtained.
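For readers unfamiliar with Dewey encoding, the basic labelling scheme can be sketched as below. This shows only plain Dewey labels on a small tree, not the archetype-aware optimization the abstract describes; the node names and function names are invented for the example.

```python
def dewey_encode(tree, prefix=(1,)):
    """Yield (dewey_label, node_name) pairs in document order.

    tree is a (name, [children]) pair; labels are tuples such as (1, 2, 1),
    where each component is the position of a node among its siblings.
    """
    name, children = tree
    yield prefix, name
    for i, child in enumerate(children, start=1):
        yield from dewey_encode(child, prefix + (i,))

def is_ancestor(a, b):
    """True if the node labelled a is a proper ancestor of the node labelled b."""
    return len(a) < len(b) and b[:len(a)] == a

# Hypothetical record structure, loosely shaped like a hierarchical EHR entry.
record = ("composition", [
    ("observation", [("systolic", []), ("diastolic", [])]),
    ("evaluation", [("diagnosis", [])]),
])
labels = {name: label for label, name in dewey_encode(record)}
print(labels["diastolic"], is_ancestor(labels["observation"], labels["systolic"]))
```

Because ancestry reduces to a prefix comparison on the label, hierarchical conditions can be expressed as ordinary comparisons on a stored column, which is what makes the encoding usable in environments that only offer relational operations.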
Combining Model-driven and Schema-based Program Synthesis
NASA Technical Reports Server (NTRS)
Denney, Ewen; Whittle, John
2004-01-01
We describe ongoing work which aims to extend the schema-based program synthesis paradigm with explicit models. In this context, schemas can be considered as model-to-model transformations. The combination of schemas with explicit models offers a number of advantages, namely, that building synthesis systems becomes much easier since the models can be used in verification and in adaptation of the synthesis systems. We illustrate our approach using an example from signal processing.
Nadkarni, P M
1997-08-01
Concept Locator (CL) is a client-server application that accesses a Sybase relational database server containing a subset of the UMLS Metathesaurus for the purpose of retrieval of concepts corresponding to one or more query expressions supplied to it. CL's query grammar permits complex Boolean expressions, wildcard patterns, and parenthesized (nested) subexpressions. CL translates the query expressions supplied to it into one or more SQL statements that actually perform the retrieval. The generated SQL is optimized by the client to take advantage of the strengths of the server's query optimizer and to sidestep its weaknesses, so that execution is reasonably efficient.
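Translating a nested Boolean expression with wildcards into SQL can be sketched as follows. This is not CL's grammar or its generated SQL: the expression is given as nested tuples instead of being parsed from text, SQLite stands in for Sybase, and the `concept` table and `name` column are hypothetical.

```python
import sqlite3

def to_sql(expr):
    """Translate a nested Boolean expression into (where_clause, params).

    expr is either a term string, where * is a wildcard, or a tuple:
    ("and", e1, e2, ...), ("or", e1, e2, ...), or ("not", e).
    """
    if isinstance(expr, str):
        if "*" in expr:
            return "name LIKE ?", [expr.replace("*", "%")]
        return "name = ?", [expr]
    op, *args = expr
    if op == "not":
        clause, params = to_sql(args[0])
        return f"NOT ({clause})", params
    if op not in ("and", "or"):
        raise ValueError(f"unknown operator: {op!r}")
    parts = [to_sql(a) for a in args]
    joiner = " AND " if op == "and" else " OR "
    clause = joiner.join(f"({c})" for c, _ in parts)
    return clause, [p for _, ps in parts for p in ps]

# In-memory stand-in for the concept table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (name TEXT)")
conn.executemany("INSERT INTO concept VALUES (?)",
                 [("heart failure",), ("heart attack",), ("renal failure",)])

where, params = to_sql(("and", "heart*", ("not", "*attack")))
rows = conn.execute(
    f"SELECT name FROM concept WHERE {where} ORDER BY name", params
).fetchall()
print(where, rows)
```

Search terms are passed as bound parameters, so only the fixed clause structure is interpolated into the statement text. Any optimizer-specific rewriting of the kind the abstract mentions would be applied to the clause before execution.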
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.
Aji, Ablimit; Wang, Fusheng; Saltz, Joel H
2012-11-06
Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amounts of data with fast response, which faces two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on-demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data
Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.
2013-01-01
Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amounts of data with fast response, which faces two major challenges: the “big data” challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on-demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719
Schema-Based Text Comprehension
ERIC Educational Resources Information Center
Ensar, Ferhat
2015-01-01
Schema is one of the most common terms used for classifying and constructing knowledge. Therefore, a schema is a pre-planned set of concepts. It usually contains social information and is used to represent chain of events, perceptions, situations, relationships and even objects. For example, Kant initially defines the idea of schema as some…
ODM2 (Observation Data Model): The EarthChem Use Case
NASA Astrophysics Data System (ADS)
Lehnert, Kerstin; Song, Lulin; Hsu, Leslie; Horsburgh, Jeffrey S.; Aufdenkampe, Anthony K.; Mayorga, Emilio; Tarboton, David; Zaslavsky, Ilya
2014-05-01
PetDB is an online data system that was created in the late 1990's to serve online a synthesis of published geochemical and petrological data of igneous and metamorphic rocks. PetDB has today reached a volume of 2.5 million analytical values for nearly 70,000 rock samples. PetDB's data model (Lehnert et al., G-Cubed 2000) was designed to store sample-based observational data generated by the analysis of rocks, together with a wide range of metadata documenting provenance of the samples, analytical procedures, data quality, and data source. Attempts to store additional types of geochemical data such as time-series data of seafloor hydrothermal springs and volcanic gases, depth-series data for marine sediments and soils, and mineral or mineral inclusion data revealed the limitations of the schema: the inability to properly record sample hierarchies (for example, a garnet that is included in a diamond that is included in a xenolith that is included in a kimberlite rock sample), inability to properly store time-series data, inability to accommodate classification schemes other than rock lithologies, deficiencies of identifying and documenting datasets that are not part of publications. In order to overcome these deficiencies, PetDB has been developing a new data schema using the ODM2 information model (ODM=Observation Data Model). The development of ODM2 is a collaborative project that leverages the experience of several existing information representations, including PetDB and EarthChem, and the CUAHSI HIS Observations Data Model (ODM), as well as the general specification for encoding observational data called Observations and Measurements (O&M) to develop a uniform information model that seamlessly manages spatially discrete, feature-based earth observations from environmental samples and sample fractions as well as in-situ sensors, and to test its initial implementation in a variety of user scenarios. 
The O&M model, adopted as an international standard by the Open Geospatial Consortium, and later by ISO, is the foundation of several domain markup languages such as OGC WaterML 2, used for exchanging hydrologic time series. O&M profiles for samples and sample fractions have not been standardized yet, and there is a significant variety in sample data representations used across agencies and academic projects. The intent of the ODM2 project is to create a unified relational representation for different types of spatially discrete observational data, ensuring that the data can be efficiently stored, transferred, catalogued and queried within a variety of earth science applications. We will report on the initial design and implementation of the new model for PetDB, and results of testing the model against a set of common queries. We have explored several aspects of the model, including: semantic consistency, validation and integrity checking, portability and maintainability, query efficiency, and scalability. The sample datasets from PetDB have been loaded in the initial physical implementation for testing. The results of the experiments point to both benefits and challenges of the initial design, and illustrate the key trade-off between the generality of design, ease of interpretation, and query efficiency, especially as the system needs to scale to millions of records.
Young Adolescents' Gender-, Ethnicity-, and Popularity-Based Social Schemas of Aggressive Behavior
ERIC Educational Resources Information Center
Clemans, Katherine H.; Graber, Julia A.
2016-01-01
Social schemas can influence the perception and recollection of others' behavior and may create biases in the reporting of social events. This study investigated young adolescents' (N = 317) gender-, ethnicity-, and popularity-based social schemas of overtly and relationally aggressive behavior. Results indicated that participants associated overt…
Beacon- and Schema-Based Method for Recognizing Algorithms from Students' Source Code
ERIC Educational Resources Information Center
Taherkhani, Ahmad; Malmi, Lauri
2013-01-01
In this paper, we present a method for recognizing algorithms from students' programming submissions coded in Java. The method is based on the concept of "programming schemas" and "beacons". Schemas are high-level programming knowledge with detailed knowledge abstracted out, and beacons are statements that imply specific…
Query construction, entropy, and generalization in neural-network models
NASA Astrophysics Data System (ADS)
Sollich, Peter
1994-05-01
We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, nonredundant training sets. We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and "noninvertible" versus linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill matched to the learning environment and find that, in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e., on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction.
Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics
Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.
2012-01-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
A Systematic Review of Serious Games in Training Health Care Professionals.
Wang, Ryan; DeMaria, Samuel; Goldberg, Andrew; Katz, Daniel
2016-02-01
Serious games are computer-based games designed for training purposes. They are poised to expand their role in medical education. This systematic review, conducted in accordance with PRISMA guidelines, aimed to synthesize current serious gaming trends in health care training, especially those pertaining to developmental methodologies and game evaluation. PubMed, EMBASE, and Cochrane databases were queried for relevant documents published through December 2014. Of the 3737 publications identified, 48 of them, covering 42 serious games, were included. From 2007 to 2014, they demonstrate a growth from 2 games and 2 genres to 42 games and 8 genres. Overall, study design was heterogeneous and methodological quality by MERSQI score averaged 10.5/18, which is modest. Seventy-nine percent of serious games were evaluated for training outcomes. As the number of serious games for health care training continues to grow, having schemas that organize how educators approach their development and evaluation is essential for their success.
Pinciroli, F; Combi, C; Pozzi, G
1995-02-01
Database techniques have been used to store medical records for more than 40 years. Some aspects still remain unresolved, e.g., the management of textual data and image data within a single system. Object-orientation techniques applied to a database management system (DBMS) allow the definition of suitable data structures (e.g., to store digital images): some facilities allow the use of predefined structures when defining new ones. Currently available object-oriented DBMSs, however, still need improvements both in schema update and in query facilities. This paper describes a prototype of a medical record that includes some multimedia features, managing both textual and image data. The prototype described here considers data from the medical records of patients subjected to percutaneous transluminal coronary artery angioplasty. We developed it on a Sun workstation with a Unix operating system and ONTOS as an object-oriented DBMS.
GMZ: A GML Compression Model for WebGIS
NASA Astrophysics Data System (ADS)
Khandelwal, A.; Rajan, K. S.
2017-09-01
Geography Markup Language (GML) is an XML specification for expressing geographical features. Defined by the Open Geospatial Consortium (OGC), it is widely used for storage and transmission of maps over the Internet. XML schemas make it convenient to define custom feature profiles in GML for specific needs, as seen in the widely popular cityGML, simple features profile, coverage, etc. The simple features profile (SFP) is a simpler subset of GML with support for point, line and polygon geometries. SFP has been constructed to cover the most commonly used GML geometries. Web Feature Service (WFS) serves query results in SFP by default. But SFP falls short of being an ideal choice due to its high verbosity and size-heavy nature, which provides immense scope for compression. GMZ is a lossless compression model developed to work for SFP-compliant GML files. Our experiments indicate GMZ achieves reasonably good compression ratios and can be useful in WebGIS-based applications.
A Services-Oriented Architecture for Water Observations Data
NASA Astrophysics Data System (ADS)
Maidment, D. R.; Zaslavsky, I.; Valentine, D.; Tarboton, D. G.; Whitenack, T.; Whiteaker, T.; Hooper, R.; Kirschtel, D.
2009-04-01
Water observations data are time series of measurements made at point locations of water level, flow, and quality, and corresponding data for climatic observations at point locations such as gaged precipitation and weather variables. A services-oriented architecture has been built for such information for the United States that has three components: hydrologic information servers, hydrologic information clients, and a centralized metadata cataloging system. These are connected using web services for observations data and metadata defined by an XML-based language called WaterML. A Hydrologic Information Server can be built by storing observations data in a relational database schema in the CUAHSI Observations Data Model, in which case web services access to the data and metadata is automatically provided by query functions for WaterML that are wrapped around the relational database within a web server. A Hydrologic Information Server can also be constructed by custom-programming an interface to an existing water agency web site so that it responds to the same queries by producing data in WaterML, as do the CUAHSI Observations Data Model based servers. A Hydrologic Information Client is one which can interpret and ingest WaterML metadata and data. We have two client applications for Excel and ArcGIS and have shown how WaterML web services can be ingested into programming environments such as Matlab and Visual Basic. HIS Central, maintained at the San Diego Supercomputer Center, is a repository of observational metadata for WaterML web services which presently indexes 342 million data values measured at 1.75 million locations. This is the largest catalog of water observational data for the United States presently in existence.
As more observation networks join what we term "CUAHSI Water Data Federation", and the system accommodates a growing number of sites, measured parameters, applications, and users, rapid and reliable access to large heterogeneous hydrologic data repositories becomes critical. The CUAHSI HIS solution to the scalability and heterogeneity challenges has several components. Structural differences across the data repositories are addressed by building a standard services foundation for the exchange of hydrologic data, as derived from a common information model for observational data measured at stationary points and its implementation as a relational schema (ODM) and an XML schema (WaterML). Semantic heterogeneity is managed by mapping water quantity, water quality, and other parameters collected by government agencies and academic projects to a common ontology. The WaterML-compliant web services are indexed in a community services registry called HIS Central (hiscentral.cuahsi.org). Once a web service is registered in HIS Central, its metadata (site and variable characteristics, period of record for each variable at each site, etc.) is harvested and appended to the central catalog. The catalog is further updated as the service publisher associates the variables in the published service with ontology concepts. After this, the newly published service becomes available for spatial and semantics-based queries from online and desktop client applications developed by the project. Hydrologic system server software is now deployed at more than a dozen locations in the United States and Australia. To provide rapid access to data summaries, in particular for several nation-wide data repositories including EPA STORET, USGS NWIS, and USDA SNOTEL, we convert the observation data catalogs and databases with harvested data values into special representations that support high-performance analysis and visualization. 
The construction of OLAP (Online Analytical Processing) cubes, often called data cubes, is an approach to organizing and querying large multi-dimensional data collections. We have applied the OLAP techniques, as implemented in Microsoft SQL Server 2005/2008, to the analysis of the catalogs from several agencies. OLAP analysis results reflect geography and history of observation data availability from USGS NWIS, EPA STORET, and USDA SNOTEL repositories, and spatial and temporal dynamics of the available measurements for several key nutrient-related parameters. Our experience developing the CUAHSI HIS cyberinfrastructure demonstrated that efficient integration of hydrologic observations from multiple government and academic sources requires a range of technical approaches focused on managing different components of data heterogeneity and system scalability. While this submission addresses technical aspects of developing a national-scale information system for hydrologic observations, the challenges of explicating shared semantics of hydrologic observations and building a community of HIS users and developers remain critical in constructing a nation-wide federation of water data services.
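The roll-up idea behind the OLAP cubes described above can be sketched in a few lines. This is only an illustrative, in-memory analogue and not the SQL Server implementation the abstract refers to; the field names and the catalog rows are hypothetical values invented for the example.

```python
from collections import Counter

# Hypothetical catalog rows: (agency, year, parameter, n_values).
# The numbers are made up purely to illustrate aggregation.
FIELDS = ("agency", "year", "parameter", "n_values")
catalog = [
    ("USGS NWIS", 2001, "nitrate", 120),
    ("USGS NWIS", 2002, "nitrate", 80),
    ("EPA STORET", 2001, "nitrate", 50),
    ("EPA STORET", 2001, "phosphorus", 30),
]

def roll_up(rows, dims):
    """Sum n_values over every combination of the chosen dimensions."""
    totals = Counter()
    for row in rows:
        record = dict(zip(FIELDS, row))
        key = tuple(record[d] for d in dims)
        totals[key] += record["n_values"]
    return totals

by_agency = roll_up(catalog, ("agency",))
by_agency_year = roll_up(catalog, ("agency", "year"))
```

Choosing a different tuple of dimensions gives a different slice of the same cube, which is the kind of summary (availability by agency, by year, by parameter) the abstract describes.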
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2016-01-01
The proliferation of GPS-enabled devices and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high-performance spatial queries on large volumes of data has become increasingly important in numerous fields. This requires a scalable and efficient spatial data warehousing solution, as existing approaches exhibit scalability limitations and efficiency bottlenecks for large-scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high-performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data- and space-based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real-world use cases: large-scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325
Evolutionary Multiobjective Query Workload Optimization of Cloud Data Warehouses
Dokeroglu, Tansel; Sert, Seyyit Alper; Cinar, Muhammet Serkan
2014-01-01
With the advent of Cloud databases, query optimizers need to find Pareto-optimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans, making use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound algorithm and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. In order to investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments conducted with different workloads and virtual resource configurations, we report notable findings on alternative deployments as well as the advantages and disadvantages of the multiobjective algorithms we propose. PMID:24892048
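To make "Pareto-optimal in response time and monetary cost" concrete, here is a minimal sketch of a dominance filter. It is not the branch-and-bound or genetic algorithm from the abstract; the plan labels and numbers are hypothetical.

```python
from typing import List, Tuple

# (label, response_time_seconds, monetary_cost); all values are hypothetical.
Plan = Tuple[str, float, float]

def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep plans that no other plan beats on both objectives."""
    front = []
    for name, t, c in plans:
        dominated = any(
            (t2 <= t and c2 <= c) and (t2 < t or c2 < c)
            for _, t2, c2 in plans
        )
        if not dominated:
            front.append((name, t, c))
    return front

candidates = [
    ("small-vm/plan-a", 120.0, 1.0),
    ("large-vm/plan-a", 40.0, 4.0),
    ("large-vm/plan-b", 45.0, 5.0),   # slower and costlier than large-vm/plan-a
    ("medium-vm/plan-b", 70.0, 2.0),
]
front = pareto_front(candidates)
```

The three surviving plans each trade time against cost differently; a multiobjective optimizer returns such a set rather than a single winner.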
Enhancing SAMOS Data Access in DOMS via a Neo4j Property Graph Database.
NASA Astrophysics Data System (ADS)
Stallard, A. P.; Smith, S. R.; Elya, J. L.
2016-12-01
The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. The Distributed Oceanographic Match-Up Service (DOMS) under development is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The service currently uses Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by creating relationships between data objects. The authors will present the development of the SAMOS Neo4j property graph database including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS will also be described. Currently, Neo4j contains spatial and temporal records from SAMOS which are modeled into a time tree and r-tree using Graph Aware and Spatial plugin tools for Neo4j. These extensions provide callable Java procedures within CYPHER (Neo4j's query language) that generate in-graph structures. Once generated, these structures can be queried using procedures from these libraries, or directly via CYPHER statements. Neo4j excels at performing relationship and path-based queries, which challenge relational-SQL databases because they require memory intensive joins due to the limitation of their design. Consider a user who wants to find records over several years, but only for specific months. 
If a traditional database only stores timestamps, this type of query would be complex and likely prohibitively slow. Using the time tree model, one can specify a path from the root to the data which restricts resolutions to certain timeframes (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, putting Neo4j at a computational advantage to the SQL database alternative.
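The time-tree idea can be illustrated with a small in-memory analogue. This is a sketch under stated assumptions: it uses plain Python dictionaries rather than Neo4j, Cypher, or any plugin API, and the function names and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def build_time_tree(records):
    """Index records under root -> year -> month, mimicking a time tree."""
    tree = defaultdict(lambda: defaultdict(list))
    for timestamp, payload in records:
        tree[timestamp.year][timestamp.month].append(payload)
    return tree

def records_for_months(tree, years, months):
    """Follow only the year/month paths of interest; no timestamp scan."""
    hits = []
    for year in years:
        if year not in tree:
            continue
        for month in months:
            hits.extend(tree[year].get(month, []))
    return hits

# Hypothetical observations, labeled by a short string.
observations = [
    (datetime(2014, 1, 5), "a"),
    (datetime(2014, 7, 9), "b"),
    (datetime(2015, 1, 20), "c"),
    (datetime(2016, 2, 1), "d"),
]
tree = build_time_tree(observations)
januaries = records_for_months(tree, range(2014, 2017), [1])
```

The query "several years, but only specific months" becomes a walk along a handful of paths from the root, which is the property the abstract attributes to the time tree.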
KaBOB: ontology-based semantic integration of biomedical databases.
Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E
2015-04-23
The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrating it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. 
KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
A Natural Teaching Method Based on Learning Theory.
ERIC Educational Resources Information Center
Smilkstein, Rita
1991-01-01
The natural teaching method is active and student-centered, based on schema and constructivist theories, and informed by research in neuroplasticity. A schema is a mental picture or understanding of something we have learned. Humans can have knowledge only to the degree to which they have constructed schemas from learning experiences and practice.…
The BioPrompt-box: an ontology-based clustering tool for searching in biological databases.
Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto
2007-03-08
High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that their browsing and "understanding" becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines is being developed, like Vivisimo. These tools organize on-the-fly the results of a user query in a hierarchy of labeled folders that ease their browsing and knowledge extraction. We investigate this approach on biological data, and propose the so-called BioPrompt-box software system, which deploys ontology-driven clustering strategies for making the searching process of biologists more efficient and effective. The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underlying databank, such as references to ontologies or to external databanks, and plain texts as comments of researchers and (title, abstracts or even body of) papers. Bpb offers several tools to customize the search and the clustering process over its indexed documents. The user can search a set of keywords within a specific field of the document schema, or can execute Blast to find documents related to homologous sequences. In both cases the search task returns a set of documents (hits) which constitute the answer to the user query. Since the number of hits may be large, Bpb clusters them into groups of homogeneous content, organized as a hierarchy of labeled clusters. The user can actually choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits.
Bpb computes these views by exploiting the meta-data present within the retrieved documents such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. Of course, the approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (maybe numerous) query results and show possible hidden correlations among them, thus improving their browsing and understanding. Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, like GenBank or EMBL.
The BioPrompt-box: an ontology-based clustering tool for searching in biological databases
Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto
2007-01-01
Background High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that their browsing and "understanding" becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines is being developed, like Vivisimo. These tools organize on-the-fly the results of a user query in a hierarchy of labeled folders that ease their browsing and knowledge extraction. We investigate this approach on biological data, and propose the so-called BioPrompt-box software system, which deploys ontology-driven clustering strategies for making the searching process of biologists more efficient and effective. Results The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underlying databank, such as references to ontologies or to external databanks, and plain texts as comments of researchers and (title, abstracts or even body of) papers. Bpb offers several tools to customize the search and the clustering process over its indexed documents. The user can search a set of keywords within a specific field of the document schema, or can execute Blast to find documents related to homologous sequences. In both cases the search task returns a set of documents (hits) which constitute the answer to the user query. Since the number of hits may be large, Bpb clusters them into groups of homogeneous content, organized as a hierarchy of labeled clusters. The user can actually choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits.
Bpb computes these views by exploiting the meta-data present within the retrieved documents such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. Of course, the approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (maybe numerous) query results and show possible hidden correlations among them, thus improving their browsing and understanding. Conclusion Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, like GenBank or EMBL. PMID:17430575
Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki
2008-09-01
A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities ≥ 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized. © 2008 Wiley-Liss, Inc.
The Demonstrator for the European Plate Observing System (EPOS)
NASA Astrophysics Data System (ADS)
Hoffmann, T. L.; Euteneuer, F.; Ulbricht, D.; Lauterjung, J.; Bailo, D.; Jeffery, K. G.
2014-12-01
An important outcome of the 4-year Preparatory Phase of the ESFRI project European Plate Observing System (EPOS) was the development and first implementation of the EPOS Demonstrator by the project's ICT Working Group 7. The Demonstrator implements the vertical integration of the three-layer architectural scheme for EPOS, connecting the Integrated Core Services (ICS), Thematic Core Services (TCS) and the National Research Infrastructures (NRI). The demonstrator provides a single GUI with central key discovery and query functionalities, based on already existing services by the seismic, geologic and geodetic communities. More specifically the seismic services of the Demonstrator utilize webservices and APIs for data and discovery of raw seismic data (FDSN webservices by the EIDA Network), events (Geoportal by EMSC) and analytical data products (e.g., hazard maps by EFEHR via OGC WMS). For geologic services, the EPOS Demonstrator accesses OneGeology Europe which serves the community with geologic maps and point information via OGC webservices. The Demonstrator also provides access to raw geodetic data via a newly developed universal tool called GSAC. The Demonstrator itself resembles the future Integrated Core Service (ICS) and provides direct access to the end user. Its core functionality lies in a metadata catalogue, which serves as the central information hub and stores information about all RIs, related persons, projects, financial background and technical access information. The database schema of the catalogue is based on CERIF, which has been slightly adapted. Currently, the portal provides basic query functions as well as cross domain search. [www.epos.cineca.it
Was Your Glass Left Half Full? Family Dynamics and Optimism
ERIC Educational Resources Information Center
Buri, John R.; Gunty, Amy
2008-01-01
Students' levels of a frequently studied adaptive schema (optimism) as a function of parenting variables (parental authority, family intrusiveness, parental overprotection, parentification, parental psychological control, and parental nurturance) were investigated. Results revealed that positive parenting styles were positively related to the…
Mungall, Christopher J; Emmert, David B
2007-07-01
A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. 
The Chado schema is freely distributed under the terms of the Artistic License (http://www.opensource.org/licenses/artistic-license.php) from GMOD (www.gmod.org).
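The statement that ontology terms are used "as a means of typing entities" can be illustrated with a toy relational sketch. The two tables below are a heavily simplified assumption loosely modeled on this idea (a feature row pointing at a controlled-vocabulary term), not the actual Chado DDL; the sample identifiers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-in for a controlled-vocabulary term table.
    CREATE TABLE cvterm (
        cvterm_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    -- Every feature is typed by a reference to a vocabulary term,
    -- so new kinds of entity need a new term, not a new table.
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        uniquename TEXT NOT NULL,
        type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
    );
    """
)
conn.executemany(
    "INSERT INTO cvterm (cvterm_id, name) VALUES (?, ?)",
    [(1, "gene"), (2, "mRNA")],
)
conn.executemany(
    "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
    [("example-gene-1", 1), ("example-transcript-1", 2), ("example-gene-2", 1)],
)

# Select entities by ontology term rather than by table name.
genes = [
    row[0]
    for row in conn.execute(
        "SELECT f.uniquename FROM feature f "
        "JOIN cvterm t ON t.cvterm_id = f.type_id "
        "WHERE t.name = ? ORDER BY f.uniquename",
        ("gene",),
    )
]
```

Because the type lives in data rather than in the table layout, the same two tables can hold any feature type the vocabulary defines, which is one way to read the abstract's claim that the design is generic and extensible.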
ERIC Educational Resources Information Center
Gutkind, Rebeka Chaia
2012-01-01
This mixed method study investigated the schema strategy uses of fourth-grade boys with reading challenges; specifically, their ability to understand text based on two components within schema theory: tuning and restructuring. Based on the reading comprehension scores from the Iowa Test of Basic Skills (Form 2010), four comparison groups were…
Predicate Oriented Pattern Analysis for Biomedical Knowledge Discovery
Shen, Feichen; Liu, Hongfang; Sohn, Sunghwan; Larson, David W.; Lee, Yugyung
2017-01-01
In the current biomedical data movement, numerous efforts have been made to convert and normalize a large number of traditional structured and unstructured data (e.g., EHRs, reports) to semi-structured data (e.g., RDF, OWL). With the increasing amount of semi-structured data coming into the biomedical community, data integration and knowledge discovery from heterogeneous domains have become important research problems. At the application level, detection of related concepts among medical ontologies is an important goal of life science research. It is even more crucial to figure out how different concepts are related within a single ontology or across multiple ontologies by analysing predicates in different knowledge bases. However, the world today is one of information explosion, and it is extremely difficult for biomedical researchers to find existing or potential predicates to perform linking among cross-domain concepts without any support from schema pattern analysis. Therefore, there is a need for a mechanism to do predicate-oriented pattern analysis to partition heterogeneous ontologies into smaller, more closely related topics and do query generation to discover cross-domain knowledge from each topic. In this paper, we present such a model, which performs predicate-oriented pattern analysis based on the close relationships among predicates and generates a similarity matrix. Based on this similarity matrix, we apply an innovative unsupervised learning algorithm to partition large data sets into smaller and more closely related topics and generate meaningful queries to fully discover knowledge over a set of interlinked data sources. We have implemented a prototype system named BmQGen and evaluated the proposed model with a colorectal surgical cohort from the Mayo Clinic. PMID:28983419
Optimal Weight Assignment for a Chinese Signature File.
ERIC Educational Resources Information Center
Liang, Tyne; And Others
1996-01-01
Investigates the performance of a character-based Chinese text retrieval scheme in which monogram keys and bigram keys are encoded into document signatures. Tests and verifies the theoretical predictions of the optimal weight assignments and the minimal false hit rate in experiments using a real Chinese corpus for disyllabic queries of different…
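A signature file of this kind can be sketched with superimposed coding. The following is a minimal illustration and not the scheme evaluated in the paper: the signature width, the hash function, and the two weights (bits set per monogram key and per bigram key) are all assumed values chosen for the example.

```python
import hashlib

SIG_BITS = 64          # assumed signature width
MONOGRAM_WEIGHT = 2    # assumed bits set per single-character key
BIGRAM_WEIGHT = 4      # assumed bits set per two-character key

def key_bits(key: str, weight: int) -> set:
    """Map a key to at most `weight` bit positions via a hash digest."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return {
        int.from_bytes(digest[2 * i:2 * i + 2], "big") % SIG_BITS
        for i in range(weight)
    }

def signature(text: str) -> int:
    """Superimpose (bitwise OR) the bits of every monogram and bigram key."""
    sig = 0
    for ch in text:
        for bit in key_bits(ch, MONOGRAM_WEIGHT):
            sig |= 1 << bit
    for i in range(len(text) - 1):
        for bit in key_bits(text[i:i + 2], BIGRAM_WEIGHT):
            sig |= 1 << bit
    return sig

def may_contain(doc_sig: int, query: str) -> bool:
    """True if every query bit is set; false hits are possible, misses are not."""
    query_sig = signature(query)
    return (doc_sig & query_sig) == query_sig

doc_sig = signature("数据库查询优化")
hit = may_contain(doc_sig, "查询")   # disyllabic query contained in the document
```

Raising the weights sets more bits per key, which sharpens each key's pattern but fills the signature faster; that trade-off is what an optimal weight assignment for minimizing the false hit rate has to balance.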
Shark: SQL and Analytics with Cost-Based Query Optimization on Coarse-Grained Distributed Memory
2014-01-13
RDBMS and contains a database (often MySQL or Derby) with a namespace for tables, table metadata and partition information. Table data is stored in an...serialization/deserialization) Java interface implementations with corresponding object inspectors. The Hive driver controls the processing of queries, coordinat...native API, RDD operations are invoked through a functional interface similar to DryadLINQ [32] in Scala, Java or Python. For example, the Scala code for
Langer, Steve G
2016-06-01
In 2010, the DICOM Data Warehouse (DDW) was launched as a data warehouse for DICOM meta-data. Its chief design goals were to have a flexible database schema that enabled it to index standard patient and study information and modality-specific tags (public and private), and to create a framework to derive computable information (derived tags) from the former items. Furthermore, it was to map the above information to an internally standard lexicon that enables a non-DICOM-savvy programmer to write standard SQL queries and retrieve the equivalent data from a cohort of scanners, regardless of what tag that data element was found in over the changing epochs of DICOM and ensuing migration of elements from private to public tags. After 5 years, the original design has scaled astonishingly well. Very little has changed in the database schema. The knowledge base is now fluent in over 90 device types. Also, additional stored procedures have been written to compute data that is derivable from standard or mapped tags. Finally, an early concern, that the system would not be able to address the variability of DICOM-SR objects, has been addressed. As of this writing the system is indexing 300 MR, 600 CT, and 2000 other (XA, DR, CR, MG) imaging studies per day. The only remaining issue to be solved is the case of tags that were not prospectively indexed, and indeed, this final challenge may lead to a noSQL, big data approach in a subsequent version.
YAHA: fast and flexible long-read alignment with optimal breakpoint detection.
Faust, Gregory G; Hall, Ira M
2012-10-01
With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this. We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints. YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA. imh4y@virginia.edu.
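The idea of choosing an optimal set of alignments over a directed acyclic graph can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not YAHA's actual algorithm: the tuple layout `(q_start, q_end, score)`, the function name `best_alignment_set`, and the flat per-junction `breakpoint_penalty` are all hypothetical. Each alignment is a node, an edge joins two alignments when the first ends before the second begins on the query, and the best chain maximises the summed scores minus one penalty per breakpoint.

```python
def best_alignment_set(alignments, breakpoint_penalty):
    """Pick non-overlapping alignments that best cover a query.

    alignments: list of (q_start, q_end, score) with half-open query intervals.
    Returns (total_score, chain), where total_score is the sum of the chosen
    scores minus breakpoint_penalty for every junction between neighbours.
    All names and the scoring model are illustrative assumptions.
    """
    if not alignments:
        return 0, []
    # Sorting by end coordinate gives a topological order of the DAG:
    # every possible predecessor of a node appears before it.
    order = sorted(alignments, key=lambda a: (a[1], a[0]))
    best = []  # best[i]: best score of a chain ending at order[i]
    prev = []  # prev[i]: index of the predecessor in that chain, or None
    for i, (start, _end, score) in enumerate(order):
        best_score, best_prev = score, None
        for j in range(i):
            if order[j][1] <= start:  # edge j -> i: j ends before i begins
                candidate = best[j] + score - breakpoint_penalty
                if candidate > best_score:
                    best_score, best_prev = candidate, j
        best.append(best_score)
        prev.append(best_prev)
    # Walk back from the best-scoring chain end to recover the chain.
    i = max(range(len(order)), key=best.__getitem__)
    total = best[i]
    chain = []
    while i is not None:
        chain.append(order[i])
        i = prev[i]
    chain.reverse()
    return total, chain


# Hypothetical candidates: two split alignments versus one long, weaker one.
candidates = [(0, 50, 50), (50, 100, 45), (0, 100, 70), (40, 60, 15)]
split_result = best_alignment_set(candidates, breakpoint_penalty=10)
single_result = best_alignment_set(candidates, breakpoint_penalty=30)
```

With a low penalty the split pair wins; raising the penalty makes the single long alignment preferable, which is the trade-off a breakpoint penalty is meant to express.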
2012-01-01
Background: In the scientific biodiversity community, there is an increasingly perceived need to build a bridge between molecular and traditional biodiversity studies. We believe that information technology could have a preeminent role in integrating the information generated by these studies with the large amount of molecular data we can find in bioinformatics public databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes in an ontological way and locally stores the sequence and annotation data contained in the GenBank primary database. Methods: The GIDL architecture consists of a relational database and of an intelligent data loader software. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and it is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema which makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is an Extract, Transform and Load software able to parse data, to discover hidden information in the GenBank entries and to populate the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, able to parse GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. In GIDL, Semantic Web technologies have been adopted due to their advantages in data representation, integration and processing. Results and conclusions: Entries coming from Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank rel.180 have been loaded in the Molecular Biodiversity Database by GIDL. Our system, combining the Sequence Ontology and the Chado schema, allows a more powerful query expressiveness compared with the most commonly used sequence retrieval systems like Entrez or SRS. PMID:22536971
Exploration into the Effects of the Schema-Based Instruction: A Bottom-Up Approach
ERIC Educational Resources Information Center
Fujii, Kazuma
2016-01-01
The purpose of this paper is to explore the effective use of the core schema-based instruction (SBI) in a classroom setting. The core schema is a schematic representation of the common underlying meaning of a given lexical item, and was first proposed on the basis of the cognitive linguistic perspectives by the Japanese applied linguists Tanaka,…
Datacube Services in Action, Using Open Source and Open Standards
NASA Astrophysics Data System (ADS)
Baumann, P.; Misev, D.
2016-12-01
Array Databases comprise novel, promising technology for massive spatio-temporal datacubes, extending the SQL paradigm of "any query, anytime" to n-D arrays. On server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. The rasdaman ("raster data manager") system, which has pioneered Array Databases, is available in open source on www.rasdaman.org. Its declarative query language extends SQL with array operators which are optimized and parallelized on server side. The rasdaman engine, which is part of OSGeo Live, is mature and in operational use on databases individually holding dozens of Terabytes. Further, the rasdaman concepts have strongly impacted international Big Data standards in the field, including the forthcoming MDA ("Multi-Dimensional Array") extension to ISO SQL, the OGC Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS) standards, and the forthcoming INSPIRE WCS/WCPS; in both OGC and INSPIRE, rasdaman is the WCS Core Reference Implementation. In our talk we present concepts, architecture, operational services, and standardization impact of open-source rasdaman, as well as experiences made.
Spatial cyberinfrastructures, ontologies, and the humanities.
Sieber, Renee E; Wellen, Christopher C; Jin, Yuan
2011-04-05
We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success.
Development of XML Schema for Broadband Digital Seismograms and Data Center Portal
NASA Astrophysics Data System (ADS)
Takeuchi, N.; Tsuboi, S.; Ishihara, Y.; Nagao, H.; Yamagishi, Y.; Watanabe, T.; Yanaka, H.; Yamaji, H.
2008-12-01
There are a number of data centers around the globe where digital broadband seismograms are open to researchers. Those centers use their own user interfaces, and there is no standard for accessing and retrieving seismograms from different data centers through a unified interface. One of the emergent technologies for realizing a unified user interface across different data centers is the concept of WebService and WebService portal. Here we have developed a prototype of a data center portal for digital broadband seismograms. This WebService portal uses WSDL (Web Services Description Language) to accommodate differences among the different data centers. By using WSDL, alteration and addition of data center user interfaces can be easily managed. This portal, called NINJA Portal, assumes three WebServices: (1) database Query service, (2) Seismic event data request service, and (3) Seismic continuous data request service. The current system supports both the station search of the database Query service and the seismic continuous data request service. Data centers supported by this NINJA portal will initially be the OHP data center in ERI and the Pacific21 data center in IFREE/JAMSTEC. We have developed a metadata standard for seismological data based on QuakeML for parametric data, which has been developed by ETH Zurich, and XML-SEED for waveform data, which was developed by IFREE/JAMSTEC. The prototype of NINJA portal is now released through the IFREE web page (http://www.jamstec.go.jp/pacific21/).
Towards the XML schema measurement based on mapping between XML and OO domain
NASA Astrophysics Data System (ADS)
Rakić, Gordana; Budimac, Zoran; Heričko, Marjan; Pušnik, Maja
2017-07-01
Measuring the quality of IT solutions is a priority in software engineering. Although numerous metrics for measuring object-oriented code already exist, measuring the quality of UML models or XML Schemas is still developing. One of the research questions in the overall research led by the ideas described in this paper is whether we can apply already defined object-oriented design metrics to XML schemas based on predefined mappings. In this paper, the basic ideas for the mentioned mapping are presented. This mapping is a prerequisite for setting the future approach to XML schema quality measurement with object-oriented metrics.
Zhou, ZhangBing; Zhao, Deng; Shu, Lei; Tsang, Kim-Fung
2015-01-01
Wireless sensor networks, serving as an important interface between physical environments and computational systems, have been used extensively for supporting domain applications, where multiple-attribute sensory data are queried from the network continuously and periodically. Usually, certain sensory data may not vary significantly within a certain time duration for certain applications. In this setting, sensory data gathered at a certain time slot can be used for answering concurrent queries and may be reused for answering forthcoming queries when the variation of these data is within a certain threshold. To address this challenge, a popularity-based cooperative caching mechanism is proposed in this article, where the popularity of sensory data is calculated according to the queries issued in recent time slots. This popularity reflects the likelihood that sensory data will be of interest to forthcoming queries. Generally, sensory data with the highest popularity are cached at the sink node, while sensory data that may not be of interest to forthcoming queries are cached in the head nodes of divided grid cells. Leveraging these cooperatively cached sensory data, queries are answered by composing these two-tier cached data. Experimental evaluation shows that this approach can reduce the network communication cost significantly and increase the network capability. PMID:26131665
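The two-tier placement described in this abstract can be sketched roughly as follows. The abstract does not give the popularity formula, so plain query frequency over recent time slots is used here as an assumption; the function names `popularity` and `place_in_cache`, the attribute names, and the `sink_capacity` parameter are hypothetical.

```python
from collections import Counter


def popularity(recent_slots):
    """Estimate popularity as the share of recent queries per attribute.

    recent_slots: list of time slots, each a list of queried attribute names.
    Plain frequency is an assumed stand-in for the paper's actual measure.
    """
    counts = Counter(attr for slot in recent_slots for attr in slot)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {attr: n / total for attr, n in counts.items()}


def place_in_cache(pop, sink_capacity):
    """Split attributes into (sink tier, grid-head tier) by popularity rank."""
    # Ties are broken alphabetically so the placement is deterministic.
    ranked = sorted(pop, key=lambda attr: (-pop[attr], attr))
    return ranked[:sink_capacity], ranked[sink_capacity:]


# Hypothetical query log for the last three time slots.
slots = [["temp", "humidity"], ["temp", "light"], ["temp", "humidity"]]
pop = popularity(slots)
sink_tier, head_tier = place_in_cache(pop, sink_capacity=1)
```

Here the most frequently queried attribute is kept at the sink node, and the remainder fall to the grid-cell head nodes.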
XML Schema Guide for Primary CDR Submissions
This document presents the extensible markup language (XML) schema guide for the Office of Pollution Prevention and Toxics’ (OPPT) e-CDRweb tool. E-CDRweb is the electronic, web-based tool provided by Environmental Protection Agency (EPA) for the submission of Chemical Data Reporting (CDR) information. This document provides the user with tips and guidance on correctly using the version 1.7 XML schema. Please note that the order of the elements must match the schema.
Using Generalized Annotated Programs to Solve Social Network Diffusion Optimization Problems
2013-01-01
as follows: —Let kall be the k value for the SNDOP-ALL query and for each SNDOP query i, let ki be the k for that query. For each query i, set ki... kall − 1. —Number each element of vi ∈ V such that gI(vi) and V C(vi) are true. For the ith SNDOP query, let vi be the corresponding element of V —Let...vertices of S. PROOF. We set up |V | SNDOP-queries as follows: —Let kall be the k value for the SNDOP-ALL query and and for each SNDOP-query i, let ki be
1991-02-01
3 2.2 Hybrid Rule/Fact Schemas .............................................................. 3 3 THE LIMITATIONS OF RULE BASED KNOWLEDGE...or hybrid rule/fact schemas. 2 UNCLASSIFIED .WA UNCLASSIFIED ERL-0520-RR 2.1 Propositional Logic The simplest form of production-rules are based upon...requirements which may lead to poor system performance. 2.2 Hybrid Rule/Fact Schemas Hybrid rule/fact relationships (also known as Predicate Calculus ) have
Schematic memory components converge within angular gyrus during retrieval
Wagner, Isabella C; van Buuren, Mariët; Kroes, Marijn CW; Gutteling, Tjerk P; van der Linden, Marieke; Morris, Richard G; Fernández, Guillén
2015-01-01
Mental schemas form associative knowledge structures that can promote the encoding and consolidation of new and related information. Schemas are facilitated by a distributed system that stores components separately, presumably in the form of inter-connected neocortical representations. During retrieval, these components need to be recombined into one representation, but where exactly such recombination takes place is unclear. Thus, we asked where different schema components are neuronally represented and converge during retrieval. Subjects acquired and retrieved two well-controlled, rule-based schema structures during fMRI on consecutive days. Schema retrieval was associated with midline, medial-temporal, and parietal processing. We identified the multi-voxel representations of different schema components, which converged within the angular gyrus during retrieval. Critically, convergence only happened after 24-hour-consolidation and during a transfer test where schema material was applied to novel but related trials. Therefore, the angular gyrus appears to recombine consolidated schema components into one memory representation. DOI: http://dx.doi.org/10.7554/eLife.09668.001 PMID:26575291
Brasfield, Hope; Anderson, Scott; Stuart, Gregory L.
2014-01-01
Recent research has examined the relation between mindfulness and substance use, demonstrating that lower trait mindfulness is associated with increased substance use, and that mindfulness-based interventions help to reduce substance use. Research has also demonstrated that early maladaptive schemas are prevalent among individuals seeking substance use treatment and that targeting early maladaptive schemas in treatment may improve outcomes. However, no known research has examined the relation between mindfulness and early maladaptive schemas despite theoretical and empirical reasons to suspect their association. Therefore, the current study examined the relation between trait mindfulness and early maladaptive schemas among adult men seeking residential substance abuse treatment (N = 82). Findings demonstrated strong negative associations between trait mindfulness and 15 of the 18 early maladaptive schemas. Moreover, men endorsing multiple early maladaptive schemas reported lower trait mindfulness than men with fewer early maladaptive schemas. The implications of these findings for future research and treatment are discussed. PMID:26085852
Modeling the Arden Syntax for medical decisions in XML.
Kim, Sukil; Haug, Peter J; Rocha, Roberto A; Choi, Inyoung
2008-10-01
A new model expressing Arden Syntax with the eXtensible Markup Language (XML) was developed to increase its portability. Every example was manually parsed and reviewed until the schema and the style sheet were considered to be optimized. When the first schema was finished, several MLMs in Arden Syntax Markup Language (ArdenML) were validated against the schema. They were then transformed to HTML formats with the style sheet, during which they were compared to the original text version of their own MLM. When faults were found in the transformed MLM, the schema and/or style sheet was fixed. This cycle continued until all the examples were encoded into XML documents. The original MLMs were encoded in XML according to the proposed XML schema and reverse-parsed MLMs in ArdenML were checked using a public domain Arden Syntax checker. Two hundred seventy seven examples of MLMs were successfully transformed into XML documents using the model, and the reverse-parse yielded the original text version of MLMs. Two hundred sixty five of the 277 MLMs showed the same error patterns before and after transformation, and all 11 errors related to statement structure were resolved in the XML version. The model uses two syntax checking mechanisms: first an XML validation process, and second a syntax check using an XSL style sheet. Now that we have a schema for ArdenML, we can also begin the development of style sheets for transforming ArdenML into other languages.
Schema generation in recurrent neural nets for intercepting a moving target.
Fleischer, Andreas G
2010-06-01
The grasping of a moving object requires the development of a motor strategy to anticipate the trajectory of the target and to compute an optimal course of interception. During the performance of perception-action cycles, a preprogrammed prototypical movement trajectory, a motor schema, may highly reduce the control load. Subjects were asked to hit a target that was moving along a circular path by means of a cursor. Randomized initial target positions and velocities were detected in the periphery of the eyes, resulting in a saccade toward the target. Even when the target disappeared, the eyes followed the target's anticipated course. The Gestalt of the trajectories was dependent on target velocity. The prediction capability of the motor schema was investigated by varying the visibility range of cursor and target. Motor schemata were determined to be of limited precision, and therefore visual feedback was continuously required to intercept the moving target. To intercept a target, the motor schema caused the hand to aim ahead and to adapt to the target trajectory. The control of cursor velocity determined the point of interception. From a modeling point of view, a neural network was developed that allowed the implementation of a motor schema interacting with feedback control in an iterative manner. The neural net of the Wilson type consists of an excitation-diffusion layer allowing the generation of a moving bubble. This activation bubble runs down an eye-centered motor schema and causes a planar arm model to move toward the target. A bubble provides local integration and straightening of the trajectory during repetitive moves. The schema adapts to task demands by learning and serves as forward controller. On the basis of these model considerations the principal problem of embedding motor schemata in generalized control strategies is discussed.
Bayen, Ute J.; Kuhlmann, Beatrice G.
2010-01-01
The authors investigated conditions under which judgments in source-monitoring tasks are influenced by prior schematic knowledge. According to a probability-matching account of source guessing (Spaniol & Bayen, 2002), when people do not remember the source of information, they match source guessing probabilities to the perceived contingency between sources and item types. When they do not have a representation of a contingency, they base their guesses on prior schematic knowledge. The authors provide support for this account in two experiments with sources presenting information that was expected for one source and somewhat unexpected for another. Schema-relevant information about the sources was provided at the time of encoding. When contingency perception was impeded by dividing attention, participants showed schema-based guessing (Experiment 1). Manipulating source - item contingency also affected guessing (Experiment 2). When this contingency was schema-inconsistent, it superseded schema-based expectations and led to schema-inconsistent guessing. PMID:21603251
XML Schema Guide for Secondary CDR Submissions
This document presents the extensible markup language (XML) schema guide for the Office of Pollution Prevention and Toxics’ (OPPT) e-CDRweb tool. E-CDRweb is the electronic, web-based tool provided by Environmental Protection Agency (EPA) for the submission of Chemical Data Reporting (CDR) information. This document provides the user with tips and guidance on correctly using the version 1.1 XML schema for the Joint Submission Form. Please note that the order of the elements must match the schema.
1990-09-12
electronics reading to the next. To test this hypothesis and the suitability of EBL to acquiring schemas, I have implemented an automated reader/learner as...used. For example, testing the utility of a kidnapping schema using several readings about kidnapping can only go so far toward establishing the...the cost of carrying the new rules while processing unrelated material will be underestimated. The present research tests the utility of new schemas in
Schema representation in patients with ventromedial PFC lesions.
Ghosh, Vanessa E; Moscovitch, Morris; Melo Colella, Brenda; Gilboa, Asaf
2014-09-03
Human neuroimaging and animal studies have recently implicated the ventromedial prefrontal cortex (vmPFC) in memory schema, particularly in facilitating new encoding by existing schemas. In humans, the most conspicuous memory disorder following vmPFC damage is confabulation; strategic retrieval models suggest that aberrant schema activation or reinstatement plays a role in confabulation. This raises the possibility that beyond its role in schema-supported memory encoding, the vmPFC is also implicated in schema reinstatement itself. If that is the case, vmPFC lesions should lead to impaired schema-based operations, even on tasks that do not involve memory acquisition. To test this prediction, ten patients with vmPFC damage, four with present or prior confabulation, and a group of twelve matched healthy controls made speeded yes/no decisions as to whether words were closely related to a schema (a visit to the doctor). Ten minutes later, they repeated the task for a new schema (going to bed) with some words related to the first schema included as lures. Last, they rated the degree to which stimuli were related to the second schema. All four vmPFC patients with present or prior confabulation were impaired in rejecting lures and in classifying stimulus belongingness to the schema, even when they were not lures. Nonconfabulating patients performed comparably to healthy adults with high accuracy, comparable reaction times, and similar ratings. These results show for the first time that damage to the human vmPFC, when associated with confabulation, leads to deficient schema reinstatement, which is likely a prerequisite for schema-mediated memory integration. Copyright © 2014 the authors 0270-6474/14/3412057-14$15.00/0.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng
2007-08-02
An index in a database system is a data structure that utilizes redundant information about the base data to speed up common searching and retrieval operations. Most commonly used indexes are variants of B-trees, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations by sacrificing the efficiency of updating the indexes after the modification of an individual record. In addition to the well-known strengths of bitmap indexes, FastBit has a special strength stemming from the bitmap compression scheme used. The compression method is called the Word-Aligned Hybrid (WAH) code. It reduces the bitmap indexes to reasonable sizes and at the same time allows very efficient bitwise logical operations directly on the compressed bitmaps. Compared with the well-known compression methods such as LZ77 and Byte-aligned Bitmap code (BBC), WAH sacrifices some space efficiency for a significant improvement in operational efficiency. Since the bitwise logical operations are the most important operations needed to answer queries, using WAH compression has been shown to answer queries significantly faster than using other compression schemes. Theoretical analyses showed that WAH compressed bitmap indexes are optimal for one-dimensional range queries. Only the most efficient indexing schemes such as B+-tree and B*-tree have this optimality property. However, bitmap indexes are superior because they can efficiently answer multi-dimensional range queries by combining the answers to one-dimensional queries.
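To make the last point concrete, here is a minimal sketch of how one-dimensional bitmap answers combine into a multi-dimensional range query. It uses plain, uncompressed Python integers as bitmaps, so it does not implement WAH compression and is not FastBit's API; the function names and the two-column sample table are assumptions made for illustration.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def range_bitmap(index, low, high):
    """Answer a one-dimensional range query by OR-ing the matching bitmaps."""
    result = 0
    for value, bitmap in index.items():
        if low <= value <= high:
            result |= bitmap
    return result


def rows_of(bitmap):
    """Decode a bitmap into the sorted list of row numbers it marks."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical table with two indexed columns.
temperature = [10, 25, 31, 25, 18, 40]
pressure = [3, 1, 2, 2, 3, 1]
temp_index = build_bitmap_index(temperature)
pres_index = build_bitmap_index(pressure)

# 20 <= temperature <= 35 AND 2 <= pressure <= 3:
# each dimension is answered separately, then combined with a bitwise AND.
hits = range_bitmap(temp_index, 20, 35) & range_bitmap(pres_index, 2, 3)
matching_rows = rows_of(hits)
```

The combining step is a single bitwise AND regardless of how many rows the table holds, which is why fast logical operations on the (compressed) bitmaps matter so much for query time.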
Relax with CouchDB--into the non-relational DBMS era of bioinformatics.
Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R
2012-07-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.
Extending TOPS: Ontology-driven Anomaly Detection and Analysis System
NASA Astrophysics Data System (ADS)
Votava, P.; Nemani, R. R.; Michaelis, A.
2010-12-01
Terrestrial Observation and Prediction System (TOPS) is a flexible modeling software system that integrates ecosystem models with frequent satellite and surface weather observations to produce ecosystem nowcasts (assessments of current conditions) and forecasts useful in natural resources management, public health and disaster management. We have been extending the Terrestrial Observation and Prediction System (TOPS) to include a capability for automated anomaly detection and analysis of both on-line (streaming) and off-line data. In order to best capture the knowledge about data hierarchies, Earth science models and implied dependencies between anomalies and occurrences of observable events such as urbanization, deforestation, or fires, we have developed an ontology to serve as a knowledge base. We can query the knowledge base and answer questions about dataset compatibilities, similarities and dependencies so that we can, for example, automatically analyze similar datasets in order to verify a given anomaly occurrence in multiple data sources. We are further extending the system to go beyond anomaly detection towards reasoning about possible causes of anomalies that are also encoded in the knowledge base as either learned or implied knowledge. This enables us to scale up the analysis by eliminating a large number of anomalies early on during the processing by either failure to verify them from other sources, or matching them directly with other observable events without having to perform an extensive and time-consuming exploration and analysis. The knowledge is captured using OWL ontology language, where connections are defined in a schema that is later extended by including specific instances of datasets and models. The information is stored using Sesame server and is accessible through both Java API and web services using SeRQL and SPARQL query languages. Inference is provided using OWLIM component integrated with Sesame.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Zhenhuan; Boyuka, David; Zou, X
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while also limiting the performance impact on running applications to a reasonable level.
An XML-based interchange format for genotype-phenotype data.
Whirl-Carrillo, M; Woon, M; Thorn, C F; Klein, T E; Altman, R B
2008-02-01
Recent advances in high-throughput genotyping and phenotyping have accelerated the creation of pharmacogenomic data. Consequently, the community requires standard formats to exchange large amounts of diverse information. To facilitate the transfer of pharmacogenomics data between databases and analysis packages, we have created a standard XML (eXtensible Markup Language) schema that describes both genotype and phenotype data as well as associated metadata. The schema accommodates information regarding genes, drugs, diseases, experimental methods, genomic/RNA/protein sequences, subjects, subject groups, and literature. The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB; www.pharmgkb.org) has used this XML schema for more than 5 years to accept and process submissions containing more than 1,814,139 SNPs on 20,797 subjects using 8,975 assays. Although developed in the context of pharmacogenomics, the schema is of general utility for exchange of genotype and phenotype data. We have written syntactic and semantic validators to check documents using this format. The schema and code for validation is available to the community at http://www.pharmgkb.org/schema/index.html (last accessed: 8 October 2007). (c) 2007 Wiley-Liss, Inc.
SkyMapper Southern Survey: First Data Release (DR1)
NASA Astrophysics Data System (ADS)
Wolf, Christian; Onken, Christopher A.; Luvaul, Lance C.; Schmidt, Brian P.; Bessell, Michael S.; Chang, Seo-Won; Da Costa, Gary S.; Mackey, Dougal; Martin-Jones, Tony; Murphy, Simon J.; Preston, Tim; Scalzo, Richard A.; Shao, Li; Smillie, Jon; Tisserand, Patrick; White, Marc C.; Yuan, Fang
2018-02-01
We present the first data release of the SkyMapper Southern Survey, a hemispheric survey carried out with the SkyMapper Telescope at Siding Spring Observatory in Australia. Here, we present the survey strategy, data processing, catalogue construction, and database schema. The first data release dataset includes over 66 000 images from the Shallow Survey component, covering an area of 17 200 deg^2 in all six SkyMapper passbands uvgriz, while the full area covered by any passband exceeds 20 000 deg^2. The catalogues contain over 285 million unique astrophysical objects, complete to roughly 18 mag in all bands. We compare our griz point-source photometry with Pan-STARRS1 first data release and note an RMS scatter of 2%. The internal reproducibility of SkyMapper photometry is on the order of 1%. Astrometric precision is better than 0.2 arcsec based on comparison with Gaia first data release. We describe the end-user database, through which data are presented to the world community, and provide some illustrative science queries.
A Low-Storage-Consumption XML Labeling Method for Efficient Structural Information Extraction
NASA Astrophysics Data System (ADS)
Liang, Wenxin; Takahashi, Akihiro; Yokota, Haruo
Recently, labeling methods to extract and reconstruct the structural information of XML data, which are important for many applications such as XPath query and keyword search, are becoming more attractive. To achieve efficient structural information extraction, in this paper we propose C-DO-VLEI code, a novel update-friendly bit-vector encoding scheme, based on register-length bit operations combined with the properties of Dewey Order numbers, which cannot be implemented in other relevant existing schemes such as ORDPATH. Meanwhile, the proposed method also achieves lower storage consumption because it does not require either prefix schema or any reserved codes for node insertion. We performed experiments to evaluate and compare the performance and storage consumption of the proposed method with those of the ORDPATH method. Experimental results show that the execution times for extracting depth information and parent node labels using the C-DO-VLEI code are about 25% and 15% less, respectively, and the average label size using the C-DO-VLEI code is about 24% smaller, compared with ORDPATH.
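The two structural operations measured in this abstract (depth extraction and parent-label extraction) can be illustrated on plain dotted Dewey Order labels. The sketch below is only that: it does not implement the C-DO-VLEI bit-vector encoding, whose details the abstract does not give, and the function names are hypothetical.

```python
from typing import Optional


def depth(label: str) -> int:
    """Depth of a node: the number of components in its Dewey label."""
    return len(label.split("."))


def parent(label: str) -> Optional[str]:
    """Label of the parent node, or None when the label is the root."""
    head, sep, _ = label.rpartition(".")
    return head if sep else None


def is_ancestor(ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is a proper ancestor of `descendant`.

    The trailing dot avoids false matches such as "1.3" vs "1.30".
    """
    return descendant.startswith(ancestor + ".")
```

With string labels both operations cost time proportional to the label length; the abstract's claim is that a bit-vector encoding can do the same work with register-length bit operations.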
Analytics-Driven Lossless Data Compression for Rapid In-situ Indexing, Storing, and Querying
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jenkins, John; Arkatkar, Isha; Lakshminarasimhan, Sriram
2013-01-01
The analysis of scientific simulations is highly data-intensive and is becoming an increasingly important challenge. Peta-scale data sets require the use of light-weight query-driven analysis methods, as opposed to heavy-weight schemes that optimize for speed at the expense of size. This paper is an attempt in the direction of query processing over losslessly compressed scientific data. We propose a co-designed double-precision compression and indexing methodology for range queries by performing unique-value-based binning on the most significant bytes of double precision data (sign, exponent, and most significant mantissa bits), and inverting the resulting metadata to produce an inverted index over a reduced data representation. Without the inverted index, our method matches or improves compression ratios over both general-purpose and floating-point compression utilities. The inverted index is light-weight, and the overall storage requirement for both reduced column and index is less than 135%, whereas existing DBMS technologies can require 200-400%. As a proof-of-concept, we evaluate univariate range queries that additionally return column values, a critical component of data analytics, against state-of-the-art bitmap indexing technology, showing multi-fold query performance improvements.
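The binning idea in this abstract can be sketched in a few lines, assuming big-endian IEEE 754 doubles, a two-byte bin key, and finite input values. This is an illustration of byte-prefix binning with an inverted index only; it omits the compression step entirely, and all function names are hypothetical rather than taken from the paper.

```python
import struct
from collections import defaultdict


def bin_key(x: float, nbytes: int = 2) -> bytes:
    """Most significant bytes of x (sign, exponent, top mantissa bits)."""
    return struct.pack(">d", x)[:nbytes]


def build_index(values, nbytes: int = 2):
    """Inverted index: bin key -> list of positions holding that key."""
    index = defaultdict(list)
    for pos, v in enumerate(values):
        index[bin_key(v, nbytes)].append(pos)
    return dict(index)


def bin_bounds(key: bytes):
    """Smallest and largest double that share this byte prefix."""
    pad = 8 - len(key)
    a = struct.unpack(">d", key + b"\x00" * pad)[0]
    b = struct.unpack(">d", key + b"\xff" * pad)[0]
    return min(a, b), max(a, b)  # the order flips for negative values


def range_query(values, index, lo: float, hi: float):
    """Positions p with lo <= values[p] <= hi, skipping bins out of range."""
    hits = []
    for key, positions in index.items():
        bmin, bmax = bin_bounds(key)
        if bmax < lo or bmin > hi:
            continue  # the whole bin lies outside the query range
        hits.extend(p for p in positions if lo <= values[p] <= hi)
    return sorted(hits)
```

Only bins whose value interval overlaps the query range are scanned, which is where a range query over the reduced representation saves work.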
Scalable and responsive event processing in the cloud
Suresh, Visalakshmi; Ezhilchelvan, Paul; Watson, Paul
2013-01-01
Event processing involves continuous evaluation of queries over streams of events. Response-time optimization is traditionally done over a fixed set of nodes and/or by using metrics measured at query-operator levels. Cloud computing makes it easy to acquire and release computing nodes as required. Leveraging this flexibility, we propose a novel, queueing-theory-based approach for meeting specified response-time targets against fluctuating event arrival rates by drawing only the necessary amount of computing resources from a cloud platform. In the proposed approach, the entire processing engine of a distinct query is modelled as an atomic unit for predicting response times. Several such units hosted on a single node are modelled as a multiple class M/G/1 system. These aspects eliminate intrusive, low-level performance measurements at run-time, and also offer portability and scalability. Using model-based predictions, cloud resources are efficiently used to meet response-time targets. The efficacy of the approach is demonstrated through cloud-based experiments. PMID:23230164
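For a multiple-class M/G/1 queue, the mean response time per class follows from the Pollaczek-Khinchine formula. The sketch below assumes Poisson arrivals and first-come-first-served service across classes; the abstract does not state the scheduling discipline, so that assumption, like the function name, is illustrative and not the authors' model.

```python
def mg1_response_times(classes):
    """Mean response time per class in a multi-class FCFS M/G/1 queue.

    classes: list of (arrival_rate, mean_service, second_moment_service).
    Shared mean wait W = sum(lam_i * E[S_i^2]) / (2 * (1 - rho)),
    with utilisation rho = sum(lam_i * E[S_i]); T_i = W + E[S_i].
    """
    rho = sum(lam * es for lam, es, _ in classes)
    if rho >= 1.0:
        raise ValueError("unstable queue: utilisation must be below 1")
    wait = sum(lam * es2 for lam, _, es2 in classes) / (2.0 * (1.0 - rho))
    return [wait + es for _, es, _ in classes]
```

A prediction of this kind needs only per-query arrival rates and service-time moments, which is consistent with the abstract's point about avoiding intrusive operator-level measurements.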
NASA Astrophysics Data System (ADS)
Reyes, J. C.; Vernon, F. L.; Newman, R. L.; Steidl, J. H.
2010-12-01
The Waveform Server is an interactive web-based interface to multi-station, multi-sensor and multi-channel high-density time-series data stored in Center for Seismic Studies (CSS) 3.0 schema relational databases (Newman et al., 2009). In the last twelve months, based on expanded specifications and current user feedback, both the server-side infrastructure and client-side interface have been extensively rewritten. The Python Twisted server-side code-base has been fundamentally modified to now present waveform data stored in cluster-based databases using a multi-threaded architecture, in addition to supporting the pre-existing single database model. This allows interactive web-based access to high-density (broadband @ 40Hz to strong motion @ 200Hz) waveform data that can span multiple years, the common lifetime of broadband seismic networks. The client-side interface expands on its use of simple JSON-based AJAX queries to now incorporate a variety of User Interface (UI) improvements including standardized calendars for defining time ranges, applying on-the-fly data calibration to display SI-unit data, and increased rendering speed. This presentation will outline the various cyber infrastructure challenges we have faced while developing this application, the use-cases currently in existence, and the limitations of web-based application development.
Query-Time Optimization Techniques for Structured Queries in Information Retrieval
ERIC Educational Resources Information Center
Cartright, Marc-Allen
2013-01-01
The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective,…
Designing for Peta-Scale in the LSST Database
NASA Astrophysics Data System (ADS)
Kantor, J.; Axelrod, T.; Becla, J.; Cook, K.; Nikolaev, S.; Gray, J.; Plante, R.; Nieto-Santisteban, M.; Szalay, A.; Thakar, A.
2007-10-01
The Large Synoptic Survey Telescope (LSST), a proposed ground-based 8.4 m telescope with a 10 deg^2 field of view, will generate 15 TB of raw images every observing night. When calibration and processed data are added, the image archive, catalogs, and meta-data will grow 15 PB yr^{-1} on average. The LSST Data Management System (DMS) must capture, process, store, index, replicate, and provide open access to this data. Alerts must be triggered within 30 s of data acquisition. To do this in real-time at these data volumes will require advances in data management, database, and file system techniques. This paper describes the design of the LSST DMS and emphasizes features for peta-scale data. The LSST DMS will employ a combination of distributed database and file systems, with schema, partitioning, and indexing oriented for parallel operations. Image files are stored in a distributed file system with references to, and meta-data from, each file stored in the databases. The schema design supports pipeline processing, rapid ingest, and efficient query. Vertical partitioning reduces disk input/output requirements, horizontal partitioning allows parallel data access using arrays of servers and disks. Indexing is extensive, utilizing both conventional RAM-resident indexes and column-narrow, row-deep tag tables/covering indices that are extracted from tables that contain many more attributes. The DMS Data Access Framework is encapsulated in a middleware framework to provide a uniform service interface to all framework capabilities. This framework will provide the automated work-flow, replication, and data analysis capabilities necessary to make data processing and data quality analysis feasible at this scale.
A Framework for WWW Query Processing
NASA Technical Reports Server (NTRS)
Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)
2000-01-01
Query processing is the most common operation in a DBMS. Sophisticated query processing has been mainly targeted at a single enterprise environment providing centralized control over data and metadata. Queries submitted by anonymous users on the web differ in that load balancing or DBMS access control becomes the key issue. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).
A schema theory analysis of students' think aloud protocols in an STS biology context
NASA Astrophysics Data System (ADS)
Quinlan, Catherine Louise
This dissertation study is a conglomerate of the fields of Science Education and Applied Cognitive Psychology. The goal of this study is to determine what organizational features and knowledge representation patterns high school students exhibit over time for issues pertinent to science and society. Participants are thirteen tenth grade students in a diverse suburban-urban classroom in a northeastern state. Students' think alouds are recorded, pre-, post-, and late-post treatment. Treatment consists of instruction in three Science, Technology, and Society (STS) biology issues, namely the human genome project, nutrition and health, and stem cell research. Coding and analyses are performed using Marshall's knowledge representations (identification knowledge, elaboration knowledge, planning knowledge, and execution knowledge), as well as qualitative research analysis methods. Schema theory, information processing theory, and other applied cognitive theory provide a framework in which to understand and explain students' schema descriptions and progressions over time. The results show that students display five organizational features in their identification and elaboration knowledge. Students also fall into one of four categories according to whether they display prior schema or no prior schema, and their orientation "for" or "against" some of the issues. Students with prior schema and orientation "against" display the most robust schema descriptions and schema progressions. Those with no prior schemas and orientation "against" show very modest schema progressions best characterized by their keyword searches. This study shows the importance of considering not only students' integrated schemas but also their individual schemes. A role for the use of a more schema-based instruction that scaffolds student learning is implicated.
ERIC Educational Resources Information Center
Farc, Maria-Magdalena; Crouch, Julie L.; Skowronski, John J.; Milner, Joel S.
2008-01-01
Objective: Two studies examined whether accessibility of hostility-related schema influenced ratings of ambiguous child pictures. Based on the social information processing model of child physical abuse (CPA), it was expected that CPA risk status would serve as a proxy for chronic accessibility of hostile schema, while priming procedures were used…
ERIC Educational Resources Information Center
Peltier, Corey; Vannest, Kimberly J.
2018-01-01
The current study examines the effects of schema instruction on the problem-solving performance of four second-grade students with emotional and behavioral disorders. The existence of a functional relationship between the schema instruction intervention and problem-solving accuracy in mathematics is examined through a single case experiment using…
Schema Theories as a Base for the Structural Representation of the Knowledge State.
ERIC Educational Resources Information Center
Dochy, F. J. R. C.; Bouwens, M. R. J.
From the view of schema-transfer theory, the use of schemata with their several functions gives an explanation for the facilitative effect of prior knowledge on learning processes. This report gives a theoretical exploration of the concept of schemata, underlying schema theories, and functions of schemata to indicate the importance of schema…
Processing SPARQL queries with regular expressions in RDF databases
Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon
2011-03-29
Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225
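To make the query shape concrete, the sketch below mimics a SPARQL pattern of the form `?s <predicate> ?o . FILTER regex(?o, pattern, "i")` with a naive scan over in-memory triples. It is the unoptimized baseline such a framework would aim to beat, not the proposed technique. The triples and names are made up for illustration, and Python's `re` syntax differs in some details from the XPath regular expressions that SPARQL uses.

```python
import re


def regex_filter(triples, predicate, pattern, flags=""):
    """Return (subject, object) pairs whose object literal matches pattern.

    Like SPARQL's regex(), this matches anywhere in the string unless the
    pattern is anchored; the "i" flag requests case-insensitive matching.
    """
    rx = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    return [(s, o) for s, p, o in triples if p == predicate and rx.search(o)]


# Hypothetical sample data, not taken from Uniprot or Bio2RDF.
TRIPLES = [
    ("ex:P1", "rdfs:label", "Serine/threonine-protein kinase"),
    ("ex:P2", "rdfs:label", "Hemoglobin subunit alpha"),
    ("ex:P3", "rdfs:label", "Tyrosine-protein KINASE receptor"),
    ("ex:P3", "rdfs:comment", "kinase family member"),
]
```

For example, `regex_filter(TRIPLES, "rdfs:label", "kinase", "i")` scans every triple and keeps the two label matches; the cost of that full scan grows with the size of the store.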
ERIC Educational Resources Information Center
Lewin, Beverly A.
Schemata based notions need not replace, but should be reflected in, product-centered reading tests. The contributions of schema theory to the psycholinguistic model of reading has been thoroughly reviewed. Schemata-based reading tests provide several advantages: (1) they engage the appropriate conceptual processes for the student which frees the…
Leceta, Amalia; Sologuren, Ander; Valiente, Román; Campo, Cristina; Labeaga, Luis
2017-01-01
Background Bilastine is a safe and effective commonly prescribed non-sedating H1-antihistamine approved for symptomatic treatment in patients with allergic disorders such as rhinoconjunctivitis and urticaria. It was evaluated in many patients throughout the clinical development required for its approval, but clinical trials generally exclude many patients who will benefit in everyday clinical practice (especially those with coexisting diseases and/or being treated with concomitant drugs). Following its introduction into clinical practice, the Medical Information Specialists at Faes Farma have received many practical queries regarding the optimal use of bilastine in different circumstances. Data sources and methods Queries received by the Medical Information Department and the responses provided to senders of these queries. Results The most frequent questions received by the Medical Information Department included the potential for drug-drug interactions with bilastine and commonly used agents such as anticoagulants (including the novel oral anticoagulants), antiretrovirals, antituberculosis regimens, corticosteroids, digoxin, oral contraceptives, and proton pump inhibitors. Most of these medicines are not usually allowed in clinical trials, and so advice needs to be based upon the pharmacological profiles of the drugs involved and expert opinion. The pharmacokinetic profile of bilastine appears favourable since it undergoes negligible metabolism and is almost exclusively eliminated via renal excretion, and it neither induces nor inhibits the activity of several isoenzymes from the CYP 450 system. Consequently, bilastine does not interact with cytochrome metabolic pathways. Other queries involved specific patient groups such as subjects with renal impairment, women who are breastfeeding or who are trying to become pregnant, and patients with other concomitant diseases. 
Interestingly, several questions related to topics that are well covered in the Summary of Product Characteristics (SmPC), which suggests that this resource is not being well used. Conclusions Overall, this analysis highlights gaps in our knowledge regarding the optimal use of bilastine. Expert opinion based upon an understanding of the science can help in the decision-making, but more research is needed to provide evidence-based answers in certain circumstances. PMID:28210286
Spatial cyberinfrastructures, ontologies, and the humanities
Sieber, Renee E.; Wellen, Christopher C.; Jin, Yuan
2011-01-01
We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success. PMID:21444819
Lassere, Marissa N
2008-06-01
There are clear advantages to using biomarkers and surrogate endpoints, but concerns about clinical and statistical validity and systematic methods to evaluate these aspects hinder their efficient application. Section 2 is a systematic, historical review of the biomarker-surrogate endpoint literature with special reference to the nomenclature, the systems of classification and statistical methods developed for their evaluation. In Section 3 an explicit, criterion-based, quantitative, multidimensional hierarchical levels of evidence schema - Biomarker-Surrogacy Evaluation Schema - is proposed to evaluate and co-ordinate the multiple dimensions (biological, epidemiological, statistical, clinical trial and risk-benefit evidence) of the biomarker clinical endpoint relationships. The schema systematically evaluates and ranks the surrogacy status of biomarkers and surrogate endpoints using defined levels of evidence. The schema incorporates the three independent domains: Study Design, Target Outcome and Statistical Evaluation. Each domain has items ranked from zero to five. An additional category called Penalties incorporates additional considerations of biological plausibility, risk-benefit and generalizability. The total score (0-15) determines the level of evidence, with Level 1 the strongest and Level 5 the weakest. The term 'surrogate' is restricted to markers attaining Levels 1 or 2 only. Surrogacy status of markers can then be directly compared within and across different areas of medicine to guide individual, trial-based or drug-development decisions. This schema would facilitate communication between clinical, researcher, regulatory, industry and consumer participants necessary for evaluation of the biomarker-surrogate-clinical endpoint relationship in their different settings.
ERIC Educational Resources Information Center
Hodnik Cadež, Tatjana; Manfreda Kolar, Vida
2015-01-01
A cognitive schema is a mechanism which allows an individual to organize her/his experiences in such a way that a new similar experience can easily be recognised and dealt with successfully. Well-structured schemas provide for the knowledge base for subsequent mathematical activities. A new experience can be assimilated into a previously existing…
The cognitive nexus between Bohr's analogy for the atom and Pauli's exclusion schema.
Ulazia, Alain
2016-03-01
The correspondence principle is the primary tool Bohr used to guide his contributions to quantum theory. By examining the cognitive features of the correspondence principle and comparing it with those of Pauli's exclusion principle, I will show that it did more than simply 'save the phenomena'. The correspondence principle in fact rested on powerful analogies and mental schemas. Pauli's rejection of model-based methods in favor of a phenomenological, rule-based approach was therefore not as disruptive as some historians have indicated. Even at a stage that seems purely phenomenological, historical studies of theoretical development should take into account non-formal, model-based approaches in the form of mental schemas, analogies and images. In fact, Bohr's images and analogies had non-classical components which were able to evoke the idea of exclusion as a prohibition law and as a preliminary mental schema. Copyright © 2016 Elsevier Ltd. All rights reserved.
Full glowworm swarm optimization algorithm for whole-set orders scheduling in single machine.
Yu, Zhang; Yang, Xiaomei
2013-01-01
By analyzing the characteristics of the whole-set orders problem and combining them with the theory of glowworm swarm optimization, a new glowworm swarm optimization algorithm for scheduling is proposed. A new hybrid-encoding schema that combines two-dimensional encoding and random-key encoding is given. In order to enhance the capability of optimal searching and speed up the convergence rate, a dynamically changing step strategy is integrated into this algorithm. Furthermore, experimental results prove its feasibility and efficiency.
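Random-key encoding, one half of the hybrid schema mentioned here, maps a vector of real numbers to a job sequence by sorting. The sketch below shows only that standard decoding step; it does not reproduce the paper's two-dimensional encoding or the glowworm update rule, and the function name is hypothetical.

```python
def decode_random_keys(keys):
    """Decode a random-key vector into a job permutation.

    Job j is scheduled earlier the smaller keys[j] is, so any real-valued
    vector a swarm algorithm produces decodes to a feasible sequence.
    """
    return sorted(range(len(keys)), key=lambda j: keys[j])
```

For instance, the keys `[0.46, 0.91, 0.33, 0.75]` decode to the job order `[2, 0, 3, 1]`. Because every real vector is decodable, continuous position updates never yield an invalid schedule.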
ERIC Educational Resources Information Center
Bierschenk, Bernhard
In this study, the Kantian schema has been applied to natural language expression. The novelty of the approach concerns the way in which the Kantian schema interrelates the analytic with the synthetic mode in the construction of the presented formalism. The main thesis is based on the premise that the synthetic, in contrast to the analytic,…
Corbacho, Fernando; Nishikawa, Kiisa C; Weerasuriya, Ananda; Liaw, Jim-Shih; Arbib, Michael A
2005-12-01
The previous companion paper describes the initial (seed) schema architecture that gives rise to the observed prey-catching behavior. In this second paper in the series we describe the fundamental adaptive processes required during learning after lesioning. Following bilateral transections of the hypoglossal nerve, anurans lunge toward mealworms with no accompanying tongue or jaw movement. Nevertheless anurans with permanent hypoglossal transections eventually learn to catch their prey by first learning to open their mouth again and then lunging their body further and increasing their head angle. In this paper we present a new learning framework, called schema-based learning (SBL). SBL emphasizes the importance of the current existent structure (schemas), that defines a functioning system, for the incremental and autonomous construction of ever more complex structure to achieve ever more complex levels of functioning. We may rephrase this statement into the language of Schema Theory (Arbib 1992, for a comprehensive review) as the learning of new schemas based on the stock of current schemas. SBL emphasizes a fundamental principle of organization called coherence maximization, that deals with the maximization of congruence between the results of an interaction (external or internal) and the expectations generated for that interaction. A central hypothesis consists of the existence of a hierarchy of predictive internal models (predictive schemas) all over the control center (the brain) of the agent. Hence, we will include predictive models in the perceptual, sensorimotor, and motor components of the autonomous agent architecture. We will then show that predictive models are fundamental for structural learning. In particular we will show how a system can learn a new structural component (augment the overall network topology) after being lesioned in order to recover (or even improve) its original functionality. 
Learning after lesioning is a special case of structural learning but clearly shows that solutions cannot be known/hardwired a priori since it cannot be known, in advance, which substructure is going to break down.
The Hematopoietic Expression Viewer: expanding mobile apps as a scientific tool.
James, Regis A; Rao, Mitchell M; Chen, Edward S; Goodell, Margaret A; Shaw, Chad A
2012-07-15
Many important data in current biological science comprise hundreds, thousands or more individual results. These massive data require computational tools to navigate results and effectively interact with the content. Mobile device apps are an increasingly important tool in the everyday lives of scientists and non-scientists alike. Such software presents individuals with compact and efficient tools to interact with complex data at meetings or other locations remote from their main computing environment. We believe that apps will be important tools for biologists, geneticists and physicians to review content while participating in biomedical research or practicing medicine. We have developed a prototype app for displaying gene expression data using the iOS platform. To present the software engineering requirements, we review the model-view-controller schema for Apple's iOS. We apply this schema to a simple app for querying locally developed microarray gene expression data. The challenge of this application is to balance between storing content locally within the app versus obtaining it dynamically via a network connection. The Hematopoietic Expression Viewer is available at http://www.shawlab.org/he_viewer. The source code for this project and any future information on how to obtain the app can be accessed at http://www.shawlab.org/he_viewer.
PharmARTS: terminology web services for drug safety data coding and retrieval.
Alecu, Iulian; Bousquet, Cédric; Degoulet, Patrice; Jaulent, Marie-Christine
2007-01-01
MedDRA and WHO-ART are the terminologies used to encode drug safety reports. The standardisation achieved with these terminologies facilitates: 1) The sharing of safety databases; 2) Data mining for the continuous reassessment of benefit-risk ratio at national or international level or in the pharmaceutical industry. There is some debate about the capacity of these terminologies for retrieving case reports related to similar medical conditions. We have developed a resource that allows grouping similar medical conditions more effectively than WHO-ART and MedDRA. We describe here a software tool facilitating the use of this terminological resource thanks to an RDF framework with support for RDF Schema inferencing and querying. This tool eases coding and data retrieval in drug safety.
Data Sharing in DHT Based P2P Systems
NASA Astrophysics Data System (ADS)
Roncancio, Claudia; Del Pilar Villamil, María; Labbé, Cyril; Serrano-Alvarado, Patricia
The evolution of peer-to-peer (P2P) systems triggered the building of large scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the “extreme” characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, potentially untrusted participants... This article focuses on declarative querying support, query optimization and data privacy on a major class of P2P systems, that based on Distributed Hash Table (P2P DHT). The usual approaches and the algorithms used by classic distributed systems and databases for providing data privacy and querying services are not well suited to P2P DHT systems. A considerable amount of work was required to adapt them for the new challenges such systems present. This paper describes the most important solutions found. It also identifies important future research trends in data management in P2P DHT systems.
Atmaca, Sinem; Gençöz, Tülin
2016-02-01
The purpose of the current study is to explore the revictimization process between child abuse and neglect (CAN) and intimate partner violence (IPV) from the schema theory perspective. For this aim, 222 married women recruited in four central cities of Turkey participated in the study. Results indicated that early negative CAN experiences increased the risk of being exposed to later IPV. Specifically, emotional abuse and sexual abuse in childhood predicted the four subtypes of IPV, which are physical, psychological, and sexual violence, and injury, while physical abuse was associated only with physical violence. To explore the mediational role of early maladaptive schemas (EMSs) in this association, first, five schema domains were tested via a Parallel Multiple Mediation Model. Results indicated that only the Disconnection/Rejection (D/R) schema domain mediated the association between CAN and IPV. Second, to determine the particular mediational roles of each schema, eighteen EMSs were tested as mediators, and results showed that the Emotional Deprivation Schema and the Vulnerability to Harm or Illness Schema mediated the association between CAN and IPV. These findings provided empirical support for the crucial roles of EMSs in the revictimization process. Clinical implications were discussed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Database technology and the management of multimedia data in the Mirror project
NASA Astrophysics Data System (ADS)
de Vries, Arjen P.; Blanken, H. M.
1998-10-01
Multimedia digital libraries require an open distributed architecture instead of a monolithic database system. In the Mirror project, we use the Monet extensible database kernel to manage different representations of multimedia objects. To maintain independence between content, meta-data, and the creation of meta-data, we allow distribution of data and operations using CORBA. This open architecture introduces new problems for data access. From an end user's perspective, the problem is how to search the available representations to fulfill an actual information need; the conceptual gap between human perceptual processes and the meta-data is too large. From a system's perspective, several representations of the data may semantically overlap or be irrelevant. We address these problems with an iterative query process and active user participation through relevance feedback. A retrieval model based on inference networks assists the user with query formulation. The integration of this model into the database design has two advantages. First, the user can query both the logical and the content structure of multimedia objects. Second, the use of different data models in the logical and the physical database design provides data independence and allows algebraic query optimization. We illustrate query processing with a music retrieval application.
ProtaBank: A repository for protein design and engineering data.
Wang, Connie Y; Chang, Paul M; Ary, Marie L; Allen, Benjamin D; Chica, Roberto A; Mayo, Stephen L; Olafson, Barry D
2018-03-25
We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user-friendly web interface and programming layer that streamlines data deposition and allows for batch input and queries. The database schema design incorporates a standard format for reporting protein sequences and experimental data that facilitates comparison of results across different data sets. A suite of analysis and visualization tools are provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing for fast searching and identification of relevant data from the existing literature, and exploring correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theodore Larrieu, Christopher Slominski, Michele Joyce
2011-03-01
With the inauguration of the CEBAF Element Database (CED) in Fall 2010, Jefferson Lab computer scientists have taken a step toward the eventual goal of a model-driven accelerator. Once fully populated, the database will be the primary repository of information used for everything from generating lattice decks to booting control computers to building controls screens. A requirement influencing the CED design is that it provide access to not only present, but also future and past configurations of the accelerator. To accomplish this, an introspective database schema was designed that allows new elements, types, and properties to be defined on the fly with no changes to table structure. Used in conjunction with Oracle Workspace Manager, it allows users to query data from any time in the database history with the same tools used to query the present configuration. Users can also check out workspaces to use as staging areas for upcoming machine configurations. All access to the CED is through a well-documented Application Programming Interface (API) that is translated automatically from original C++ source code into native libraries for scripting languages such as Perl, PHP, and Tcl, making access to the CED easy and ubiquitous.
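The abstract does not give the CED table layout. As a rough illustration of how a schema can accept new element types and properties through row inserts alone, the following Python sketch uses an in-memory SQLite database. All table, column, type, and element names here are hypothetical, and the history and workspace features attributed to Oracle Workspace Manager are not modeled.

```python
import sqlite3

def make_catalog():
    """Create a fixed set of tables; new types/properties are rows, not columns."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE types(id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE properties(
            id INTEGER PRIMARY KEY,
            type_id INTEGER REFERENCES types(id),
            name TEXT,
            UNIQUE(type_id, name));
        CREATE TABLE elements(
            id INTEGER PRIMARY KEY,
            type_id INTEGER REFERENCES types(id),
            name TEXT UNIQUE);
        CREATE TABLE element_values(
            element_id INTEGER REFERENCES elements(id),
            property_id INTEGER REFERENCES properties(id),
            value TEXT,
            PRIMARY KEY(element_id, property_id));
    """)
    return db

def define_type(db, type_name, property_names):
    """Register a new element type and its properties without any ALTER TABLE."""
    type_id = db.execute(
        "INSERT INTO types(name) VALUES (?)", (type_name,)).lastrowid
    db.executemany(
        "INSERT INTO properties(type_id, name) VALUES (?, ?)",
        [(type_id, p) for p in property_names])
    return type_id

def add_element(db, type_name, element_name, values):
    """Store one element; each property value becomes a row in element_values."""
    (type_id,) = db.execute(
        "SELECT id FROM types WHERE name = ?", (type_name,)).fetchone()
    element_id = db.execute(
        "INSERT INTO elements(type_id, name) VALUES (?, ?)",
        (type_id, element_name)).lastrowid
    for prop, value in values.items():
        (prop_id,) = db.execute(
            "SELECT id FROM properties WHERE type_id = ? AND name = ?",
            (type_id, prop)).fetchone()
        db.execute("INSERT INTO element_values VALUES (?, ?, ?)",
                   (element_id, prop_id, str(value)))
    return element_id

def get_element(db, element_name):
    """Return the element's properties as a {name: value-as-text} dict."""
    rows = db.execute("""
        SELECT p.name, v.value
        FROM elements e
        JOIN element_values v ON v.element_id = e.id
        JOIN properties p ON p.id = v.property_id
        WHERE e.name = ?""", (element_name,)).fetchall()
    return dict(rows)
```

In this sketch, calling define_type(db, "Quadrupole", ["length_m", "gradient"]) at run time makes a new kind of element available immediately; values are stored as text, so a real system would also need per-property type information.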
Hoelzer, S; Schweiger, R K; Boettcher, H A; Tafazzoli, A G; Dudeck, J
2001-01-01
The purpose of guidelines in clinical practice is to improve the effectiveness and efficiency of clinical care. It is known that nationally or internationally produced guidelines which, in particular, do not involve medical processes at the time of consultation, do not take local factors into account, and have no consistent implementation strategy, have limited impact in changing either the behaviour of physicians, or patterns of care. The literature provides evidence for the effectiveness of computerization of CPGs for increasing compliance and improving patient outcomes. Probably the most effective concepts are knowledge-based functions for decision support or monitoring that are integrated in clinical information systems. This approach is mostly restricted by the effort required for development and maintenance of the information systems and the limited number of implemented medical rules. Most of the guidelines are text-based, and are primarily published in medical journals and posted on the internet. However, internet-published guidelines have little impact on the behaviour of physicians. It can be difficult and time-consuming to browse the internet to find (a) the correct guidelines for an existing diagnosis and (b) an adequate recommendation for a specific clinical problem. Our objective is to provide a web-based guideline service that takes as input clinical data on a particular patient and returns as output a customizable set of recommendations regarding diagnosis and treatment. Information in healthcare is to a very large extent transmitted and stored as unstructured or slightly structured text such as discharge letters, reports, forms, etc. The same applies for facilities containing medical information resources for clinical purposes and research such as text books, articles, guidelines, etc. Physicians are used to obtaining information from text-based sources.
Since most guidelines are text-based, it would be practical to use a document-based solution that preserves the original cohesiveness. The lack of structure limits the automatic identification and extraction of the information contained in these resources. For this reason, we have chosen a document-based approach using eXtensible Markup Language (XML) with its schema definition and related technologies. XML empowers the applications for in-context searching. In addition it allows the same content to be represented in different ways. Our XML reference clinical data model for guidelines has been realized with the XML schema definition. The schema is used for structuring new text-based guidelines and updating existing documents. It is also used to establish search strategies on the document base. We hypothesize that enabling the physicians to query the available CPGs easily, and to get access to selected and specific information at the point of care will foster increased use. Based on current evidence we are confident that it will have substantial impact on the care provided, and will improve health outcomes.
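To make the idea of in-context searching over XML-structured guidelines concrete, here is a minimal Python sketch using the standard library's ElementTree. The element and attribute names (guideline, recommendation, diagnosis, topic) and the sample text are invented for illustration; they are not taken from the authors' schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical guideline document; the real reference model is not shown in the abstract.
GUIDELINES_XML = """
<guidelines>
  <guideline diagnosis="asthma">
    <recommendation topic="treatment">Example treatment text.</recommendation>
    <recommendation topic="diagnosis">Example diagnostic text.</recommendation>
  </guideline>
  <guideline diagnosis="hypertension">
    <recommendation topic="treatment">Another example text.</recommendation>
  </guideline>
</guidelines>
"""

def find_recommendations(xml_text, diagnosis, topic):
    """Return recommendation texts that sit inside the matching guideline context.

    The path restricts the search by element context instead of scanning free text.
    Inputs are interpolated without escaping, which is acceptable only for a sketch.
    """
    root = ET.fromstring(xml_text)
    path = (f".//guideline[@diagnosis='{diagnosis}']"
            f"/recommendation[@topic='{topic}']")
    return [el.text.strip() for el in root.findall(path)]
```

A plain keyword search for "treatment" would match both guidelines in this sample, whereas the structured query returns only the recommendation filed under the requested diagnosis.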
Investigating a Tier 1 Intervention Focused on Proportional Reasoning: A Follow-Up Study
ERIC Educational Resources Information Center
Jitendra, Asha K.; Harwell, Michael R.; Karl, Stacy R.; Simonson, Gregory R.; Slater, Susan C.
2017-01-01
This randomized controlled study investigated the efficacy of a Tier 1 intervention--schema-based instruction--designed to help students with and without mathematics difficulties (MD) develop proportional reasoning. Twenty seventh-grade teachers/classrooms were randomly assigned to a treatment condition (schema-based instruction) or control…
How Do DSM-5 Personality Traits Align With Schema Therapy Constructs?
Bach, Bo; Lee, Christopher; Mortensen, Erik Lykke; Simonsen, Erik
2016-08-01
DSM-5 offers an alternative model of personality pathology that includes 25 traits. Although personality disorders are mostly treated with psychotherapy, the correspondence between DSM-5 traits and concepts in evidence-based psychotherapy has not yet been evaluated adequately. Suitably, schema therapy was developed for treating personality disorders, and it has achieved promising evidence. The authors examined associations between DSM-5 traits and schema therapy constructs in a mixed sample of 662 adults, including 312 clinical participants. Associations were investigated in terms of factor loadings and regression coefficients in relation to five domains, followed by specific correlations among all constructs. The results indicated conceptually coherent associations, and 15 of 25 traits were strongly related to relevant schema therapy constructs. Conclusively, DSM-5 traits may be considered expressions of schema therapy constructs, which psychotherapists might take advantage of in terms of case formulation and targets of treatment. In turn, schema therapy constructs add theoretical understanding to DSM-5 traits.
Topological Schemas of Cognitive Maps and Spatial Learning.
Babichev, Andrey; Cheng, Sen; Dabaghian, Yuri A
2016-01-01
Spatial navigation in mammals is based on building a mental representation of their environment: a cognitive map. However, both the nature of this cognitive map and its underpinning in neural structures and activity remain vague. A key difficulty is that these maps are collective, emergent phenomena that cannot be reduced to a simple combination of inputs provided by individual neurons. In this paper we suggest computational frameworks for integrating the spiking signals of individual cells into a spatial map, which we call schemas. We provide examples of four schemas defined by different types of topological relations that may be neurophysiologically encoded in the brain and demonstrate that each schema provides its own large-scale characteristics of the environment: the schema integrals. Moreover, we find that, in all cases, these integrals are learned at a rate which is faster than the rate of complete training of neural networks. Thus, the proposed schema framework differentiates between the cognitive aspect of spatial learning and the physiological aspect at the neural network level.
Development and evaluation of a biomedical search engine using a predicate-based vector space model.
Kwak, Myungjae; Leroy, Gondy; Martinez, Jesse D; Harwell, Jeffrey
2013-10-01
Although biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach for finding much more precise and complete information with a single query using predicates instead of keywords for both query and document representation. Predicates are triples that are more complex data structures than keywords and contain more structured information. To make optimal use of them, we developed a new predicate-based vector space model and query-document similarity function with adjusted tf-idf and boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p<.001) for the predicate-based (80%) than for the keyword-based (71%) approach. Relevance was almost doubled with the predicate-based approach: 2.1 versus 1.6 without rank order adjustment (p<.001) and 1.34 versus 0.98 with rank order adjustment (p<.001) for the predicate-based versus keyword-based approach, respectively. Predicates can support more precise searching than keywords, laying the foundation for rich and sophisticated information search. Copyright © 2013 Elsevier Inc. All rights reserved.
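As a rough sketch of what a vector space model over predicates can look like, the Python below treats each (subject, relation, object) triple as a single vector dimension and ranks documents by cosine similarity. It uses plain tf-idf only; the adjusted tf-idf weighting and boost function mentioned in the abstract are not described there and are not reproduced. The function names and the toy triples are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse tf-idf vector per document; each triple is one dimension."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))          # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}       # smoothed so idf > 0
    vecs = [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Return document indices, best match first (ties broken by index)."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [(cosine(q, v), i) for i, v in enumerate(vecs)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

Because the whole triple is the unit of matching, a document containing ("A", "inhibits", "B") and one containing ("B", "inhibits", "A") land on different dimensions, although a keyword index would see the same three words in both.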
Distributed Efficient Similarity Search Mechanism in Wireless Sensor Networks
Ahmed, Khandakar; Gregory, Mark A.
2015-01-01
The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes motivation from vector distance index, called iDistance, in order to transform the issue of similarity searching into the problem of an interval search in one dimension. In addition, a sector based distance routing algorithm is used to efficiently route messages. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries. PMID:25751081
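The reduction of similarity search to a one-dimensional interval search can be sketched in a few lines of Python. This follows the general iDistance idea (assign each point to its nearest reference point and key it by partition number plus distance), not the DCSMSS scheme itself, whose routing and storage details are not given in the abstract. The class name, the constant c, and the choice of reference points are assumptions.

```python
import bisect
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class IDistanceIndex:
    def __init__(self, points, refs, c=1000.0):
        # c must exceed any point-to-reference distance so partitions do not overlap.
        self.refs, self.c = refs, c
        self.radius = [0.0] * len(refs)       # farthest member of each partition
        entries = []
        for p in points:
            i = min(range(len(refs)), key=lambda k: dist(p, refs[k]))
            d = dist(p, refs[i])
            self.radius[i] = max(self.radius[i], d)
            entries.append((i * c + d, p))    # one-dimensional key
        entries.sort(key=lambda e: e[0])
        self.keys = [k for k, _ in entries]
        self.points = [p for _, p in entries]

    def range_query(self, q, r):
        """Return all indexed points within distance r of q."""
        hits = []
        for i, ref in enumerate(self.refs):
            dq = dist(q, ref)
            if dq - r > self.radius[i]:
                continue                      # query ball misses this partition
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.radius[i], dq + r)
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            # The key interval yields candidates; confirm with the true distance.
            hits.extend(p for p in self.points[start:end] if dist(p, q) <= r)
        return hits
```

The interval bounds come from the triangle inequality: a point within r of q must lie at a distance between dq - r and dq + r from its reference point, so only that slice of the sorted key list needs to be scanned.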
Group Schema Therapy for Eating Disorders: A Pilot Study
Simpson, Susan G.; Morrow, Emma; van Vreeswijk, Michiel; Reid, Caroline
2010-01-01
This paper describes the use of Group Schema Therapy for Eating Disorders (ST-E-g) in a case series of eight participants with chronic eating disorders and high levels of co-morbidity. Treatment was comprised of 20 sessions which included cognitive, experiential, and interpersonal strategies, with an emphasis on behavioral change. Specific schema-based strategies focused on bodily felt-sense and body-image, as well as emotional regulation skills. Six attended until end of treatment, two dropped-out at mid-treatment. Eating disorder severity, global schema severity, shame, and anxiety levels were reduced between pre- and post-therapy, with a large effect size at follow-up. Clinically significant improvement in eating severity was found in four out of six completers. Group completers showed a mean reduction in schema severity of 43% at post-treatment, and 59% at follow-up. By follow-up, all completers had achieved over 60% improvement in schema severity. Self-report feedback suggests that group factors may catalyze the change process in schema therapy by increasing perceptions of support and encouragement to take risks and try out new behaviors, whilst providing a de-stigmatizing and de-shaming therapeutic experience. PMID:21833243
Benchmarking distributed data warehouse solutions for storing genomic variant information
Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.
2017-01-01
Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not been sufficiently explored so far in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most of the distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries.
In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
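The two query patterns named in the abstract, a simple genome range query and a pre-aggregated table that spares repeated grouping and joining, can be illustrated on a single machine with SQLite. This Python sketch is only an analogy for the distributed engines that were benchmarked; the table layout, column names, and sample rows are hypothetical and are not taken from the variantsdwh repository.

```python
import sqlite3

def load_variants(rows):
    """Create an in-memory variant table indexed for (chrom, pos) range scans."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE variants(
            sample_id INTEGER, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT);
        CREATE INDEX idx_variants_range ON variants(chrom, pos);
    """)
    db.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    return db

def variants_in_range(db, chrom, start, end):
    """Genome range query: all calls on one chromosome between two positions."""
    return db.execute(
        "SELECT sample_id, pos, ref, alt FROM variants "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos, sample_id",
        (chrom, start, end)).fetchall()

def build_allele_counts(db):
    """Pre-aggregate once so later analytical queries read a small summary table."""
    db.executescript("""
        DROP TABLE IF EXISTS allele_counts;
        CREATE TABLE allele_counts AS
            SELECT chrom, pos, ref, alt,
                   COUNT(DISTINCT sample_id) AS n_samples
            FROM variants
            GROUP BY chrom, pos, ref, alt;
    """)

def carriers(db, chrom, pos):
    """Read the pre-computed carrier count for one site (None if absent)."""
    row = db.execute(
        "SELECT n_samples FROM allele_counts WHERE chrom = ? AND pos = ?",
        (chrom, pos)).fetchone()
    return row[0] if row else None
```

The summary table must be rebuilt or updated whenever new samples are loaded, which is the usual cost of trading write-time work for faster reads.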
Query Optimization by Semantic Reasoning.
1981-05-01
condition holds, then formulas X and Y are said to be merge-compatible. Let xi be the variable in X that corresponds to variable yj in Y (x is not... Davidson, Ramez El-Masri, Sheldon Finkelstein, Hector Garcia, Mohammed Olumi, Tom Rogers, Neil Rowe, David Shaw, and Kyu-Young Whang. Special credit... for the simple queries, along with cost formulas and applicability conditions for the methods. Most recently has come the development of optimizers for
ERIC Educational Resources Information Center
Mitsugi, Makoto
2017-01-01
The purpose of this study is to investigate the effectiveness of two instruction methods for teaching polysemous English prepositions ("at, in, on") and to explore learners' perception on learning tools used in the instruction when learning polysemous words. The first study investigated the effectiveness of schema-based instruction…
ERIC Educational Resources Information Center
Jitendra, Asha K.; Star, Jon R.; Starosta, Kristin; Leh, Jayne M.; Sood, Sheetal; Caskie, Grace; Hughes, Cheyenne L.; Mack, Toshi R.
2009-01-01
The present study evaluated the effectiveness of an instructional intervention (schema-based instruction, SBI) that was designed to meet the diverse needs of middle school students by addressing the research literatures from both special education and mathematics education. Specifically, SBI emphasizes the role of the mathematical structure of…
PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.
Djokic-Petrovic, Marija; Cvjetkovic, Vladimir; Yang, Jeremy; Zivanovic, Marko; Wild, David J
2017-09-20
There are a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources. PIBAS FedSPARQL is a web-based query builder and result set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm based on the processing of named entities to be used in Vector Space Model and Cosine Similarity Measures. According to our knowledge, PIBAS FedSPARQL was unique among the systems that we found in that it allows detecting of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF, EMBL-EBI, and one local initiative called CPCTAS, as well as additional user-specified data source. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users have the ability to choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of data to enhance query results. The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics. 
Also, the novel "similar data items detection" algorithm can be particularly useful for suggesting new data sources and cost optimization for new experiments. PIBAS FedSPARQL can be expanded with new topics, subtopics and templates on demand, rendering information retrieval more robust.
Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval
Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene
2018-01-01
The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie PMID:29688379
Schaap, Grietje M; Chakhssi, Farid; Westerhof, Gerben J
2016-12-01
This study provides an evaluation of group schema therapy (ST) for inpatient treatment of patients with personality pathology who did not respond to previous psychotherapeutic interventions. Forty-two patients were assessed pre- and posttreatment, and 35 patients were evaluated at follow-up 6 months later. The results showed a dropout rate of 35%. Those who dropped out did not differ from those who completed treatment with regard to demographic and clinical variables; the only exception was that those who dropped out showed a lower prevalence of mood disorders. Furthermore, intention-to-treat analyses showed a significant improvement in maladaptive schemas, schema modes, maladaptive coping styles, mental well-being, and psychological distress after treatment, and these improvements were maintained at follow-up. On the other hand, there was no significant change in experienced parenting style as self-reported by patients. Changes in schemas and schema modes measured from pre- to posttreatment were predictive of general psychological distress at follow-up. Overall, these preliminary findings suggest that positive treatment results can be obtained with group ST-based inpatient treatment for patients who did not respond to previous psychotherapeutic interventions. Moreover, these findings are comparable with treatment results for patients without such a nonresponsive treatment history. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Ontology-Driven Provenance Management in eScience: An Application in Parasite Research
NASA Astrophysics Data System (ADS)
Sahoo, Satya S.; Weatherly, D. Brent; Mutharaju, Raghava; Anantharam, Pramod; Sheth, Amit; Tarleton, Rick L.
Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T.cruzi). This provenance infrastructure, called T.cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views called materialized provenance views (MPV) to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in literature.
Sharing Epigraphic Information as Linked Data
NASA Astrophysics Data System (ADS)
Álvarez, Fernando-Luis; García-Barriocanal, Elena; Gómez-Pantoja, Joaquín-L.
The diffusion of epigraphic data has evolved in recent years from printed catalogues to indexed digital databases shared through the Web. Recently, the open EpiDoc specifications have resulted in an XML-based schema for the interchange of ancient texts that uses XSLT to render typographic representations. However, these schemas and representation systems still do not provide a way to encode computational semantics and semantic relations between pieces of epigraphic data. This paper sketches an approach to bring these semantics into an EpiDoc-based schema using the Web Ontology Language (OWL) and following the principles and methods of information sharing known as "linked data". The paper describes the general principles of the OWL mapping of the EpiDoc schema and how epigraphic data can be shared in RDF format via dereferenceable URIs that can be used to build advanced search, visualization and analysis systems.
Evaluating non-relational storage technology for HEP metadata and meta-data catalog
NASA Astrophysics Data System (ADS)
Grigorieva, M. A.; Golosova, M. V.; Gubin, M. Y.; Klimentov, A. A.; Osipova, V. V.; Ryabinkin, E. A.
2016-10-01
Large-scale scientific experiments produce vast volumes of data. These data are stored, processed and analyzed in a distributed computing environment. The life cycle of an experiment is managed by specialized software such as Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each data processing step. Metadata describe scientific data and represent scientific objects or results of scientific experiments, allowing them to be shared by various applications, to be recorded in databases or published via the Web. Processing and analysis of the constantly growing volume of auxiliary metadata is a challenging task, no simpler than the management and processing of the experimental data itself. Furthermore, metadata sources are often loosely coupled and may lead to inconsistencies for the end user in queries that combine information from several sources. To aggregate and synthesize a range of primary metadata sources, and enhance them with flexible schema-less addition of aggregated data, we are developing the Data Knowledge Base architecture serving as the intelligence behind GUIs and APIs.
Yang, Guo-Liang; Lim, C C Tchoyoson
2006-08-01
Radiology education is heavily dependent on visual images, and case-based teaching files comprising medical images can be an important tool for teaching diagnostic radiology. Currently, hardcopy film is being rapidly replaced by digital radiological images in teaching hospitals, and an electronic teaching file (ETF) library would be desirable. Furthermore, a repository of ETFs deployed on the World Wide Web has the potential for e-learning applications to benefit a larger community of learners. In this paper, we describe a Singapore National Medical Image Resource Centre (SN.MIRC) that can serve as a World Wide Web resource for teaching diagnostic radiology. On SN.MIRC, ETFs can be created using a variety of mechanisms including file upload and online form-filling, and users can search for cases using the Medical Image Resource Center (MIRC) query schema developed by the Radiological Society of North America (RSNA). The system can be improved with future enhancements, including multimedia interactive teaching files and distance learning for continuing professional development. However, significant challenges exist when exploring the potential of using the World Wide Web for radiology education.
NASA Astrophysics Data System (ADS)
Ulbricht, Damian; Elger, Kirsten; Bertelmann, Roland; Klump, Jens
2016-04-01
With the foundation of DataCite in 2009 and the technical infrastructure installed in the last six years, it has become very easy to create citable dataset DOIs. Nowadays, dataset DOIs are increasingly accepted and required by journals in reference lists of manuscripts. In addition, DataCite provides usage statistics [1] of assigned DOIs and offers a public search API to make research data count. By linking related information to the data, they become more useful for future generations of scientists. For this purpose, several identifier systems, such as ISBN for books, ISSN for journals, DOI for articles or related data, ORCID for authors, and IGSN for physical samples, can be attached to DOIs using the DataCite metadata schema [2]. While these are good preconditions to publish data, free and open solutions that help with the curation of data, the publication of research data, and the assignment of DOIs in one software seem to be rare. At GFZ Potsdam we built a modular software stack that is made of several free and open software solutions and we established 'GFZ Data Services'. 'GFZ Data Services' provides storage, a metadata editor for publication and a facility to moderate minted DOIs. All software solutions are connected through web APIs, which makes it possible to reuse and integrate established software. The core component of 'GFZ Data Services' is an eSciDoc [3] middleware that is used as central storage and has been designed along the OAIS reference model for digital preservation. Thus, data are stored in self-contained packages that are made of binary file-based data and XML-based metadata. The eSciDoc infrastructure provides access control to data and is able to handle half-open datasets, which is useful in embargo situations when a subset of the research data are released after an adequate period.
The data exchange platform panMetaDocs [4] makes use of eSciDoc's REST API to upload file-based data into eSciDoc and uses a metadata editor [5] to annotate the files with metadata. The metadata editor has a user-friendly interface with nominal lists, extensive explanations, and an interactive mapping tool to provide assistance to scientists describing the data. It is possible to deposit metadata templates to fill certain fields with default values. The metadata editor generates metadata in the schemas ISO19139, NASA GCMD DIF, and DataCite and could be extended for other schemas. panMetaDocs is able to mint dataset DOIs through DOIDB, which is our component to moderate dataset DOIs issued through 'GFZ Data Services'. DOIDB accepts metadata in the schemas ISO19139, DIF, and DataCite. In addition, DOIDB provides an OAI-PMH interface to disseminate all deposited metadata to data portals. The presentation of datasets on DOI landing pages is done through XSLT stylesheet transformation of the XML-based metadata. The landing pages have been designed to meet the needs of scientists. We are able to render the metadata to different layouts. Furthermore, additional information about datasets and publications is assembled into the webpage by querying public databases on the internet. The work presented here will focus on technical details of the software stack. [1] http://stats.datacite.org [2] http://www.dlib.org/dlib/january11/starr/01starr.html [3] http://www.escidoc.org [4] http://panmetadocs.sf.net [5] http://github.com/ulbricht
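As a rough illustration of how a data portal might harvest metadata over an OAI-PMH interface such as the one DOIDB is said to provide, the sketch below only builds a `ListRecords` request URL. The endpoint `https://example.org/oai` and the helper `list_records_url` are hypothetical; the `verb`, `metadataPrefix` and `from` parameters come from the OAI-PMH protocol itself, and `oai_dc` is the metadata format every OAI-PMH repository is required to support. No request is sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the abstract does not give the real DOIDB address.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, from_date="2016-01-01")
print(url)
```

A harvester would fetch this URL and follow any resumption tokens in the XML response; that part is omitted here.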
A rank-based Prediction Algorithm of Learning User's Intention
NASA Astrophysics Data System (ADS)
Shen, Jie; Gao, Ying; Chen, Cang; Gong, HaiPing
Internet search has become an important part of people's daily lives. Through search engines, people can find many types of information to meet different needs. Current search engines have two problems. First, users must decide in advance which type of information they want and then switch to the corresponding search interface. Second, most search engines support several kinds of search functions, each with its own separate interface, so users who need different types of information must switch between interfaces. In practice, most queries correspond to several types of results, and such queries can retrieve relevant results from various search engines. For example, the query "Palace" returns websites introducing the National Palace Museum, blogs, Wikipedia entries, pictures and videos. This paper presents a new aggregation algorithm for all kinds of search results. It filters and sorts the search results by learning from three sources, the query words, the search results and the search history logs, in order to detect the user's intention. Experiments demonstrate that this rank-based method for multiple types of search results is effective. It meets the user's search needs well, enhances user satisfaction, provides an effective and rational model for optimizing search engines, and improves the user's search experience.
The development of a classification schema for arts-based approaches to knowledge translation.
Archibald, Mandy M; Caine, Vera; Scott, Shannon D
2014-10-01
Arts-based approaches to knowledge translation are emerging as powerful interprofessional strategies with potential to facilitate evidence uptake, communication, knowledge, attitude, and behavior change across healthcare provider and consumer groups. These strategies are in the early stages of development. To date, no classification system for arts-based knowledge translation exists, which limits development and understandings of effectiveness in evidence syntheses. We developed a classification schema of arts-based knowledge translation strategies based on two mechanisms by which these approaches function: (a) the degree of precision in key message delivery, and (b) the degree of end-user participation. We demonstrate how this classification is necessary to explore how context, time, and location shape arts-based knowledge translation strategies. Classifying arts-based knowledge translation strategies according to their core attributes extends understandings of the appropriateness of these approaches for various healthcare settings and provider groups. The classification schema developed may enhance understanding of how, where, and for whom arts-based knowledge translation approaches are effective, and enable theorizing of essential knowledge translation constructs, such as the influence of context, time, and location on utilization strategies. The classification schema developed may encourage systematic inquiry into the effectiveness of these approaches in diverse interprofessional contexts. © 2014 Sigma Theta Tau International.
Suzuki, Satoshi
2017-09-01
This study investigated the spatial distribution of brain activity on body schema (BS) modification induced by natural body motion using two versions of a hand-tracing task. In Task 1, participants traced Japanese Hiragana characters using the right forefinger, requiring no BS expansion. In Task 2, participants performed the tracing task with a long stick, requiring BS expansion. Spatial distribution was analyzed using general linear model (GLM)-based statistical parametric mapping of near-infrared spectroscopy data contaminated with motion artifacts caused by the hand-tracing task. Three methods were utilized in series to counter the artifacts, and optimal conditions and modifications were investigated: a model-free method (Step 1), a convolution matrix method (Step 2), and a boxcar-function-based Gaussian convolution method (Step 3). The results revealed four methodological findings: (1) Deoxyhemoglobin was suitable for the GLM because both Akaike information criterion and the variance against the averaged hemodynamic response function were smaller than for other signals, (2) a high-pass filter with a cutoff frequency of .014 Hz was effective, (3) the hemodynamic response function computed from a Gaussian kernel function and its first- and second-derivative terms should be included in the GLM model, and (4) correction of non-autocorrelation and use of effective degrees of freedom were critical. Investigating z-maps computed according to these guidelines revealed that contiguous areas of BA7-BA40-BA21 in the right hemisphere became significantly activated ([Formula: see text], [Formula: see text], and [Formula: see text], respectively) during BS modification while performing the hand-tracing task.
XCEDE: An Extensible Schema For Biomedical Data
Gadde, Syam; Aucoin, Nicole; Grethe, Jeffrey S.; Keator, David B.; Marcus, Daniel S.; Pieper, Steve
2013-01-01
The XCEDE (XML-based Clinical and Experimental Data Exchange) XML schema, developed by members of the BIRN (Biomedical Informatics Research Network), provides an extensive metadata hierarchy for storing, describing and documenting the data generated by scientific studies. Currently at version 2.0, the XCEDE schema serves as a specification for the exchange of scientific data between databases, analysis tools, and web services. It provides a structured metadata hierarchy, storing information relevant to various aspects of an experiment (project, subject, protocol, etc.). Each hierarchy level also provides for the storage of data provenance information allowing for a traceable record of processing and/or changes to the underlying data. The schema is extensible to support the needs of various data modalities and to express types of data not originally envisioned by the developers. The latest version of the XCEDE schema and manual are available from http://www.xcede.org/ PMID:21479735
DOE Office of Scientific and Technical Information (OSTI.GOV)
Starke, M.; Herron, A.; King, D.; ...
2017-08-24
Communications systems and protocols are becoming second nature to utilities operating distribution systems. Traditionally, centralized communication approaches have been used; recently, in microgrid applications, distributed communication and control schemas have emerged, offering several advantages such as improved system reliability, plug-and-play operation, and distributed intelligence. Still, the operation and control of microgrids with distributed communication schemas have received little discussion in the literature. To address the challenge of multiple-inverter microgrid synchronization, this paper proposes a communication schema for microgrids based on the publish-subscribe Data Distribution Service (DDS) protocol. The communication schema is discussed in detail for individual devices such as generators, photovoltaic systems, energy storage systems, the microgrid point-of-common-coupling switch, and supporting applications. In conclusion, islanding and resynchronization of a microgrid are demonstrated on a test-bed utilizing this schema.
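The publish-subscribe idea behind such a schema can be sketched in a few lines. This is an in-process toy, not the DDS API and not the authors' implementation: the `Bus` and `Inverter` classes, the topic name `pcc/status`, the `islanded` field and the two mode strings are all assumptions made for illustration.

```python
from collections import defaultdict


class Bus:
    """Toy topic-based publish-subscribe bus (stands in for a DDS domain)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, sample):
        # Deliver the sample to every callback registered for this topic.
        for callback in self._subscribers[topic]:
            callback(sample)


class Inverter:
    """Hypothetical device that reacts to the point-of-common-coupling state."""

    def __init__(self, name, bus):
        self.name = name
        self.mode = "grid-connected"
        bus.subscribe("pcc/status", self.on_pcc_status)

    def on_pcc_status(self, sample):
        self.mode = "islanded" if sample["islanded"] else "grid-connected"


bus = Bus()
inverters = [Inverter(f"inv-{i}", bus) for i in range(3)]

bus.publish("pcc/status", {"islanded": True})    # switch opens
modes_after_islanding = [inv.mode for inv in inverters]

bus.publish("pcc/status", {"islanded": False})   # resynchronized
modes_after_resync = [inv.mode for inv in inverters]
```

The point of the pattern is that the switch publishes its state once and never needs to know which devices are listening, which is what makes plug-and-play addition of devices straightforward.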
Improving Grasp Skills Using Schema Structured Learning
NASA Technical Reports Server (NTRS)
Platt, Robert; Grupen, Roderic A.; Fagg, Andrew H.
2006-01-01
In the control-based approach to robotics, complex behavior is created by sequencing and combining control primitives. While it is desirable for the robot to autonomously learn the correct control sequence, searching through the large number of potential solutions can be time consuming. This paper constrains this search to variations of a generalized solution encoded in a framework known as an action schema. A new algorithm, SCHEMA STRUCTURED LEARNING, is proposed that repeatedly executes variations of the generalized solution in search of instantiations that satisfy action schema objectives. This approach is tested in a grasping task where Dexter, the UMass humanoid robot, learns which reaching and grasping controllers maximize the probability of grasp success.
Do schema processes mediate links between parenting and eating pathology?
Sheffield, Alex; Waller, Glenn; Emanuelli, Francesca; Murray, James; Meyer, Caroline
2009-07-01
Adverse parenting experiences are commonly linked to eating pathology. A schema-based model of the development and maintenance of eating pathology proposes that one of the potential mediators of the link between parenting and eating pathology might be the development of schema maintenance processes--mechanisms that operate to help the individual avoid intolerable emotions. To test this hypothesis, 353 female students and 124 female eating-disordered clients were recruited. They completed a measure of perceived parenting experiences as related to schema development (Young Parenting Inventory-Revised (YPI-R)), two measures of schema processes (Young Compensatory Inventory; Young-Rygh Avoidance Inventory (YRAI)) and a measure of eating pathology (Eating Disorders Inventory (EDI)). In support of the hypothesis, certain schema processes did mediate the relationship between specific perceptions of parenting and particular forms of eating pathology, although these were different for the clinical and non-clinical samples. In those patients where parenting is implicated in the development of eating pathology, treatment might need to target the cognitive processes that can explain this link. 2009 John Wiley & Sons, Ltd and Eating Disorders Association
Technology, design and dementia: an exploratory survey of developers.
Jiancaro, Tizneem; Jaglal, Susan B; Mihailidis, Alex
2017-08-01
Despite worldwide surges in dementia, we still know relatively little about the design of home technologies that support this population. The purpose of this study was to investigate design considerations from the perspective of developers. Participants, including technical and clinical specialists, were recruited internationally and answered web-based survey questions comprising Likert-type responses with text entry options. Developers were queried on 23 technology acceptance characteristics and 24 design practices. In all, forty developers completed the survey. Concerning "technology acceptance", cost, learnability, self-confidence (during use) and usability were deemed very important. Concerning "design practice", developers overwhelmingly valued user-centred design (UCD). In terms of general assistive technology (AT) models, these were largely unknown by technical specialists compared to clinical specialists. Recommendations based on this study include incorporating "self-confidence" into design protocols; examining the implications of "usability" and UCD in this context; and considering empathy-based design approaches to suit a diverse user population. Moreover, clinical specialists have much to offer development teams, particularly concerning the use of conceptual AT models. Implications of rehabilitation: (1) Stipulate precise usability criteria. (2) Consider "learnability" and "self-confidence" as technology adoption criteria. (3) Recognize the important theoretical role that clinical specialists can fulfil concerning the use of design schemas. (4) Acknowledge the diversity amongst users with dementia, potentially adopting techniques such as designing for "extraordinary users".
A Scalable Monitoring for the CMS Filter Farm Based on Elasticsearch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andre, J.M.; et al.
2015-12-23
A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm, thus providing natural horizontal scaling. A separate "central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.
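The leaf-to-central flow described above can be illustrated without any Elasticsearch client: small JSON documents with no fixed schema are produced per process and then rolled up into a coarse summary. The sketch below uses only the Python standard library; the field names (`host`, `pid`, `run`, `events`, `state`, `hlt_path`) and the function `aggregate_by_run` are assumptions for illustration, not the CMS document layout.

```python
import json
from collections import defaultdict

# Hypothetical per-process monitoring documents. They deliberately do not
# share one fixed set of fields, mimicking schema-free indexing.
leaf_docs = [
    '{"host": "fu-01", "pid": 101, "run": 1, "events": 500, "state": "processing"}',
    '{"host": "fu-01", "pid": 102, "run": 1, "events": 450}',
    '{"host": "fu-02", "pid": 201, "run": 1, "events": 700, "hlt_path": "mu"}',
]


def aggregate_by_run(raw_docs):
    """Roll fine-grained documents up into one summary entry per run."""
    totals = defaultdict(lambda: {"events": 0, "processes": 0})
    for raw in raw_docs:
        doc = json.loads(raw)
        entry = totals[doc["run"]]
        entry["events"] += doc.get("events", 0)  # tolerate missing fields
        entry["processes"] += 1
    return dict(totals)


central_summary = aggregate_by_run(leaf_docs)
print(json.dumps(central_summary))
```

In the system described, the fine-grained documents would stay in the leaf clusters and only summaries of this kind would be indexed centrally.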
Masseroli, Marco; Marchente, Mario
2008-07-01
We present X-PAT, a platform-independent software prototype that is able to manage patient referral multimedia data in an intranet network scenario according to the specific control procedures of a healthcare institution. It is a self-developed storage framework based on a file system, implemented in eXtensible Markup Language (XML) and PHP Hypertext Preprocessor Language, and addressed to the requirements of limited-dimension healthcare entities (small hospitals, private medical centers, outpatient clinics, and laboratories). In X-PAT, healthcare data descriptions, stored in a novel Referral Base Management System (RBMS) according to the Health Level 7 Clinical Document Architecture Release 2 (CDA R2) standard, can be easily applied to the specific data and organizational procedures of a particular healthcare working environment, thanks also to the use of standard clinical terminology. Managed data, centralized on a server, are structured in the RBMS schema using a flexible patient record and CDA healthcare referral document structures based on XML technology. A novel search engine allows defining and performing queries on stored data, whose rapid execution is ensured by expandable RBMS indexing structures. Healthcare personnel can interface with the X-PAT system, according to applied state-of-the-art privacy and security measures, through friendly and intuitive Web pages that facilitate user acceptance.
NASA Technical Reports Server (NTRS)
Lynnes, Chris
2014-01-01
Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.
BIOSPIDA: A Relational Database Translator for NCBI.
Hagen, Matthew S; Lee, Eva K
2010-11-13
As the volume and availability of biological databases continue to grow, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. Retrieving all necessary information requires an integrated system that can query multiple databases with minimal overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases, and all are easily linkable, helping to answer complex biological questions. These tools enable research scientists to integrate NCBI databases locally without significant workload or development time.
ERIC Educational Resources Information Center
Jitendra, Asha K.; Star, Jon R.; Dupuis, Danielle N.; Rodriguez, Michael C.
2013-01-01
This study examined the effect of schema-based instruction (SBI) on 7th-grade students' mathematical problem-solving performance. SBI is an instructional intervention that emphasizes the role of mathematical structure in word problems and also provides students with a heuristic to self-monitor and aid problem solving. Using a…
ERIC Educational Resources Information Center
Root, Jenny R.; Browder, Diane M.; Saunders, Alicia F.; Lo, Ya-yu
2017-01-01
The current study evaluated the effects of modified schema-based instruction on the mathematical word problem solving skills of three elementary students with autism spectrum disorders and moderate intellectual disability. Participants learned to solve compare problem type with themes that related to their interests and daily experiences. In…
Case-Based Instruction and Learning: An Interdisciplinary Project.
ERIC Educational Resources Information Center
Alvarez, Marino C.; And Others
Case-based learning is one method that can be used to foster critical thinking and schema construction. Students need to be provided with problem solving lessons in meaningful learning contexts for critical thinking to take place. In order for schema construction to occur, a framework needs to be provided that helps readers to elaborate upon new…
Retrofitting the AutoBayes Program Synthesis System with Concrete Syntax
NASA Technical Reports Server (NTRS)
Fischer, Bernd; Visser, Eelco
2004-01-01
AutoBayes is a fully automatic, schema-based program synthesis system for statistical data analysis applications. Its core component is a schema library, i.e., a collection of generic code templates with associated applicability constraints which are instantiated in a problem-specific way during synthesis. Currently, AutoBayes is implemented in Prolog; the schemas thus use abstract syntax (i.e., Prolog terms) to formulate the templates. However, the conceptual distance between this abstract representation and the concrete syntax of the generated programs makes the schemas hard to create and maintain. In this paper we describe how AutoBayes is retrofitted with concrete syntax. We show how it is integrated into Prolog and describe how the seamless interaction of concrete syntax fragments with AutoBayes's remaining legacy meta-programming kernel based on abstract syntax is achieved. We apply the approach to gradually migrate individual schemas without forcing a disruptive migration of the entire system to a different … First experiences show that a smooth migration can be achieved. Moreover, it can result in a considerable reduction of the code size and improved readability of the code. In particular, abstracting out fresh-variable generation and second-order term construction allows the formulation of larger continuous fragments.
Peixoto, Maria Manuela; Nobre, Pedro
2017-04-01
Despite the existence of conceptual models of sexual dysfunction based on cognitive theory, few studies have tested the role of vulnerability factors such as sexual beliefs as moderators of the activation of cognitive schemas in response to negative sexual events. To test the moderator role of dysfunctional sexual beliefs in the association between the frequency of negative sexual episodes and the activation of incompetence schemas in gay and heterosexual men. Five-hundred seventy-five men (287 gay, 288 heterosexual) who completed an online survey on cognitive-affective dimensions and sexual functioning were selected from a larger database. Hierarchical regression analyses were conducted to test the hypothesis that dysfunctional sexual beliefs moderate the association between the frequency of unsuccessful sexual episodes and the activation of incompetence schemas. Participants completed the Sexual Dysfunctional Beliefs Questionnaire and the Questionnaire of Cognitive Schemas Activated in Sexual Context. Findings indicated that men's ability for always being ready for sex, to satisfy the partner, and to maintain an erection until ending sexual activity constitute "macho" beliefs that moderate the activation of incompetence schemas when unsuccessful sexual events occur in gay and heterosexual men. In addition, activation of incompetence schemas in response to negative sexual events in gay men was moderated by the endorsement of conservative attitudes toward moderate sexuality. The main findings suggested that psychological interventions targeting dysfunctional sexual beliefs could help de-catastrophize the consequences of negative sexual events and facilitate sexual functioning. Despite being a web-based study, it represents the first attempt to test the moderator role of dysfunctional sexual beliefs in the association between the frequency of unsuccessful sexual episodes and the activation of incompetence schemas in gay and heterosexual men. 
Overall, findings support the role of sexual beliefs as facilitators of the activation of incompetence schemas in the face of negative sexual events in gay and heterosexual men, emphasizing the need to develop treatment and prevention strategies aimed at challenging common male beliefs about sexuality. Peixoto MM, Nobre P. "Macho" Beliefs Moderate the Association Between Negative Sexual Episodes and Activation of Incompetence Schemas in Sexual Context, in Gay and Heterosexual Men. J Sex Med 2017;14:518-525. Copyright © 2017 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
Schema-driven facilitation of new hierarchy learning in the transitive inference paradigm
Kumaran, Dharshan
2013-01-01
Prior knowledge, in the form of a mental schema or framework, is viewed to facilitate the learning of new information in a range of experimental and everyday scenarios. Despite rising interest in the cognitive and neural mechanisms underlying schema-driven facilitation of new learning, few paradigms have been developed to examine this issue in humans. Here we develop a multiphase experimental scenario aimed at characterizing schema-based effects in the context of a paradigm that has been very widely used across species, the transitive inference task. We show that an associative schema, comprised of prior knowledge of the rank positions of familiar items in the hierarchy, has a marked effect on transitivity performance and the development of relational knowledge of the hierarchy that cannot be accounted for by more general changes in task strategy. Further, we show that participants are capable of deploying prior knowledge to successful effect under surprising conditions (i.e., when corrective feedback is totally absent), but only when the associative schema is robust. Finally, our results provide insights into the cognitive mechanisms underlying such schema-driven effects, and suggest that new hierarchy learning in the transitive inference task can occur through a contextual transfer mechanism that exploits the structure of associative experiences. PMID:23782509
Schema-driven facilitation of new hierarchy learning in the transitive inference paradigm.
Kumaran, Dharshan
2013-06-19
Prior knowledge, in the form of a mental schema or framework, is viewed to facilitate the learning of new information in a range of experimental and everyday scenarios. Despite rising interest in the cognitive and neural mechanisms underlying schema-driven facilitation of new learning, few paradigms have been developed to examine this issue in humans. Here we develop a multiphase experimental scenario aimed at characterizing schema-based effects in the context of a paradigm that has been very widely used across species, the transitive inference task. We show that an associative schema, comprised of prior knowledge of the rank positions of familiar items in the hierarchy, has a marked effect on transitivity performance and the development of relational knowledge of the hierarchy that cannot be accounted for by more general changes in task strategy. Further, we show that participants are capable of deploying prior knowledge to successful effect under surprising conditions (i.e., when corrective feedback is totally absent), but only when the associative schema is robust. Finally, our results provide insights into the cognitive mechanisms underlying such schema-driven effects, and suggest that new hierarchy learning in the transitive inference task can occur through a contextual transfer mechanism that exploits the structure of associative experiences.
Nonparametric Bayesian Modeling for Automated Database Schema Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferragut, Erik M; Laska, Jason A
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
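The core comparison, the probability that a single model could have generated both fields, can be sketched with a much simpler stand-in than the nonparametric models of the paper. The sketch below assumes a finite Dirichlet-multinomial model over the observed values with a symmetric prior `alpha`; the function names and the example fields are hypothetical. It returns the log odds of "one shared model" against "two separate models".

```python
from collections import Counter
from math import lgamma


def log_marginal(counts, vocab, alpha=1.0):
    """Log marginal likelihood of categorical data under a symmetric
    Dirichlet(alpha) prior over the values in vocab."""
    n = sum(counts.values())
    k = len(vocab)
    result = lgamma(k * alpha) - lgamma(k * alpha + n)
    for value in vocab:
        result += lgamma(alpha + counts.get(value, 0)) - lgamma(alpha)
    return result


def log_same_model_odds(field_a, field_b, alpha=1.0):
    """log P(both fields | one model) - log P(fields | separate models)."""
    ca, cb = Counter(field_a), Counter(field_b)
    vocab = set(ca) | set(cb)
    joint = log_marginal(ca + cb, vocab, alpha)
    separate = log_marginal(ca, vocab, alpha) + log_marginal(cb, vocab, alpha)
    return joint - separate


# Hypothetical instance data from three database fields.
state_a = ["NY"] * 15 + ["CA"] * 10 + ["TX"] * 5
state_b = ["NY"] * 15 + ["CA"] * 10 + ["TX"] * 5
colour = ["red", "blue", "green"] * 10

matching_score = log_same_model_odds(state_a, state_b)
mismatch_score = log_same_model_odds(state_a, colour)
```

A positive score favours treating the two fields as equivalent and a negative one favours keeping them apart; a full matcher would compute this for every pair of fields and pick the best assignment.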
Querying clinical data in HL7 RIM based relational model with morph-RDB.
Priyatna, Freddy; Alonso-Calvo, Raul; Paraiso-Medina, Sergio; Corcho, Oscar
2017-10-05
Semantic interoperability is essential when carrying out post-genomic clinical trials where several institutions collaborate, since researchers and developers need to have an integrated view and access to heterogeneous data sources. One possible approach to accommodate this need is to use RDB2RDF systems that provide RDF datasets as the unified view. These RDF datasets may be materialized and stored in a triple store, or transformed into RDF in real time, as virtual RDF data sources. Our previous efforts involved materialized RDF datasets, hence losing data freshness. In this paper we present a solution that uses an ontology based on the HL7 v3 Reference Information Model and a set of R2RML mappings that relate this ontology to an underlying relational database implementation, and where morph-RDB is used to expose a virtual, non-materialized SPARQL endpoint over the data. By applying a set of optimization techniques on the SPARQL-to-SQL query translation algorithm, we can now issue SPARQL queries to the underlying relational data with generally acceptable performance.
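To make the idea of a virtual, non-materialized RDF view concrete, the sketch below translates a single triple pattern into SQL and runs it against an in-memory SQLite table. It is a toy under stated assumptions: the `MAPPING` dictionary is a hand-written stand-in for R2RML mappings (it is not R2RML syntax), the `ex:` predicates and the `observation` table are hypothetical, and nothing here reflects the morph-RDB translation algorithm or its optimizations.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "ex:hasCode": ("observation", "id", "code"),
    "ex:hasValue": ("observation", "id", "value"),
}


def triple_pattern_to_sql(predicate, obj=None):
    """Translate the pattern (?s predicate ?o) or (?s predicate obj) to SQL."""
    table, subj_col, obj_col = MAPPING[predicate]
    sql = f"SELECT {subj_col} AS s, {obj_col} AS o FROM {table}"
    params = []
    if obj is not None:
        # A bound object becomes a filter on the mapped column.
        sql += f" WHERE {obj_col} = ?"
        params.append(obj)
    return sql + " ORDER BY s", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (id INTEGER, code TEXT, value REAL)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [(1, "HR", 72.0), (2, "BP", 120.0), (3, "HR", 80.0)],
)

sql, params = triple_pattern_to_sql("ex:hasCode", "HR")
rows = conn.execute(sql, params).fetchall()
conn.close()
```

Because the query is translated at request time, the answers always reflect the current rows of the relational table, which is the data-freshness benefit the abstract contrasts with materialized RDF.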
Visual information mining in remote sensing image archives
NASA Astrophysics Data System (ADS)
Pelizzari, Andrea; Descargues, Vincent; Datcu, Mihai P.
2002-01-01
The present article focuses on the development of interactive exploratory tools for visually mining the image content in large remote sensing archives. Two aspects are treated: the iconic visualization of the global information in the archive and the progressive visualization of the image details. The proposed methods are integrated in the Image Information Mining (I2M) system. The images and image structure in the I2M system are indexed based on a probabilistic approach. The resulting links are managed by a relational data base. Both the intrinsic complexity of the observed images and the diversity of user requests result in a great number of associations in the data base. Thus new tools have been designed to visualize, in iconic representation, the relationships created during a query or information mining operation: the visualization of the query results positioned on the geographical map, quick-looks gallery, visualization of the measure of goodness of the query, and visualization of the image space for statistical evaluation purposes. Additionally, the I2M system is enhanced with progressive detail visualization in order to allow better access for operator inspection. I2M is a three-tier Java architecture and is optimized for the Internet.
Abd El Aziz, Mohamed; Selim, I M; Xiong, Shengwu
2017-06-30
This paper presents a new approach for the automatic detection of galaxy morphology from datasets based on an image-retrieval approach. Currently, there are several classification methods proposed to detect galaxy types within an image. However, in some situations, the aim is not only to determine the type of galaxy within the queried image, but also to determine the images most similar to the query image. Therefore, this paper proposes an image-retrieval method to detect the type of galaxies within an image and return the most similar image. The proposed method consists of two stages. In the first stage, a set of features is extracted based on shape, color and texture descriptors, and a binary sine cosine algorithm then selects the most relevant features. In the second stage, the similarity between the features of the queried galaxy image and the features of other galaxy images is computed. Our experiments were performed using the EFIGI catalogue, which contains about 5000 galaxy images of different types (edge-on spiral, spiral, elliptical and irregular). We demonstrate that our proposed approach has better performance compared with the particle swarm optimization (PSO) and genetic algorithm (GA) methods.
Expediting Scientific Data Analysis with Reorganization of Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Byna, Surendra; Wu, Kesheng
2013-08-19
Data producers typically optimize the layout of data files to minimize the write time. In most cases, data analysis tasks read these files in access patterns different from the write patterns, causing poor read performance. In this paper, we introduce Scientific Data Services (SDS), a framework for bridging the performance gap between writing and reading scientific data. SDS reorganizes data to match the read patterns of analysis tasks and enables transparent data reads from the reorganized data. We implemented an HDF5 Virtual Object Layer (VOL) plugin to redirect the HDF5 dataset read calls to the reorganized data. To demonstrate the effectiveness of SDS, we applied two parallel data organization techniques: a sort-based organization on plasma physics data and a transpose-based organization on mass spectrometry imaging data. We also extended the HDF5 data access API to allow selection of data based on their values through a query interface, called SDS Query. We evaluated the execution time in accessing various subsets of data through the existing HDF5 Read API and SDS Query. We showed that reading the reorganized data using SDS is up to 55X faster than reading the original data.
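The idea behind a sort-based reorganization can be illustrated with a minimal sketch. It is not the SDS implementation and does not use HDF5; the function names `reorganize` and `range_query` and the sample values are assumptions made for illustration only. It shows why sorting by value lets a value-based query read one contiguous slice instead of scanning the whole dataset.

```python
import bisect

def reorganize(values):
    """Return the values sorted ascending, plus each value's original index."""
    order = sorted(range(len(values)), key=values.__getitem__)
    return [values[i] for i in order], order

def range_query(sorted_values, order, lo, hi):
    """Original indices of all elements with lo <= value <= hi.

    Two binary searches locate one contiguous slice of the sorted copy,
    so no full scan of the data is needed.
    """
    start = bisect.bisect_left(sorted_values, lo)
    stop = bisect.bisect_right(sorted_values, hi)
    return sorted(order[start:stop])

# Hypothetical sample data in write order (not from the paper).
energies = [5.0, 1.2, 9.7, 3.3, 7.1, 0.4]
sorted_vals, order = reorganize(energies)
hits = range_query(sorted_vals, order, 3.0, 8.0)
print(hits)
```

The reorganized copy costs extra storage and a one-time sort, which is the trade-off such a scheme accepts in exchange for faster value-based reads.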
Agile Datacube Analytics (not just) for the Earth Sciences
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Merticariu, Vlad; Baumann, Peter
2017-04-01
Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well. This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics. We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.
Agile Datacube Analytics (not just) for the Earth Sciences
NASA Astrophysics Data System (ADS)
Baumann, P.
2016-12-01
Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well. This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics. We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB, queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.
Local classifier weighting by quadratic programming.
Cevikalp, Hakan; Polikar, Robi
2008-10-01
It has been widely accepted that the classification accuracy can be improved by combining outputs of multiple classifiers. However, how to combine multiple classifiers with various (potentially conflicting) decisions is still an open problem. A rich collection of classifier combination procedures -- many of which are heuristic in nature -- have been developed for this goal. In this brief, we describe a dynamic approach to combine classifiers that have expertise in different regions of the input space. To this end, we use local classifier accuracy estimates to weight classifier outputs. Specifically, we estimate local recognition accuracies of classifiers near a query sample by utilizing its nearest neighbors, and then use these estimates to find the best weights of classifiers to label the query. The problem is formulated as a convex quadratic optimization problem, which returns optimal nonnegative classifier weights with respect to the chosen objective function, and the weights ensure that locally most accurate classifiers are weighted more heavily for labeling the query sample. Experimental results on several data sets indicate that the proposed weighting scheme outperforms other popular classifier combination schemes, particularly on problems with complex decision boundaries. Hence, the results indicate that local classification-accuracy-based combination techniques are well suited for decision making when the classifiers are trained by focusing on different regions of the input space.
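The local-accuracy step described above can be sketched as follows. This is a simplified illustration, not the authors' method: the paper obtains the weights from a convex quadratic program, whereas this sketch simply normalizes the local accuracies, a stand-in chosen to avoid a solver dependency. All function names and the toy data are hypothetical.

```python
import math

def local_accuracies(query, points, labels, predictions, k):
    """Accuracy of each classifier on the k validation points nearest to query.

    predictions[c][i] is classifier c's predicted label for points[i].
    """
    nearest = sorted(range(len(points)),
                     key=lambda i: math.dist(query, points[i]))[:k]
    return [sum(pred[i] == labels[i] for i in nearest) / k
            for pred in predictions]

def normalized_weights(accuracies):
    """Nonnegative weights summing to 1 (stand-in for the paper's QP step)."""
    total = sum(accuracies)
    if total == 0:
        return [1.0 / len(accuracies)] * len(accuracies)
    return [a / total for a in accuracies]

# Toy validation set: class 0 clusters near the origin, class 1 near (5, 5).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
predictions = [
    [0, 0, 0, 0, 0, 0],  # classifier A: correct only near the origin
    [1, 1, 1, 1, 1, 1],  # classifier B: correct only near (5, 5)
]

acc = local_accuracies((0.2, 0.2), points, labels, predictions, k=3)
weights = normalized_weights(acc)
print(weights)
```

For a query near the origin, classifier A receives all of the weight; a query near (5, 5) reverses the roles, which is the behaviour a locally adaptive combination is meant to produce.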
ERIC Educational Resources Information Center
Arendasy, Martin; Sommer, Markus
2007-01-01
This article deals with the investigation of the psychometric quality and construct validity of algebra word problems generated by means of a schema-based version of the automatic min-max approach. Based on review of the research literature in algebra word problem solving and automatic item generation this new approach is introduced as a…
ERIC Educational Resources Information Center
Leh, Jayne
2011-01-01
Substantial evidence indicates that teacher-delivered schema-based instruction (SBI) facilitates significant increases in mathematics word problem solving (WPS) skills for diverse students; however research is unclear whether technology affordances facilitate superior gains in computer-mediated (CM) instruction in mathematics WPS when compared to…
ERIC Educational Resources Information Center
Jitendra, Asha K.; Dupuis, Danielle N.; Rodriguez, Michael C.; Zaslofsky, Anne F.; Slater, Susan; Cozine-Corroy, Kelly; Church, Chris
2013-01-01
This study compared the effects of delivering a supplemental, small-group tutoring intervention on the mathematics outcomes of third-grade students at risk for mathematics difficulties (MD) who were randomly assigned to either a schema-based instruction (SBI) or control group. SBI emphasized the underlying mathematical structure of additive…
AlzPharm: integration of neurodegeneration data using RDF.
Lam, Hugo Y K; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi
2007-05-09
Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Accessing and integrating biomedical data which cuts across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. 
Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields.
AlzPharm: integration of neurodegeneration data using RDF
Lam, Hugo YK; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi
2007-01-01
Background Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. Results We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Conclusion Accessing and integrating biomedical data which cuts across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. 
Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields. PMID:17493287
An efficient temporal database design method based on EER
NASA Astrophysics Data System (ADS)
Liu, Zhi; Huang, Jiping; Miao, Hua
2007-12-01
Many existing methods of modeling temporal information are based on the logical model, which makes relational schema optimization more difficult and more complicated. In this paper, based on the conventional EER model, the authors analyse and abstract temporal information in the conceptual modelling phase according to the concrete requirements for history information. A temporal data model named BTEER is then presented. BTEER not only retains all the design ideas and methods of EER, which gives it good upward compatibility, but also supports the modelling of valid time and transaction time effectively at the same time. In addition, BTEER can be transformed to EER easily and automatically. Practice shows that this method can model temporal information well.
Parasol: An Architecture for Cross-Cloud Federated Graph Querying
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lieberman, Michael; Choudhury, Sutanay; Hughes, Marisa
2014-06-22
Large scale data fusion of multiple datasets can often provide insights that examining datasets individually cannot. However, when these datasets reside in different data centers and cannot be collocated due to technical, administrative, or policy barriers, a unique set of problems arise that hamper querying and data fusion. To address these problems, a system and architecture named Parasol is presented that enables federated queries over graph databases residing in multiple clouds. Parasol’s design is flexible and requires only minimal assumptions for participant clouds. Query optimization techniques are also described that are compatible with Parasol’s lightweight architecture. Experiments on a prototype implementation of Parasol indicate its suitability for cross-cloud federated graph queries.
Comparative Analysis of Rank Aggregation Techniques for Metasearch Using Genetic Algorithm
ERIC Educational Resources Information Center
Kaur, Parneet; Singh, Manpreet; Singh Josan, Gurpreet
2017-01-01
Rank Aggregation techniques have found wide applications for metasearch along with other streams such as Sports, Voting System, Stock Markets, and Reduction in Spam. This paper presents the optimization of rank lists for web queries put by the user on different MetaSearch engines. A metaheuristic approach such as Genetic algorithm based rank…
PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan
Kinjo, Akira R.; Yamashita, Reiko; Nakamura, Haruki
2010-01-01
This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/ PMID:20798081
PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan.
Kinjo, Akira R; Yamashita, Reiko; Nakamura, Haruki
2010-08-25
This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/
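The step of breaking an XML document into XPath-keyed groups, one per future relational table, can be sketched roughly as below. This is an illustration under stated assumptions, not the PDBj Mine loader: the sample document and its element names are invented and do not follow the real PDBMLplus schema, and the function `flatten` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample document; element names are NOT real PDBMLplus tags.
SAMPLE = """<entry id="1ABC">
  <struct><title>Example protein</title></struct>
  <atom_site><label>CA</label><x>1.0</x></atom_site>
  <atom_site><label>CB</label><x>2.5</x></atom_site>
</entry>"""

def flatten(xml_text):
    """Group text values by the path of the element that holds them.

    Each distinct path plays the role of one table named after the path.
    """
    root = ET.fromstring(xml_text)
    tables = defaultdict(list)

    def walk(elem, path):
        text = (elem.text or "").strip()
        if text:  # record only elements that carry a text value
            tables[path].append(text)
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return dict(tables)

tables = flatten(SAMPLE)
for path, rows in tables.items():
    print(path, rows)
```

A real loader would also carry attributes, row identifiers and parent keys so that the hierarchy can be rebuilt with joins; those are left out here to keep the sketch short.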
BIOZON: a system for unification, management and analysis of heterogeneous biological data.
Birkland, Aaron; Yona, Golan
2006-02-15
Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increase rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. Here we present a system (Biozon) that addresses these problems, and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts as it uses a single extensive and tightly connected graph schema wrapped with hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated methods of query that span multiple data types were implemented and first-of-a-kind biological ranking systems were explored and integrated. The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org.
Thimm, Jens C
2010-03-01
In schema therapy (ST), early maladaptive schemas (EMS) are proposed to be the defining core of personality disorders. Adverse relational experiences in childhood are assumed to be the main cause for the development of EMS. The present study explored the links between perceived parental rearing behaviours, EMS, and personality disorder symptoms in a clinical sample (N=108). Results from mediation analyses suggest that EMS mediate the relationships between recalled parenting rearing behaviours and personality disorder symptoms. Findings give support to the theoretical model ST is based on.
A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.
Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei
2013-10-08
Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. 
The accuracy of the three queries (ie, "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.
ERIC Educational Resources Information Center
Jitendra, Asha K.; Harwell, Michael R.; Dupuis, Danielle N.; Karl, Stacy R.
2016-01-01
This paper reports results from a study investigating the efficacy of a proportional problem-solving intervention, schema-based instruction (SBI), in seventh grade. Participants included 806 students with mathematical difficulties in problem solving (MD-PS) from an initial pool of 1,999 seventh grade students in a larger study. Teachers and their…
ERIC Educational Resources Information Center
Jitendra, Asha K.; Harwell, Michael R.; Dupuis, Danielle N.; Karl, Stacy R.
2017-01-01
This article reports results from a study investigating the efficacy of a proportional problem-solving intervention, schema-based instruction (SBI), in seventh grade. Participants included 806 students with mathematical difficulties in problem solving (MD-PS) from an initial pool of 1,999 seventh grade students in a larger study. Teachers and…
ERIC Educational Resources Information Center
Root, Jenny Rose
2016-01-01
The current study evaluated the effects of modified schema-based instruction (SBI) on the algebra problem solving skills of three middle school students with autism spectrum disorder and moderate intellectual disability (ASD/ID). Participants learned to solve two types of group word problems: missing-whole and missing-part. The themes of the word…
RDFBuilder: a tool to automatically build RDF-based interfaces for MAGE-OM microarray data sources.
Anguita, Alberto; Martin, Luis; Garcia-Remesal, Miguel; Maojo, Victor
2013-07-01
This paper presents RDFBuilder, a tool that enables RDF-based access to MAGE-ML-compliant microarray databases. We have developed a system that automatically transforms the MAGE-OM model and microarray data stored in the ArrayExpress database into RDF format. Additionally, the system automatically enables a SPARQL endpoint. This allows users to execute SPARQL queries for retrieving microarray data, either from specific experiments or from more than one experiment at a time. Our system optimizes response times by caching and reusing information from previous queries. In this paper, we describe our methods for achieving this transformation. We show that our approach is complementary to other existing initiatives, such as Bio2RDF, for accessing and retrieving data from the ArrayExpress database. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Mirador: A Simple, Fast Search Interface for Remote Sensing Data
NASA Technical Reports Server (NTRS)
Lynnes, Christopher; Strub, Richard; Seiler, Edward; Joshi, Talak; MacHarrie, Peter
2008-01-01
A major challenge for remote sensing science researchers is searching and acquiring relevant data files for their research projects based on content, space and time constraints. Several structured query (SQ) and hierarchical navigation (HN) search interfaces have been developed to satisfy this requirement, yet the dominant search engines in the general domain are based on free-text search. The Goddard Earth Sciences Data and Information Services Center has developed a free-text search interface named Mirador that supports space-time queries, including a gazetteer and geophysical event gazetteer. In order to compensate for a slightly reduced search precision relative to SQ and HN techniques, Mirador uses several search optimizations to return results quickly. The quick response enables a more iterative search strategy than is available with many SQ and HN techniques.
Usage of the Jess Engine, Rules and Ontology to Query a Relational Database
NASA Astrophysics Data System (ADS)
Bak, Jaroslaw; Jedrzejek, Czeslaw; Falkowski, Maciej
We present a prototypical implementation of a library tool, the Semantic Data Library (SDL), which integrates the Jess (Java Expert System Shell) engine, rules and ontology to query a relational database. The tool extends functionalities of previous OWL2Jess with SWRL implementations and takes full advantage of the Jess engine, by separating forward and backward reasoning. The optimization of integration of all these technologies is an advancement over previous tools. We discuss the complexity of the query algorithm. As a demonstration of capability of the SDL library, we execute queries using crime ontology which is being developed in the Polish PPBW project.
Venezky, Dina Y.; Newhall, Christopher G.
2007-01-01
WOVOdat Overview During periods of volcanic unrest, the ability to forecast near future activity has been a primary concern for human populations living near volcanoes. Our ability to forecast future activity and mitigate hazards is based on knowledge of previous activity at the volcano exhibiting unrest and knowledge of previous activity at similar volcanoes. A small set of experts with past experience are often involved in forecasting. We need to both preserve the knowledge the experts use and continue to investigate volcanic data to make better forecasts. Advances in instrumentation, networking, and data storage technologies have greatly increased our ability to collect volcanic data and share observations with our colleagues. The wealth of data creates numerous opportunities for gaining a better understanding of magmatic conditions and processes, if the data can be easily accessed for comparison. To allow for comparison of volcanic unrest data, we are creating a central database called WOVOdat. WOVOdat will contain a subset of time-series and geo-referenced data from each WOVO observatory in common and easily accessible formats. WOVOdat is being created for volcano experts in charge of forecasting volcanic activity, scientists investigating volcanic processes, and the public. The types of queries each of these groups might ask range from, 'What volcanoes were active in November of 2002?' and 'What are the relationships between tectonic earthquakes and volcanic processes?' to complex analyses of volcanic unrest to determine what future activity might occur. A new structure for storing and accessing our data was needed to examine processes across a wide range of volcanologic conditions. WOVOdat provides this new structure using relationships to connect the data parameters such that searches can be created for analogs of unrest. 
The subset of data that will fill WOVOdat will continue to be collected by the observatories, who will remain the primary archives of raw and detailed data on individual episodes of unrest. MySQL, an Open Source database, was chosen as the WOVOdat database for its integration with common web languages. The question of where the data will be stored and how the disparate data sets will be integrated will not be discussed in detail here. The focus of this document is to explain the data types, formats, and table organization chosen for WOVOdat 1.0. It was written for database administrators, data loaders, query writers, and anyone who monitors volcanoes. We begin with an overview of several challenges faced and solutions used in creating the WOVOdat schema. Specifics are then given for the parameters and table organization. After each table organization section, basic create table statements are included for viewing the database field formats. In the next stage of the project, scripts will be needed for data conversion, entry, and cleansing. Views will also need to be created once the data have been loaded and the basic queries are better known. Many questions and opportunities remain. We look forward to the growth and continual improvement in efficiency of the system. We hope WOVOdat will improve our understanding of magmatic systems and help mitigate future volcanic hazards.
Simulation of a Schema Theory-Based Knowledge Delivery System for Scientists.
ERIC Educational Resources Information Center
Vaughan, W. S., Jr.; Mavor, Anne S.
A future, automated, interactive, knowledge delivery system for use by researchers was tested using a manual cognitive model. Conceptualized from schema/frame/script theories in cognitive psychology and artificial intelligence, this hypothetical system was simulated by two psychologists who interacted with four researchers in microbiology to…
Individual Differences in the "Myside Bias" in Reasoning and Written Argumentation
ERIC Educational Resources Information Center
Wolfe, Christopher R.
2012-01-01
Three studies examined the "myside bias" in reasoning, evaluating written arguments, and writing argumentative essays. Previous research suggests that some people possess a fact-based argumentation schema and some people have a balanced argumentation schema. I developed reliable Likert scale instruments (1-7 rating) for these constructs…
Annotation and Classification of Argumentative Writing Revisions
ERIC Educational Resources Information Center
Zhang, Fan; Litman, Diane
2015-01-01
This paper explores the annotation and classification of students' revision behaviors in argumentative writing. A sentence-level revision schema is proposed to capture why and how students make revisions. Based on the proposed schema, a small corpus of student essays and revisions was annotated. Studies show that manual annotation is reliable with…
Beyond Linear Syntax: An Image-Oriented Communication Aid
ERIC Educational Resources Information Center
Patel, Rupal; Pilato, Sam; Roy, Deb
2004-01-01
This article presents a novel AAC communication aid based on semantic rather than syntactic schema, leading to more natural message construction. Users interact with a two-dimensional spatially organized image schema, which depicts the semantic structure and contents of the message. An overview of the interface design is presented followed by…
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.
2003-01-01
An object-relational database management system is an integrated, hybrid, cooperative approach that combines the best practices of both the relational model, utilizing SQL queries, and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework called NETMARK is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical address data types for very efficient keyword search of records spanning both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to handle the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
An Extensible Schema-less Database Framework for Managing High-throughput Semi-Structured Documents
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.; La, Tracy; Clancy, Daniel (Technical Monitor)
2002-01-01
An object-relational database management system is an integrated, hybrid, cooperative approach that combines the best practices of both the relational model, utilizing SQL queries, and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework called NETMARK is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical address data types for very efficient keyword searches of records for both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to handle the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
BIOSPIDA: A Relational Database Translator for NCBI
Hagen, Matthew S.; Lee, Eva K.
2010-01-01
As the volume and availability of biological databases continue widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. To retrieve all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases and all are easily linkable helping to answer complex biological questions. These tools facilitate research scientists to locally integrate databases from NCBI without significant workload or development time. PMID:21347013
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmidt, Matthew, E-mail: matthew.schmidt@varian.com; Grzetic, Shelby; Lo, Joseph Y.
Purpose: Prior work by the authors and other groups has studied the creation of automated intensity modulated radiotherapy (IMRT) plans of equivalent quality to those in a patient database of manually created clinical plans; those database plans provided guidance on the achievable sparing to organs-at-risk (OARs). However, in certain sites, such as head-and-neck, the clinical plans may not be sufficiently optimized because of anatomical complexity and clinical time constraints. This could lead to automated plans that suboptimally exploit OAR sparing. This work investigates a novel dose warping and scaling scheme that attempts to reduce effects of suboptimal sparing in clinical database plans, thus improving the quality of semiautomated head-and-neck cancer (HNC) plans. Methods: Knowledge-based radiotherapy (KBRT) plans for each of ten “query” patients were semiautomatically generated by identifying the most similar “match” patient in a database of 103 clinical manually created patient plans. The match patient’s plans were adapted to the query case by: (1) deforming the match beam fluences to suit the query target volume and (2) warping the match primary/boost dose distribution to suit the query geometry and using the warped distribution to generate query primary/boost optimization dose-volume constraints. Item (2) included a distance scaling factor to improve query OAR dose sparing with respect to the possibly suboptimal clinical match plan. To further compensate for a component plan of the match case (primary/boost) not optimally sparing OARs, the query dose volume constraints were reduced using a dose scaling factor to be the minimum from either (a) the warped component plan (primary or boost) dose distribution or (b) the warped total plan dose distribution (primary + boost) scaled in proportion to the ratio of component prescription dose to total prescription dose.
The dose-volume constraints were used to plan the query case with no human intervention to adjust constraints during plan optimization. Results: KBRT and original clinical plans were dosimetrically equivalent for parotid glands (mean/median doses), spinal cord, and brainstem (maximum doses). KBRT plans significantly reduced larynx median doses (21.5 ± 6.6 Gy to 17.9 ± 3.9 Gy), and oral cavity mean (32.3 ± 6.2 Gy to 28.9 ± 5.4 Gy) and median (28.7 ± 5.7 Gy to 23.2 ± 5.3 Gy) doses. Doses to ipsilateral parotid gland, larynx, oral cavity, and brainstem were lower or equivalent in the KBRT plans for the majority of cases. By contrast, KBRT plans generated without the dose warping and dose scaling steps were not significantly different from the clinical plans. Conclusions: Fast, semiautomatically generated HNC IMRT plans adapted from existing plans in a clinical database can be of equivalent or better quality than manually created plans. The reductions in OAR doses in the semiautomated plans, compared to the clinical plans, indicate that the proposed dose warping and scaling method shows promise in mitigating the impact of suboptimal clinical plans.
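The dose scaling rule described in the Methods can be read as a simple minimum over two candidate doses. The following is a minimal sketch of that reading, assuming scalar doses in Gy; the function and parameter names are hypothetical and are not taken from the authors' software.

```python
def scaled_constraint(warped_component_dose, warped_total_dose,
                      component_rx, total_rx):
    """Return a query dose-volume constraint (Gy) under the minimum rule."""
    # Candidate (b): warped total-plan dose scaled by the ratio of the
    # component prescription dose to the total prescription dose.
    scaled_total = warped_total_dose * component_rx / total_rx
    # The constraint is the smaller of candidate (a) and candidate (b).
    return min(warped_component_dose, scaled_total)
```

For instance, with illustrative values of a 30 Gy warped component dose, a 40 Gy warped total dose, and prescriptions of 50 Gy (component) and 70 Gy (total), candidate (b) is the smaller of the two and becomes the constraint.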
Optimizability of OGC Standards Implementations - a Case Study
NASA Astrophysics Data System (ADS)
Misev, D.; Baumann, P.
2012-04-01
Why do we shop at Amazon? Because they have a unique offering that is nowhere else available? Certainly not. Rather, Amazon offers (i) simple, yet effective search; (ii) very simple payment; (iii) extremely rapid delivery. This is how scientific services will be distinguished in future: not for their data holding (there will be manifold choice), but for their service quality. We are facing the transition from data stewardship to service stewardship. One of the OGC standards which particularly enables flexible retrieval is the Web Coverage Processing Service (WCPS). It defines a high-level query language on large, multi-dimensional raster data, such as 1D timeseries, 2D EO imagery, 3D x/y/t image time series and x/y/z geophysical data, 4D x/y/z/t climate and ocean data. We have implemented WCPS based on an Array Database Management System, rasdaman, which is available in open source. In this demonstration, we study WCPS queries on 2D, 3D, and 4D data sets. Particular emphasis is placed on the computational load queries generate in such on-demand processing and filtering. We look at different techniques and their impact on performance, such as adaptive storage partitioning, query rewriting, and just-in-time compilation. Results show that there is significant potential for effective server-side optimization once a query language is sufficiently high-level and declarative.
Jalali, Farzad; Hasani, Alireza; Hashemi, Seyedeh Fatemeh; Kimiaei, Seyed Ali; Babaei, Ali
2018-06-01
Depression is one of the most common mental disorders in prisons. People living with HIV are more likely to develop psychological difficulties when compared with the general population. This study aims to determine the efficacy of cognitive group therapy based on a schema-focused approach in reducing depression in prisoners living with HIV. The design of this study was between-groups (or "independent measures"). It was conducted with a pretest, a posttest, and a waiting list control group. The research population comprised all prisoners living with HIV in a men's prison in Iran. Based on voluntary desire, screening, and inclusion criteria, 42 prisoners living with HIV participated in this study. They were randomly assigned to an experimental group (21 prisoners) and a waiting list control group (21 prisoners). The experimental group received 11 sessions of schema-focused cognitive group therapy, while the waiting list control group received the treatment after the completion of the study. The groups were evaluated in terms of depression. ANCOVA models were employed to test the study hypotheses. Collated results indicated that depression was reduced among prisoners in the experimental group. Schema therapy (ST) could reduce depression among prisoners living with HIV/AIDS.
Predicting Host Level Reachability via Static Analysis of Routing Protocol Configuration
2007-09-01
check_function_bodies = false; SET client_min_messages = warning; -- -- Name: SCHEMA public; Type: COMMENT; Schema: -; Owner: postgres -- COMMENT...public; Owner: mcmanst -- -- -- Name: public; Type: ACL; Schema: -; Owner: postgres -- REVOKE ALL ON SCHEMA public FROM PUBLIC; REVOKE...ALL ON SCHEMA public FROM postgres ; GRANT ALL ON SCHEMA public TO postgres ; GRANT ALL ON SCHEMA public TO PUBLIC; -- -- PostgreSQL database
Pentoney, Christopher; Harwell, Jeff; Leroy, Gondy
2014-01-01
Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).
CruiseViewer: SIOExplorer Graphical Interface to Metadata and Archives.
NASA Astrophysics Data System (ADS)
Sutton, D. W.; Helly, J. J.; Miller, S. P.; Chase, A.; Clark, D.
2002-12-01
We are introducing "CruiseViewer" as a prototype graphical interface for the SIOExplorer digital library project, part of the overall NSF National Science Digital Library (NSDL) effort. When complete, CruiseViewer will provide access to nearly 800 cruises, as well as 100 years of documents and images from the archives of the Scripps Institution of Oceanography (SIO). The project emphasizes data object accessibility, a rich metadata format, efficient uploading methods and interoperability with other digital libraries. The primary function of CruiseViewer is to provide a human interface to the metadata database and to storage systems filled with archival data. The system schema is based on the concept of an "arbitrary digital object" (ADO). It is arbitrary in that if the object can be stored on a computer system, then SIOExplorer can manage it. Common examples are a multibeam swath bathymetry file, a .pdf cruise report, or a tar file containing all the processing scripts used on a cruise. We require a metadata file for every ADO in an ASCII "metadata interchange format" (MIF), which has proven to be highly useful for operability and extensibility. Bulk ADO storage is managed using the Storage Resource Broker (SRB), data handling middleware developed at the San Diego Supercomputer Center that centralizes management and access to distributed storage devices. MIF metadata are harvested from several sources and housed in a relational (Oracle) database. For CruiseViewer, CGI scripts resident on an Apache server are the primary communication and service request handling tools. Along with the CruiseViewer Java application, users can query, access and download objects via a separate method that operates through standard web browsers, http://sioexplorer.ucsd.edu. Both provide the functionality to query and view object metadata, and select and download ADOs.
For the CruiseViewer application Java 2D is used to add a geo-referencing feature that allows users to select basemap images and have vector shapes representing query results mapped over the basemap in the image panel. The two methods together address a wide range of user access needs and will allow for widespread use of SIOExplorer.
RDF-GL: A SPARQL-Based Graphical Query Language for RDF
NASA Astrophysics Data System (ADS)
Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay
This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor does not only allow for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and is able to subsequently execute these. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.
A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model
Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei
2013-01-01
Background Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. 
The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage when applying a mutually exclusive setting,” and “treatments for liver cancer”) was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. Conclusions The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. PMID:25600078
Using structure to explore the sequence alignment space of remote homologs.
Kuziemko, Andrew; Honig, Barry; Petrey, Donald
2011-10-01
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
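For readers unfamiliar with the dynamic programming (DP) score mentioned in this abstract, the following is a minimal sketch of a global alignment score computation in the Needleman-Wunsch style. It is not the authors' method; the match, mismatch, and gap values are illustrative assumptions, and real protein alignment would use a substitution matrix and affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            # Either align a[i-1] with b[j-1], or open a gap in one sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

The value returned is the score that is "optimal" in the DP sense; the abstract's point is that, for remote homologs, the structurally correct alignment often scores lower than this optimum.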
The image schema and innate archetypes: theoretical and clinical implications.
Merchant, John
2016-02-01
Based in contemporary neuroscience, Jean Knox's 2004 JAP paper 'From archetypes to reflective function' honed her position on image schemas, thereby introducing a model for archetypes which sees them as 'reliably repeated early developmental achievements' and not as genetically inherited, innate psychic structures. The image schema model is used to illustrate how the analyst worked with a patient who began life as an unwanted pregnancy, was adopted at birth and as an adult experienced profound synchronicities, paranormal/telepathic phenomena and visions. The classical approach to such phenomena would see the intense affectivity arising out of a ruptured symbiotic mother-infant relationship constellating certain archetypes which set up the patient's visions. This view is contrasted with Knox's model which sees the archetype an sich as a developmentally produced image schema underpinning the emergence of later imagery. The patient's visions can then be understood to arise from his psychoid body memory related to his traumatic conception and birth. The contemporary neuroscience which supports this view is outlined and a subsequent image schema explanation is presented. Clinically, the case material suggests that a pre-birth perspective needs to be explored in all analytic work. Other implications of Knox's image schema model are summarized. © 2016, The Society of Analytical Psychology.
NASA Astrophysics Data System (ADS)
Scheers, B.; Bloemen, S.; Mühleisen, H.; Schellart, P.; van Elteren, A.; Kersten, M.; Groot, P. J.
2018-04-01
Upcoming high-cadence wide-field optical telescopes will image hundreds of thousands of sources per minute. Besides inspecting the near real-time data streams for transient and variability events, the accumulated data archive is a rich laboratory for making complementary scientific discoveries. The goal of this work is to optimise column-oriented database techniques to enable the construction of a full-source and light-curve database for large-scale surveys that is accessible by the astronomical community. We adopted LOFAR's Transients Pipeline as the baseline and modified it to enable the processing of optical images that have much higher source densities. The pipeline adds new source lists to the archive database, while cross-matching them with the known catalogued sources in order to build a full light-curve archive. We investigated several techniques of indexing and partitioning the largest tables, allowing for faster positional source look-ups in the cross-matching algorithms. We monitored all query run times in long-term pipeline runs where we processed a subset of IPHAS data that have image source density peaks over 170,000 per field of view (500,000 deg⁻²). Our analysis demonstrates that horizontal table partitions with declination widths of one degree control the query run times. Usage of an index strategy where the partitions are densely sorted according to source declination yields another improvement. Most queries run in sublinear time and a few (< 20%) run in linear time, because of dependencies on input source-list and result-set size. We observed that for this logical database partitioning schema the limiting cadence the pipeline achieved with processing IPHAS data is 25 s.
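The partitioning idea in this abstract (one-degree declination partitions, each sorted by declination) can be illustrated outside a database. The following is a minimal in-memory sketch under that assumption; the function names are hypothetical, and it performs only the declination preselection step, not a full angular-distance cross-match or the authors' column-store implementation.

```python
import bisect
import math
from collections import defaultdict

def build_zones(catalogue):
    """Group (ra, dec) sources into one-degree declination zones sorted by dec."""
    zones = defaultdict(list)
    for ra, dec in catalogue:
        zones[math.floor(dec)].append((dec, ra))
    for rows in zones.values():
        rows.sort()
    return zones

def candidates(zones, dec, radius):
    """Return (ra, dec) sources whose declination lies within radius of dec."""
    found = []
    # Only zones overlapping [dec - radius, dec + radius] are visited.
    for zone in range(math.floor(dec - radius), math.floor(dec + radius) + 1):
        rows = zones.get(zone, [])
        lo = bisect.bisect_left(rows, (dec - radius, -math.inf))
        hi = bisect.bisect_right(rows, (dec + radius, math.inf))
        found.extend((ra, d) for d, ra in rows[lo:hi])
    return found
```

Because each zone is sorted, the look-up touches only a binary-searched slice of at most a few zones, which is the intuition behind the sublinear query times reported.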
Motivated Proteins: A web application for studying small three-dimensional protein motifs
Leader, David P; Milner-White, E James
2009-01-01
Background Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema. PMID:19210785
Mander, Johannes V; Jacob, Gitta A; Götz, Lea; Sammet, Isa; Zipfel, Stephan; Teufel, Martin
2015-01-01
The study aimed at analyzing associations between Grawe's general mechanisms of change and Young's early maladaptive schemas (EMS). To this end, 98 patients completed the Scale for the Multiperspective Assessment of General Change Mechanisms in Psychotherapy (SACiP), the Young Schema Questionnaire-Short Form Revised (YSQ S3R), and diverse outcome measures at the beginning and end of treatment. Our results are important for clinical applications, as we demonstrated strong predictive effects of change mechanisms on schema domains using regression analyses and cross-lagged panel models. Resource activation experiences seem to be especially crucial in fostering alterations in EMS, as this change mechanism demonstrated significant associations with several schema domains. Future research should investigate these aspects in more detail using observer-based micro-process analyses.
SPARQL Query Re-writing Using Partonomy Based Transformation Rules
NASA Astrophysics Data System (ADS)
Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.
Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of a SPARQL query, which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework rewrites queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and the knowledge base. Our experiments were performed on entirely third-party datasets and queries. Evaluations were performed on the Geonames dataset using questions from the National Geographic Bee serialized into SPARQL, and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.
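The part-whole idea behind the rewriting can be illustrated with a toy example: a constraint on a coarse region is widened to every entity transitively contained in it. The following is a minimal sketch with a hypothetical `part_of` mapping and function name; it does not reproduce the authors' transformation rules or operate on SPARQL syntax.

```python
def expand_parts(region, part_of):
    """Return region plus every entity transitively contained in it."""
    # Invert the child -> parent mapping into parent -> children.
    children = {}
    for child, parent in part_of.items():
        children.setdefault(parent, []).append(child)
    seen, stack = {region}, [region]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

With an illustrative mapping in which Dayton is part of Ohio and Ohio is part of the USA, a query constrained to the USA would also match facts recorded at the level of Ohio or Dayton.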
OASIS: A Data Fusion System Optimized for Access to Distributed Archives
NASA Astrophysics Data System (ADS)
Berriman, G. B.; Kong, M.; Good, J. C.
2002-05-01
The On-Line Archive Science Information Services (OASIS) is accessible as a Java applet through the NASA/IPAC Infrared Science Archive home page. It uses Geographical Information System (GIS) technology to provide data fusion and interaction services for astronomers. These services include the ability to process and display arbitrarily large image files, and user-controlled contouring, overlay regeneration and multi-table/image interactions. OASIS has been optimized for access to distributed archives and data sets. Its second release (June 2002) provides a mechanism that enables access to OASIS from "third-party" services and data providers. That is, any data provider who creates a query form to an archive containing a collection of data (images, catalogs, spectra) can direct the result files from the query into OASIS. Similarly, data providers who serve links to datasets or remote services on a web page can access all of these data with one instance of OASIS. In this way any data or service provider is given access to the full suite of capabilities of OASIS. We illustrate the "third-party" access feature with two examples: queries to the high-energy image datasets accessible from GSFC SkyView, and links to data that are returned from a target-based query to the NASA Extragalactic Database (NED). The second release of OASIS also includes a file-transfer manager that reports the status of multiple data downloads from remote sources to the client machine. It is a prototype for a request management system that will ultimately control and manage compute-intensive jobs submitted through OASIS to computing grids, such as requests for large-scale image mosaics and bulk statistical analysis.
Schema Theory and Signaling: Implications for Text Design.
ERIC Educational Resources Information Center
Rodriguez, Stephen R.
This discussion of the implications of schema theory and signaling theory for the design of both paper- and computer-based text describes the macro and micro levels of text structure and their interaction, provides a definition of signaling, and identifies four types of signals: (1) pointer words informing the reader of the author's perspective on…
Acquiring Software Design Schemas: A Machine Learning Perspective
NASA Technical Reports Server (NTRS)
Harandi, Mehdi T.; Lee, Hing-Yan
1991-01-01
In this paper, we describe an approach based on machine learning that acquires software design schemas from design cases of existing applications. An overview of the technique, design representation, and acquisition system is presented. The paper also addresses issues associated with generalizing common features such as biases. The generalization process is illustrated using an example.
Development of schemas revealed by prior experience and NMDA receptor knock-out
Dragoi, George; Tonegawa, Susumu
2013-01-01
Prior experience accelerates acquisition of novel, related information through processes like assimilation into mental schemas, but the underlying neuronal mechanisms are poorly understood. We investigated the roles that prior experience and hippocampal CA3 N-Methyl-D-aspartate receptor (NMDAR)-dependent synaptic plasticity play in CA1 place cell sequence encoding and learning during novel spatial experiences. We found that specific representations of de novo experiences on linear environments were formed on a framework of preconfigured network activity expressed in the preceding sleep and were rapidly, flexibly adjusted via NMDAR-dependent activity. This prior experience accelerated encoding of subsequent experiences on contiguous or isolated novel tracks, significantly decreasing their NMDAR-dependence. Similarly, de novo learning of an alternation task was facilitated by CA3 NMDARs; this experience accelerated subsequent learning of related tasks, independent of CA3 NMDARs, consistent with schema-based learning. These results reveal the existence of distinct neuronal encoding schemes which could explain why hippocampal dysfunction results in anterograde amnesia while sparing recollection of old, schema-based memories. DOI: http://dx.doi.org/10.7554/eLife.01326.001 PMID:24327561
Güner, Olcay
2017-03-01
The Early Maladaptive Schema Questionnaires Set for Children and Adolescents (SQS) was developed to assess early maladaptive schemas in children between the ages of 10 and 16 in Turkey. The SQS consists of five questionnaires that represent five schema domains in Young's schema theory. Psychometric properties (n = 983) and normative values (n = 2250) of SQS were investigated in children and adolescents between the ages of 10 and 16. Both exploratory and confirmatory factor analyses were performed. Results revealed 15 schema factors under five schema domains, with good fit indexes. A total of 14 schema factors were in line with Young's early maladaptive schemas. In addition to these factors, one new schema emerged: self-disapproval. Reliability analyses showed that SQS has high internal consistency and consistency over a 1-month interval. Correlations of SQS with the Adjective Check List (ACL), the Inventory of Parent and Peer Attachment (IPPA), the Symptom Assessment (SA-45) and the Young Schema Questionnaire (YSQ) were investigated to assess criterion validity, and the correlations revealed encouraging results. SQS significantly differentiated between children who have clinical diagnoses (n = 78) and children who have no diagnosis (n = 100). Finally, general normative values (n = 2,250) were determined for age groups, gender and age/gender groups. In conclusion, the early maladaptive schema questionnaires set for children and adolescents turned out to be a reliable and valid questionnaire with standard scores.Copyright © 2016 John Wiley & Sons, Ltd. The early maladaptive schema questionnaires set for children and adolescents (SQS) is a psychometrically reliable and valid measure of early maladaptive schemas for children between the ages of 10 and 16. SQS consists of five schema domains that represent Young's schema domains including 15 early maladaptive schemas and 97 items. Normative values for each schema were determined for age, gender and age/gender groups. 
Clinically, SQS presents valuable information about early maladaptive schemas during childhood and adolescence, before such schemas become more pervasive and persistent.
Data Warehouse Design from HL7 Clinical Document Architecture Schema.
Pecoraro, Fabrizio; Luzi, Daniela; Ricci, Fabrizio L
2015-01-01
This paper proposes a semi-automatic approach to extract clinical information structured in a HL7 Clinical Document Architecture (CDA) and transform it in a data warehouse dimensional model schema. It is based on a conceptual framework published in a previous work that maps the dimensional model primitives with CDA elements. Its feasibility is demonstrated providing a case study based on the analysis of vital signs gathered during laboratory tests.
Three hybridization models based on local search scheme for job shop scheduling problem
NASA Astrophysics Data System (ADS)
Balbi Fraga, Tatiana
2015-05-01
This work presents three different hybridization models based on the general schema of Local Search Heuristics, named Hybrid Successive Application, Hybrid Neighborhood, and Hybrid Improved Neighborhood. Although similar approaches might have already been presented in the literature in other contexts, in this work these models are applied to analyze the solution of the job shop scheduling problem, with the heuristics Taboo Search and Particle Swarm Optimization. In addition, we investigate some aspects that must be considered in order to achieve better solutions than those obtained by the original heuristics. The results demonstrate that the algorithms derived from these three hybrid models are more robust than the original algorithms and able to get better results than those found by the single Taboo Search.
CoReCG: a comprehensive database of genes associated with colon-rectal cancer
Agarwal, Rahul; Kumar, Binayak; Jayadev, Msk; Raghav, Dhwani; Singh, Ashutosh
2016-01-01
Cancer of the large intestine is commonly referred to as colorectal cancer, which is also the third most frequently occurring neoplasm across the globe. Although much work is being carried out to understand the mechanism of carcinogenesis and advancement of this disease, fewer studies have been performed to collate the scattered information on alterations in tumorigenic cells, such as genes, mutations, expression changes, epigenetic alterations, post-translational modifications, and genetic heterogeneity. Earlier findings were mostly focused on understanding the etiology of colorectal carcinogenesis, but less emphasis was given to a comprehensive review of the existing findings of individual studies, which can provide better diagnostics based on the markers suggested in discrete studies. The Colon Rectal Cancer Gene Database (CoReCG) contains information on 2056 colon-rectal cancer genes involved in distinct colorectal cancer stages, sourced from published literature, with an effective knowledge-based information retrieval system. Additionally, an interactive web interface enriched with various browsing sections and augmented with an advanced search facility for querying the database is provided for user-friendly browsing; online tools for sequence similarity searches and a knowledge-based schema ensure a researcher-friendly information retrieval mechanism. The colorectal cancer gene database (CoReCG) is expected to be a single-point source for identification of colorectal cancer-related genes, thereby helping with the improvement of classification, diagnosis and treatment of human cancers. Database URL: lms.snu.edu.in/corecg PMID:27114494
Toward a view-oriented approach for aligning RDF-based biomedical repositories.
Anguita, A; García-Remesal, M; de la Iglesia, D; Graf, N; Maojo, V
2015-01-01
This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". The need for complementary access to multiple RDF databases has fostered new lines of research, but also entailed new challenges due to data representation disparities. While several approaches for RDF-based database integration have been proposed, those focused on schema alignment have become the most widely adopted. All state-of-the-art solutions for aligning RDF-based sources resort to a simple technique inherited from legacy relational database integration methods. This technique - known as element-to-element (e2e) mappings - is based on establishing 1:1 mappings between single primitive elements - e.g. concepts, attributes, relationships, etc. - belonging to the source and target schemas. However, due to the intrinsic nature of RDF - a representation language based on defining tuples < subject, predicate, object > -, one may find RDF elements whose semantics vary dramatically when combined into a view involving other RDF elements - i.e. they depend on their context. The latter cannot be adequately represented in the target schema by resorting to the traditional e2e approach. These approaches fail to properly address this issue without explicitly modifying the target ontology, thus lacking the required expressiveness for properly reflecting the intended semantics in the alignment information. To enhance existing RDF schema alignment techniques by providing a mechanism to properly represent elements with context-dependent semantics, thus enabling users to perform more expressive alignments, including scenarios that cannot be adequately addressed by the existing approaches. Instead of establishing 1:1 correspondences between single primitive elements of the schemas, we propose adopting a view-based approach. 
The latter is targeted at establishing mapping relationships between RDF subgraphs - that can be regarded as the equivalent of views in traditional databases -, rather than between single schema elements. This approach enables users to represent scenarios defined by context-dependent RDF elements that cannot be properly represented when adopting the currently existing approaches. We developed a software tool implementing our view-based strategy. Our tool is currently being used in the context of the European Commission funded p-medicine project, targeted at creating a technological framework to integrate clinical and genomic data to facilitate the development of personalized drugs and therapies for cancer, based on the genetic profile of the patient. We used our tool to integrate different RDF-based databases - including different repositories of clinical trials and DICOM images - using the Health Data Ontology Trunk (HDOT) ontology as the target schema. The importance of database integration methods and tools in the context of biomedical research has been widely recognized. Modern research in this area - e.g. identification of disease biomarkers, or design of personalized therapies - heavily relies on the availability of a technical framework to enable researchers to uniformly access disparate repositories. We present a method and a tool that implement a novel alignment method specifically designed to support and enhance the integration of RDF-based data sources at schema (metadata) level. This approach provides an increased level of expressiveness compared to other existing solutions, and allows solving heterogeneity scenarios that cannot be properly represented using other state-of-the-art techniques.
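The contrast between element-to-element (e2e) and view-based mappings in this abstract can be sketched as follows. This is an illustration only, not the authors' tool: the triples, predicate names and both functions are hypothetical. It shows why a predicate such as `hasValue`, whose meaning depends on a sibling `hasType` triple, cannot be mapped correctly on its own, whereas mapping the two-triple subgraph as one unit preserves the intended semantics.

```python
# Triples are (subject, predicate, object); all names below are hypothetical.
SOURCE = [
    ("obs1", "hasType", "TumorSize"),
    ("obs1", "hasValue", "23"),
    ("obs2", "hasType", "BodyWeight"),
    ("obs2", "hasValue", "71"),
]


def e2e_map(triples, predicate_map):
    """Element-to-element: rename each predicate independently of context."""
    return [(s, predicate_map.get(p, p), o) for s, p, o in triples]


def view_map(triples, type_to_target):
    """View-based: map the subgraph (hasType + hasValue) as a single unit."""
    # First pass: collect the context (declared type) of every subject.
    types = {s: o for s, p, o in triples if p == "hasType"}
    mapped = []
    # Second pass: pick the target predicate according to that context.
    for s, p, o in triples:
        if p == "hasValue" and types.get(s) in type_to_target:
            mapped.append((s, type_to_target[types[s]], o))
    return mapped
```

Under e2e, both `hasValue` triples receive the same target predicate, so the distinction between a tumor size and a body weight is lost unless the target schema is modified. The view-based function emits a different target predicate per context.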
A narrative review of schemas and schema therapy outcomes in the eating disorders.
Pugh, Matthew
2015-07-01
Whilst cognitive-behavioural therapy has demonstrated efficacy in the treatment of eating disorders, therapy outcomes and current conceptualizations still remain inadequate. In light of these shortcomings there has been growing interest in the utility of schema therapy applied to eating pathology. The present article first provides a narrative review of empirical literature exploring schemas and schema processes in eating disorders. Secondly, it critically evaluates outcome studies assessing schema therapy applied to eating disorders. Current evidence lends support to schema-focused conceptualizations of eating pathology and confirms that eating disorders are characterised by pronounced maladaptive schemas. Treatment outcomes also indicate that schema therapy, the schema-mode approach, and associated techniques are promising interventions for complex eating disorders. Implications for clinical practice and future directions for research are discussed. Copyright © 2015 Elsevier Ltd. All rights reserved.
Liu, Bin; Wu, Hao; Zhang, Deyuan; Wang, Xiaolong; Chou, Kuo-Chen
2017-02-21
To expedite the pace in conducting genome/proteome analysis, we have developed a Python package called Pse-Analysis. The powerful package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluating prediction quality. All a user needs to do is input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor and then yield the predicted results for the submitted query samples. All the aforementioned tedious jobs can be automatically done by the computer. Moreover, the multiprocessing technique was adopted to enhance computational speed by about sixfold. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be directly run on Windows, Linux, and Unix.
mantisGRID: a grid platform for DICOM medical images management in Colombia and Latin America.
Garcia Ruiz, Manuel; Garcia Chaves, Alvin; Ruiz Ibañez, Carlos; Gutierrez Mazo, Jorge Mario; Ramirez Giraldo, Juan Carlos; Pelaez Echavarria, Alejandro; Valencia Diaz, Edison; Pelaez Restrepo, Gustavo; Montoya Munera, Edwin Nelson; Garcia Loaiza, Bernardo; Gomez Gonzalez, Sebastian
2011-04-01
This paper presents the mantisGRID project, an interinstitutional initiative from Colombian medical and academic centers aiming to provide medical grid services for Colombia and Latin America. The mantisGRID is a GRID platform, based on open source grid infrastructure that provides the necessary services to access and exchange medical images and associated information following digital imaging and communications in medicine (DICOM) and health level 7 standards. The paper focuses first on the data abstraction architecture, which is achieved via Open Grid Services Architecture Data Access and Integration (OGSA-DAI) services and supported by the Globus Toolkit. The grid currently uses a 30-Mb bandwidth of the Colombian High Technology Academic Network, RENATA, connected to Internet 2. It also includes a discussion on the relational database created to handle the DICOM objects that were represented using Extensible Markup Language Schema documents, as well as other features implemented such as data security, user authentication, and patient confidentiality. Grid performance was tested using the three current operative nodes and the results demonstrated comparable query times between the mantisGRID (OGSA-DAI) and Distributed mySQL databases, especially for a large number of records.
Pang, Chao; van Enckevort, David; de Haan, Mark; Kelpin, Fleur; Jetten, Jonathan; Hendriksen, Dennis; de Boer, Tommy; Charbon, Bart; Winder, Erwin; van der Velde, K Joeri; Doiron, Dany; Fortier, Isabel; Hillege, Hans; Swertz, Morris A
2016-07-15
While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect. Contact: m.a.swertz@rug.nl. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Pang, Chao; van Enckevort, David; de Haan, Mark; Kelpin, Fleur; Jetten, Jonathan; Hendriksen, Dennis; de Boer, Tommy; Charbon, Bart; Winder, Erwin; van der Velde, K. Joeri; Doiron, Dany; Fortier, Isabel; Hillege, Hans
2016-01-01
Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. Results: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human-experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. Availability and Implementation: Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect. Contact: m.a.swertz@rug.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153686
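The three kinds of transformation named in this abstract (unit conversion, categorical value matching, and a derived value such as BMI) can be sketched in a few lines. This is a hand-written illustration, not output of MOLGENIS/connect: the function names and the `SEX_CODES` table are hypothetical. The BMI formula itself is the standard one, weight in kilograms divided by height in metres squared.

```python
def cm_to_m(height_cm):
    """Unit conversion: centimetres to metres."""
    return height_cm / 100.0


def bmi(weight_kg, height_cm):
    """Derived attribute: body mass index in kg/m^2."""
    height_m = cm_to_m(height_cm)
    return weight_kg / (height_m * height_m)


# Hypothetical source codings mapped onto one target category set.
SEX_CODES = {"M": "male", "F": "female", "1": "male", "2": "female"}


def harmonize_sex(value):
    """Categorical value matching; returns None for an unknown source code."""
    return SEX_CODES.get(str(value).strip().upper())
```

For example, a source that stores height in centimetres and sex as `1`/`2` could be pooled with one that stores `M`/`F` by applying `bmi` and `harmonize_sex` to each record before merging.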
Mental Schemas Hamper Memory Storage of Goal-Irrelevant Information
Sweegers, C. C. G.; Coleman, G. A.; van Poppel, E. A. M.; Cox, R.; Talamini, L. M.
2015-01-01
Mental schemas exert top-down control on information processing, for instance by facilitating the storage of schema-related information. However, given capacity-limits and competition in neural network processing, schemas may additionally exert their effects by suppressing information with low momentary relevance. In particular, when existing schemas suffice to guide goal-directed behavior, this may actually reduce encoding of the redundant sensory input, in favor of gaining efficiency in task performance. The present experiment set out to test this schema-induced shallow encoding hypothesis. Our approach involved a memory task in which faces had to be coupled to homes. For half of the faces the responses could be guided by a pre-learned schema, for the other half of the faces such a schema was not available. Memory storage was compared between schema-congruent and schema-incongruent items. To characterize putative schema effects, memory was assessed both with regard to visual details and contextual aspects of each item. The depth of encoding was also assessed through an objective neural measure: the parietal old/new ERP effect. This ERP effect, observed between 500–800 ms post-stimulus onset, is thought to reflect the extent of recollection: the retrieval of a vivid memory, including various contextual details from the learning episode. We found that schema-congruency induced substantial impairments in item memory and even larger ones in context memory. Furthermore, the parietal old/new ERP effect indicated higher recollection for the schema-incongruent than the schema-congruent memories. The combined findings indicate that, when goals can be achieved using existing schemas, this can hinder the in-depth processing of novel input, impairing the formation of perceptually detailed and contextually rich memory traces. Taking into account both current and previous findings, we suggest that schemas can both positively and negatively bias the processing of sensory input. 
An important determinant in this matter is likely related to momentary goals, such that mental schemas facilitate memory processing of goal-relevant input, but suppress processing of goal-irrelevant information. Highlights – Schema-congruent information suffers from shallow encoding. – Schema congruency induces poor item and context memory. – The parietal old/new effect is less pronounced for schema-congruent items. – Schemas exert different influences on memory formation depending on current goals. PMID:26635582
The evolution of the CUAHSI Water Markup Language (WaterML)
NASA Astrophysics Data System (ADS)
Zaslavsky, I.; Valentine, D.; Maidment, D.; Tarboton, D. G.; Whiteaker, T.; Hooper, R.; Kirschtel, D.; Rodriguez, M.
2009-04-01
The CUAHSI Hydrologic Information System (HIS, his.cuahsi.org) uses web services as the core data exchange mechanism which provides programmatic connection between many heterogeneous sources of hydrologic data and a variety of online and desktop client applications. The service message schema follows the CUAHSI Water Markup Language (WaterML) 1.x specification (see OGC Discussion Paper 07-041r1). Data sources that can be queried via WaterML-compliant water data services include national and international repositories such as USGS NWIS (National Water Information System), USEPA STORET (Storage & Retrieval), USDA SNOTEL (Snowpack Telemetry), NCDC ISH and ISD(Integrated Surface Hourly and Daily Data), MODIS (Moderate Resolution Imaging Spectroradiometer), and DAYMET (Daily Surface Weather Data and Climatological Summaries). Besides government data sources, CUAHSI HIS provides access to a growing number of academic hydrologic observation networks. These networks are registered by researchers associated with 11 hydrologic observatory testbeds around the US, and other research, government and commercial groups wishing to join the emerging CUAHSI Water Data Federation. The Hydrologic Information Server (HIS Server) software stack deployed at NSF-supported hydrologic observatory sites and other universities around the country, supports a hydrologic data publication workflow which includes the following steps: (1) observational data are loaded from static files or streamed from sensors into a local instance of an Observations Data Model (ODM) database; (2) a generic web service template is configured for the new ODM instance to expose the data as a WaterML-compliant water data service, and (3) the new water data service is registered at the HISCentral registry (hiscentral.cuahsi.org), its metadata are harvested and semantically tagged using concepts from a hydrologic ontology. 
As a result, the new service is indexed in the CUAHSI central metadata catalog, and becomes available for spatial and semantics-based queries. The main component of interoperability across hydrologic data repositories in CUAHSI HIS is mapping different repository schemas and semantics to a shared community information model for observations made at stationary points. This information model has been implemented as both a relational schema (ODM) and an XML schema (WaterML). Its main design drivers have been data storage and data interchange needs of hydrology researchers, a series of community reviews of the ODM, and the practices of hydrologic data modeling and presentation adopted by federal agencies as observed in agency online data access applications, such as NWISWeb and USEPA STORET. The goal of the first version of WaterML was to encode the semantics of hydrologic observations discovery and retrieval and implement water data services in a way that is generic across different data providers. In particular, this implied maintaining a single common representation for the key constructs returned to web service calls, related to observations, features of interest, observation procedures, observation series, etc. Another WaterML design consideration was to create (in version 1 of CUAHSI HIS in particular) a fairly rigid, compact, and simple XML schema which was easy to generate and parse, thus creating the least barrier for adoption by hydrologists. Each of the three main request methods in the water data web services - GetSiteInfo, GetVariableInfo, and GetValues - has a corresponding response element in WaterML: SiteResponse, VariableResponse, and TimeSeriesResponse. The strictness and compactness of the first version of WaterML supported its community adoption. 
Over the last two years, several ODM and WaterML implementations for various platforms have emerged, and several Water Data Services client applications have been created by outside groups in both industry and academia. In a significant development, the WaterML specification has been adopted by federal agencies. The experimental USGS NWIS Daily Values web service returns WaterML-compliant TimeSeriesResponse. NCDC is also prototyping WaterML for data delivery, and has developed a REST-based service that generates WaterML- compliant output for its integrated station network. These agency-supported web services provide a much more efficient way to deliver agency data compared to the web site scraper services that the CUAHSI HIS project developed initially. Adoption of WaterML by the US Geological Survey is particularly significant because the USGS maintains by far the largest water data repository in the United States. For version 1.1, WaterML has evolved to reflect the deployment experience at hydrologic observatory testbeds, as well as feedback from hydrologic data repository managers at federal and state agencies. Further development of WaterML and enhancement of the underlying information model is the focus of the recently established OGC Hydrology Domain Working Group, whose mission is to profile OGC standards (GML, O&M, SOS, WCS, WFS) for the water resources domain and thus ensure WaterML's wider applicability and easier implementation. WaterML 2.0 is envisioned as an OGC-compliant application schema that supports OGC features, can express different types of observations and various groupings of observations, and allows researchers to define custom metadata elements. This presentation will discuss the information model underlying WaterML and describe the rationale, design drivers and evolution of WaterML and the water data services, illustrating their recent application in the context of CUAHSI HIS and the hydrologic observatory testbeds.
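The request/response pairing described in this abstract can be written down as a small lookup. The method and element names come from the abstract; pairing them in the order listed is an assumption, and the `expected_response` helper is hypothetical rather than part of any CUAHSI client library.

```python
# Request method -> WaterML response element, paired in the order the
# abstract lists them (an assumption about the correspondence).
RESPONSE_ELEMENT = {
    "GetSiteInfo": "SiteResponse",
    "GetVariableInfo": "VariableResponse",
    "GetValues": "TimeSeriesResponse",
}


def expected_response(method):
    """Return the response element expected for a water data service method."""
    try:
        return RESPONSE_ELEMENT[method]
    except KeyError:
        raise ValueError(f"unknown water data service method: {method!r}")
```

A client could use such a table to check that the root element of a returned document matches the method it called before attempting to parse the payload.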
Teng, Rui; Leibnitz, Kenji; Miura, Ryu
2013-01-01
An essential application of wireless sensor networks is to successfully respond to user queries. Query packet losses occur in the query dissemination due to wireless communication problems such as interference, multipath fading, packet collisions, etc. The losses of query messages at sensor nodes result in the failure of sensor nodes reporting the requested data. Hence, the reliable and successful dissemination of query messages to sensor nodes is a non-trivial problem. The target of this paper is to enable highly successful query delivery to sensor nodes by localized and energy-efficient discovery, and recovery of query losses. We adopt local and collective cooperation among sensor nodes to increase the success rate of distributed discoveries and recoveries. To enable the scalability in the operations of discoveries and recoveries, we employ a distributed name resolution mechanism at each sensor node to allow sensor nodes to self-detect the correlated queries and query losses, and then efficiently locally respond to the query losses. We prove that the collective discovery of query losses has a high impact on the success of query dissemination and reveal that scalability can be achieved by using the proposed approach. We further study the novel features of the cooperation and competition in the collective recovery at PHY and MAC layers, and show that the appropriate number of detectors can achieve optimal successful recovery rate. We evaluate the proposed approach with both mathematical analyses and computer simulations. The proposed approach enables a high rate of successful delivery of query messages and it results in short route lengths to recover from query losses. The proposed approach is scalable and operates in a fully distributed manner. PMID:23748172
Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J
2016-08-02
Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it against the expert-based queries provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other, permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by the Orphanet expert query and/or the query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There was no significant difference between the Orpha query and the terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature searches.
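The three measures named in this abstract can be computed as in the sketch below. It uses the common definitions and is not the study's own code: relative recall is taken here as the share of the pooled relevant citations (those judged relevant across all queries being compared) that one query retrieved, and the F-measure as the balanced harmonic mean of precision and recall. The function names and example identifiers are hypothetical.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved citations that were judged relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def relative_recall(retrieved, relevant_pool):
    """Fraction of the pooled relevant citations that this query retrieved."""
    pool = set(relevant_pool)
    if not pool:
        return 0.0
    return len(set(retrieved) & pool) / len(pool)


def f_measure(p, r):
    """Balanced F-measure: harmonic mean of precision p and recall r."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)
```

As a usage example, a query that returns citations 1 to 4 when the pooled relevant set is {2, 3, 5} has precision 2/4, relative recall 2/3 and an F-measure of 4/7.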
Dunne, Ashley L; Gilbert, Flora; Lee, Stuart; Daffern, Michael
2018-05-01
Contemporary social-cognitive aggression theory and extant empirical research highlight the relationship between certain Early Maladaptive Schemas (EMSs) and aggression in offenders. To date, the related construct of schema modes, which presents a comprehensive and integrated schema unit, has received scant empirical attention. Furthermore, EMSs and schema modes have yet to be examined concurrently with respect to aggressive behavior. This study examined associations between EMSs, schema modes, and aggression in an offender sample. Two hundred and eight adult male prisoners completed self-report psychological tests measuring their histories of aggression, EMSs, and schema modes. Regression analyses revealed that EMSs were significantly associated with aggression but did not account for a unique portion of variance once the effects of schema modes were taken into account. Three schema modes, Enraged Child, Impulsive Child, and Bully and Attack, significantly predicted aggression. These findings support the proposition that schema modes characterized by escalating states of anger, rage, and impulsivity characterize aggressive offenders. In this regard, we call attention to the need to include schema modes in contemporary social-cognitive aggression theories, and suggest that systematic assessment and treatment of schema modes has the potential to enhance outcomes with violent offenders. © 2018 Wiley Periodicals, Inc.
Wojciechowska, Katarzyna; Jurowski, Piotr; Wieckowska-Szakiel, Marzena; Rózalska, Barbara
2012-01-01
This study estimated the influence of cytostatics used in breast cancer treatment on lysozyme activity in human tears as a function of treatment time. Eight women were treated with a chemotherapy schema of docetaxel with doxorubicin, and 4 women were treated with the CMF schema: cyclophosphamide, methotrexate, 5-fluorouracil. Lysozyme activity in tears was assessed by measuring the diameter of the zone of Micrococcus lysodeicticus growth inhibition. Both chemotherapy schemas caused a statistically significant reduction in the diameter of the M. lysodeicticus growth inhibition zone after the first and second courses of chemotherapy. After the second chemotherapy course, the CMF schema induced a loss of lysozyme activity in patients' tears (zero mm of M. lysodeicticus growth inhibition zone diameter). Systemic chemotherapy administered in breast cancer induces a reduction of lysozyme activity in tears, which may cause higher morbidity from ocular surface infections caused by Gram-positive bacteria.
[Examination of the Young maladaptive schemes in a group of Gamblers Anonymous].
Katona, Zsuzsa; Körmendi, Attila
2012-01-01
The literature on gambling addiction has grown considerably in recent years. Many studies have been written about the vulnerability factors contributing to the development of addiction, theoretical models, comorbid problems and therapy possibilities. Currently there is no integrated theoretical model that could sufficiently explain the development and maintenance of pathological gambling. The treatment issue is also unresolved. Cognitive psychology is a dynamically developing field of psychology, and good results are achieved in gambling treatment by applying cognitive techniques. Jeffrey Young's schema-focused therapy is a recent theoretical and therapeutic direction within cognitive psychology which emphasizes the necessity of emotional changes beside rational ones in the interest of efficiency. The purpose of our research is to examine and analyse active maladaptive schemas among gamblers who are members of the Gamblers Anonymous self-help group. 23 control persons and 23 gamblers associated with a Gamblers Anonymous support group took part in our research. The severity of gambling behaviour was measured by the Gamblers Anonymous Twenty Questions. For exploring maladaptive schemas we used the shorter 114-item version of the Young Schema Questionnaire (YSQ-S3). All the examined gamblers were considered problem gamblers based on the Gamblers Anonymous Twenty Questions. In the control group there were no active schemas, while in the group of gamblers several schemas (Emotional deprivation, Self-sacrifice, Recognition seeking, Emotional inhibition, Unrelenting standards, Self-punitiveness, Insufficient self-control) showed activity. The content of the active schemas is similar to the main findings of research on gamblers and supports the role of impulsivity, narcissistic traits, self-medicalization and emotional deprivation in the development and maintenance of pathological gambling.
Reformulating Constraints for Compilability and Efficiency
NASA Technical Reports Server (NTRS)
Tong, Chris; Braudaway, Wesley; Mohan, Sunil; Voigt, Kerstin
1992-01-01
KBSDE is a knowledge compiler that uses a classification-based approach to map solution constraints in a task specification onto particular search algorithm components that will be responsible for satisfying those constraints (e.g., local constraints are incorporated in generators; global constraints are incorporated in either testers or hillclimbing patchers). Associated with each type of search algorithm component is a subcompiler that specializes in mapping constraints into components of that type. Each of these subcompilers in turn uses a classification-based approach, matching a constraint passed to it against one of several schemas, and applying a compilation technique associated with that schema. While much progress has occurred in our research since we first laid out our classification-based approach [Ton91], we focus in this paper on our reformulation research. Two important reformulation issues that arise out of the choice of a schema-based approach are: (1) compilability-- Can a constraint that does not directly match any of a particular subcompiler's schemas be reformulated into one that does? and (2) Efficiency-- If the efficiency of the compiled search algorithm depends on the compiler's performance, and the compiler's performance depends on the form in which the constraint was expressed, can we find forms for constraints which compile better, or reformulate constraints whose forms can be recognized as ones that compile poorly? In this paper, we describe a set of techniques we are developing for partially addressing these issues.
Facilitating NCAR Data Discovery by Connecting Related Resources
NASA Astrophysics Data System (ADS)
Rosati, A.
2012-12-01
Linking datasets, creators, and users by employing the proper standards helps to increase the impact of funded research. In order for users to find a dataset, it must first be named. Data citations play the important role of giving datasets a persistent presence by assigning a formal "name" and location. This project focuses on the next step of the "name-find-use" sequence: enhancing discoverability of NCAR data by connecting related resources on the web. By examining metadata schemas that document datasets, I examined how Semantic Web approaches can help to ensure the widest possible range of data users. The focus was to move from search engine optimization (SEO) to information connectivity. Two main markup types are very visible in the Semantic Web and applicable to scientific dataset discovery: The Open Archives Initiative-Object Reuse and Exchange (OAI-ORE - www.openarchives.org) and Microdata (HTML5 and www.schema.org). My project creates pilot aggregations of related resources using both markup types for three case studies: The North American Regional Climate Change Assessment Program (NARCCAP) dataset and related publications, the Palmer Drought Severity Index (PDSI) animation and image files from NCAR's Visualization Lab (VisLab), and the multidisciplinary data types and formats from the Advanced Cooperative Arctic Data and Information Service (ACADIS). This project documents the differences between these markups and how each creates connectedness on the web. My recommendations point toward the most efficient and effective markup schema for aggregating resources within the three case studies based on the following assessment criteria: ease of use, current state of support and adoption of technology, integration with typical web tools, available vocabularies and geoinformatic standards, interoperability with current repositories and access portals (e.g. ESG, Java), and relation to data citation tools and methods.
Spatial aggregation query in dynamic geosensor networks
NASA Astrophysics Data System (ADS)
Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun
2007-11-01
Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, research mainly aims at building sensor-network-based systems that deliver the sensed data to applications. However, existing work has seldom explored spatial aggregation queries that account for the dynamic characteristics of sensor networks. In this paper, we investigate how to process spatial aggregation queries over dynamic geosensor networks where both the sink node and sensor nodes are mobile, and we propose several novel improvements on enabling techniques. The mobility of sensors makes existing routing protocols based on information about a fixed framework or the neighborhood infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by the query window, and a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation in the query window. Finally, considering the changing location of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.
Leverage and Delegation in Developing an Information Model for Geology
NASA Astrophysics Data System (ADS)
Cox, S. J.
2007-12-01
GeoSciML is an information model and XML encoding developed by a group of primarily geologic survey organizations under the auspices of the IUGS CGI. The scope of the core model broadly corresponds with information traditionally portrayed on a geologic map, viz. interpreted geology, some observations, the map legend and accompanying memoir. The development of GeoSciML has followed the methodology specified for an Application Schema defined by OGC and ISO 19100 series standards. This requires agreement within a community concerning their domain model, its formal representation using UML, documentation as a Feature Type Catalogue, with an XML Schema implementation generated from the model by applying a rule-based transformation. The framework and technology support a modular governance process. Standard datatypes and GI components (geometry, the feature and coverage metamodels, metadata) are imported from the ISO framework. The observation and sampling model (including boreholes) is imported from OGC. The scale used for most scalar literal values (terms, codes, measures) allows for localization where necessary. Wildcards and abstract base classes provide explicit extensibility points. Link attributes appear in a regular way in the encodings, allowing reference to external resources using URIs. The encoding is compatible with generic GI data-service interfaces (WFS, WMS, SOS). For maximum interoperability within a community, the interfaces may be specialised through domain-specified constraints (e.g. feature-types, scale and vocabulary bindings, query-models). Formalization using UML and XML allows use of standard validation and processing tools. Use of upper-level elements defined for generic GI application reduces the development effort and governance responsibility, while maximising cross-domain interoperability.
On the other hand, enabling specialization to be delegated in a controlled manner is essential to adoption across a range of subdisciplines and jurisdictions. The GeoSciML design team is responsible only for the part of the model that is unique to geology but for which general agreement can be reached within the domain. This paper is presented on behalf of the Interoperability Working Group of the IUGS Commission for Geoscience Information (CGI) - follow web-link for details of the membership.
FPGA-based protein sequence alignment : A review
NASA Astrophysics Data System (ADS)
Isa, Mohd. Nazrin Md.; Muhsen, Ku Noor Dhaniah Ku; Saiful Nurdin, Dayana; Ahmad, Muhammad Imran; Anuar Zainol Murad, Sohiful; Nizam Mohyar, Shaiful; Harun, Azizi; Hussin, Razaidi
2017-11-01
Sequence alignment has been optimized using several techniques to reduce the computation time needed to obtain the optimal score, by implementing DP-based algorithms in hardware such as FPGA-based platforms. Hardware implementation brings performance challenges such as frequent memory access and strong data dependence in the computation process. Therefore, this paper focuses on the processing element (PE) configuration, which involves memory access to load the data (substitution matrix, query sequence character), and on the PE configuration time. Previous works have taken various approaches to enhance PE configuration performance, such as a serial configuration chain and a parallel configuration chain, in which the configuration data are loaded into the PEs sequentially and simultaneously, respectively. Some researchers have shown that the parallel configuration chain optimizes both configuration time and area.
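The data dependence mentioned above is easiest to see in the dynamic-programming recurrence itself. The sketch below is a software illustration only, assuming a Smith-Waterman-style local alignment as a representative DP algorithm (the abstract does not name one); the match/mismatch scores stand in for a real substitution matrix and the linear gap penalty is a simplification.

```python
def local_alignment_score(query: str, target: str,
                          match: int = 2, mismatch: int = -1,
                          gap: int = -2) -> int:
    """Best local alignment score, keeping only two DP rows in memory.

    Each cell depends on its left, upper and upper-left neighbours, which is
    the data dependence that hardware processing elements must respect.
    """
    cols = len(target) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            # Stand-in for a substitution-matrix lookup (assumption).
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(0,
                          prev[j - 1] + sub,   # align the two characters
                          prev[j] + gap,       # gap in target
                          curr[j - 1] + gap)   # gap in query
            best = max(best, curr[j])
        prev = curr
    return best
```

Cells on the same anti-diagonal do not depend on each other, which is why such recurrences are commonly mapped onto arrays of parallel processing elements.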
Horvath, Dragos; Marcou, Gilles; Varnek, Alexandre
2013-07-22
This study is an exhaustive analysis of the neighborhood behavior over a large coherent data set (ChEMBL target/ligand pairs of known Ki, for 165 targets with >50 associated ligands each). It focuses on similarity-based virtual screening (SVS) success defined by the ascertained optimality index. This is a weighted compromise between purity and retrieval rate of active hits in the neighborhood of an active query. One key issue addressed here is the impact of Tversky asymmetric weighing of query vs candidate features (represented as integer-value ISIDA colored fragment/pharmacophore triplet count descriptor vectors). The nearly 3/4 million independent SVS runs showed that Tversky scores with a strong bias in favor of query-specific features are, by far, the most successful and the least failure-prone out of a set of nine other dissimilarity scores. These include classical Tanimoto, which failed to defend its privileged status in practical SVS applications. Tversky performance is not significantly conditioned by tuning of its bias parameter α. Both initial "guesses" of α = 0.9 and 0.7 were more successful than Tanimoto (which was, in turn, better than Euclid). Tversky was eventually tested in exhaustive similarity searching within the library of 1.6 M commercial + bioactive molecules at http://infochim.u-strasbg.fr/webserv/VSEngine.html , comparing favorably to Tanimoto in terms of "scaffold hopping" propensity. Therefore, it should be used at least as often as, perhaps in parallel to, Tanimoto in SVS. Analysis with respect to query subclasses highlighted relationships of query complexity (simply expressed in terms of pharmacophore pattern counts) and/or target nature vs SVS success likelihood. SVS using more complex queries are more robust with respect to the choice of their operational premises (descriptors, metric). Yet, they are best handled by "pro-query" Tversky scores at α > 0.5. 
Among simpler queries, one may distinguish between "growable" (allowing for active analogs with additional features), and a few "conservative" queries not allowing any growth. These (typically bioactive amine transporter ligands) form the specific application domain of "pro-candidate" biased Tversky scores at α < 0.5.
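The asymmetry discussed above can be illustrated with the textbook Tversky index generalized to integer count vectors. This is a sketch under stated assumptions: shared and unshared feature counts are taken element-wise with min and positive differences, and the two weights are passed explicitly; the exact parameterization used in the study (for example whether β = 1 − α) is not given in the abstract.

```python
from typing import Sequence


def tversky(query: Sequence[int], candidate: Sequence[int],
            alpha: float, beta: float) -> float:
    """Tversky index on count vectors of equal length.

    alpha weights features present only in the query,
    beta weights features present only in the candidate.
    """
    common = sum(min(q, c) for q, c in zip(query, candidate))
    query_only = sum(max(q - c, 0) for q, c in zip(query, candidate))
    candidate_only = sum(max(c - q, 0) for q, c in zip(query, candidate))
    denom = common + alpha * query_only + beta * candidate_only
    return common / denom if denom else 0.0


def tanimoto(query: Sequence[int], candidate: Sequence[int]) -> float:
    """Tanimoto is the symmetric special case alpha = beta = 1."""
    return tversky(query, candidate, 1.0, 1.0)
```

With a large alpha and a small beta, a candidate that contains all query features plus extra ones scores close to 1, whereas swapping the roles of the two vectors lowers the score; that is the sense in which the index is biased toward query-specific features.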
Karp, Peter D; Paley, Suzanne; Romero, Pedro
2002-01-01
Bioinformatics requires reusable software tools for creating model-organism databases (MODs). The Pathway Tools is a reusable, production-quality software environment for creating a type of MOD called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc (see http://ecocyc.org) integrates our evolving understanding of the genes, proteins, metabolic network, and genetic network of an organism. This paper provides an overview of the four main components of the Pathway Tools: The PathoLogic component supports creation of new PGDBs from the annotated genome of an organism. The Pathway/Genome Navigator provides query, visualization, and Web-publishing services for PGDBs. The Pathway/Genome Editors support interactive updating of PGDBs. The Pathway Tools ontology defines the schema of PGDBs. The Pathway Tools makes use of the Ocelot object database system for data management services for PGDBs. The Pathway Tools has been used to build PGDBs for 13 organisms within SRI and by external users.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frazier, Christopher Rawls; Durfee, Justin David; Bandlow, Alisa
The Contingency Contractor Optimization Tool – Prototype (CCOT-P) database is used to store input and output data for the linear program model described in [1]. The database allows queries to retrieve this data and updating and inserting new input data.
Jalali, Mohammad Reza; Zargar, Mohammad; Salavati, Mojgan; Kakavand, Ali Reza
2011-01-01
The aim of this study was to examine differences in early maladaptive schemas and parenting origins between opioid abusers and non-opioid abusers. The early maladaptive schemas and parenting origins were compared in 56 opioid abusers and 56 non-opioid abusers. Schemas were assessed by the Young Schema Questionnaire 3rd (short form), and parenting origins were assessed by the Young Parenting Inventory. Data were analyzed by multivariate analysis of variance (MANOVA). The analysis showed that mean schema scores differed between opioid abusers and non-opioid abusers. A chi-square test showed that parenting origins were significantly associated with their related schemas. Early maladaptive schemas and parenting origins were higher in opioid abusers than in non-opioid abusers, and parenting origins were related to their corresponding schemas.
Operating wind turbines in strong wind conditions by using feedforward-feedback control
NASA Astrophysics Data System (ADS)
Feng, Ju; Sheng, Wen Zhong
2014-12-01
Due to the increasing penetration of wind energy into power systems, it becomes critical to reduce the impact of wind energy on the stability and reliability of the overall power system. In previous works, Shen and his co-workers developed a redesigned operation schema to run wind turbines in strong wind conditions based on an optimization method and standard PI feedback control, which can prevent the typical shutdowns of wind turbines when reaching the cut-out wind speed. In this paper, a new control strategy combining the standard PI feedback control with feedforward controls using the optimization results is investigated for the operation of variable-speed pitch-regulated wind turbines in strong wind conditions. It is shown that the developed control strategy is capable of smoothing the power output of the wind turbine and avoiding its sudden shutdown at high wind speeds without worsening the loads on rotor and blades.
Big Data Analytics with Datalog Queries on Spark.
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2016-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics.
Big Data Analytics with Datalog Queries on Spark
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2017-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics. PMID:28626296
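To make the recursion problem concrete, the sketch below evaluates the classic recursive Datalog program for transitive closure in plain Python. It uses semi-naive evaluation, a standard technique chosen here for illustration; it does not use Spark or the BigDatalog API, and the abstract does not say which evaluation strategy the system applies.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def transitive_closure(edges: Iterable[Edge]) -> Set[Edge]:
    """Semi-naive evaluation of:
        tc(X, Y) :- edge(X, Y).
        tc(X, Y) :- tc(X, Z), edge(Z, Y).
    """
    edge_set = set(edges)
    successors = {}
    for src, dst in edge_set:
        successors.setdefault(src, set()).add(dst)

    closure = set(edge_set)   # facts derived so far
    delta = set(edge_set)     # facts that are new in the last iteration
    while delta:
        # Join only the new facts with edge, not the whole closure.
        derived = {(x, y) for (x, z) in delta for y in successors.get(z, ())}
        delta = derived - closure
        closure |= delta
    return closure
```

Each iteration joins only the newly derived facts, so work already done in earlier iterations is not repeated; the loop stops at the fixpoint, when an iteration derives nothing new.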
An interactive system for computer-aided diagnosis of breast masses.
Wang, Xingwei; Li, Lihua; Liu, Wei; Xu, Weidong; Lederman, Dror; Zheng, Bin
2012-10-01
Although mammography is the only clinically accepted imaging modality for screening the general population to detect breast cancer, interpreting mammograms is difficult with lower sensitivity and specificity. To provide radiologists "a visual aid" in interpreting mammograms, we developed and tested an interactive system for computer-aided detection and diagnosis (CAD) of mass-like cancers. Using this system, an observer can view CAD-cued mass regions depicted on one image and then query any suspicious regions (either cued or not cued by CAD). CAD scheme automatically segments the suspicious region or accepts manually defined region and computes a set of image features. Using content-based image retrieval (CBIR) algorithm, CAD searches for a set of reference images depicting "abnormalities" similar to the queried region. Based on image retrieval results and a decision algorithm, a classification score is assigned to the queried region. In this study, a reference database with 1,800 malignant mass regions and 1,800 benign and CAD-generated false-positive regions was used. A modified CBIR algorithm with a new function of stretching the attributes in the multi-dimensional space and decision scheme was optimized using a genetic algorithm. Using a leave-one-out testing method to classify suspicious mass regions, we compared the classification performance using two CBIR algorithms with either equally weighted or optimally stretched attributes. Using the modified CBIR algorithm, the area under receiver operating characteristic curve was significantly increased from 0.865 ± 0.006 to 0.897 ± 0.005 (p < 0.001). This study demonstrated the feasibility of developing an interactive CAD system with a large reference database and achieving improved performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Segev, A.; Fang, W.
In currency-based updates, processing a query to a materialized view has to satisfy a currency constraint which specifies the maximum time lag of the view data with respect to a transaction database. Currency-based update policies are more general than periodical, deferred, and immediate updates; they provide additional opportunities for optimization and allow updating a materialized view from other materialized views. In this paper, we present algorithms to determine the source and timing of view updates and validate the resulting cost savings through simulation results. 20 refs.
Zhu, Xinjie; Zhang, Qiang; Ho, Eric Dun; Yu, Ken Hung-On; Liu, Chris; Huang, Tim H; Cheng, Alfred Sze-Lok; Kao, Ben; Lo, Eric; Yip, Kevin Y
2017-09-22
A genomic signal track is a set of genomic intervals associated with values of various types, such as measurements from high-throughput experiments. Analysis of signal tracks requires complex computational methods, which often make the analysts focus too much on the detailed computational steps rather than on their biological questions. Here we propose Signal Track Query Language (STQL) for simple analysis of signal tracks. It is a Structured Query Language (SQL)-like declarative language, which means one only specifies what computations need to be done but not how these computations are to be carried out. STQL provides a rich set of constructs for manipulating genomic intervals and their values. To run STQL queries, we have developed the Signal Track Analytical Research Tool (START, http://yiplab.cse.cuhk.edu.hk/start/ ), a system that includes a Web-based user interface and a back-end execution system. The user interface helps users select data from our database of around 10,000 commonly-used public signal tracks, manage their own tracks, and construct, store and share STQL queries. The back-end system automatically translates STQL queries into optimized low-level programs and runs them on a computer cluster in parallel. We use STQL to perform 14 representative analytical tasks. By repeating these analyses using bedtools, Galaxy and custom Python scripts, we show that the STQL solution is usually the simplest, and the parallel execution achieves significant speed-up with large data files. Finally, we describe how a biologist with minimal formal training in computer programming self-learned STQL to analyze DNA methylation data we produced from 60 pairs of hepatocellular carcinoma (HCC) samples. Overall, STQL and START provide a generic way for analyzing a large number of genomic signal tracks in parallel easily.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tourassi, Georgia D.; Harrawood, Brian; Singh, Swatee
2007-08-15
We have previously presented a knowledge-based computer-assisted detection (KB-CADe) system for the detection of mammographic masses. The system is designed to compare a query mammographic region with mammographic templates of known ground truth. The templates are stored in an adaptive knowledge database. Image similarity is assessed with information theoretic measures (e.g., mutual information) derived directly from the image histograms. A previous study suggested that the diagnostic performance of the system steadily improves as the knowledge database is initially enriched with more templates. However, as the database increases in size, an exhaustive comparison of the query case with each stored template becomes computationally burdensome. Furthermore, blind storing of new templates may result in redundancies that do not necessarily improve diagnostic performance. To address these concerns we investigated an entropy-based indexing scheme for improving the speed of analysis and for satisfying database storage restrictions without compromising the overall diagnostic performance of our KB-CADe system. The indexing scheme was evaluated on two different datasets as (i) a search mechanism to sort through the knowledge database, and (ii) a selection mechanism to build a smaller, concise knowledge database that is easier to maintain but still effective. There were two important findings in the study. First, entropy-based indexing is an effective strategy to identify fast a subset of templates that are most relevant to a given query. Only this subset could be analyzed in more detail using mutual information for optimized decision making regarding the query. Second, a selective entropy-based deposit strategy may be preferable where only high entropy cases are maintained in the knowledge database. 
Overall, the proposed entropy-based indexing scheme was shown to reduce the computational cost of our KB-CADe system by 55% to 80% while maintaining the system's diagnostic performance.
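The two histogram-based quantities mentioned above, entropy and mutual information, can be sketched as follows. This is an illustration only: images are represented as flat lists of gray levels of equal length, and the helper names, the "keep the highest-entropy templates" cut-off and the final ranking step are assumptions made for the example, not the indexing scheme of the KB-CADe system.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(pixels: Sequence) -> float:
    """Shannon entropy (bits) of the gray-level histogram."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """I(A;B) = H(A) + H(B) - H(A,B), using the joint histogram of
    pixel pairs at the same position (a and b must have equal length)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def high_entropy_subset(templates: List[Sequence[int]], keep: int) -> List[int]:
    """Indices of the `keep` highest-entropy templates (assumed deposit rule)."""
    order = sorted(range(len(templates)),
                   key=lambda i: entropy(templates[i]), reverse=True)
    return order[:keep]


def best_match(query: Sequence[int], templates: List[Sequence[int]],
               candidates: List[int]) -> int:
    """Index of the candidate template with the highest mutual information."""
    return max(candidates,
               key=lambda i: mutual_information(query, templates[i]))
```

Entropy needs only one histogram per image and can be precomputed, while mutual information needs a joint histogram per query-template pair, which is why restricting the expensive comparison to a pre-selected subset saves time.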
Janson, Lucas; Schmerling, Edward; Clark, Ashley; Pavone, Marco
2015-01-01
In this paper we present a novel probabilistic sampling-based motion planning algorithm called the Fast Marching Tree algorithm (FMT*). The algorithm is specifically aimed at solving complex motion planning problems in high-dimensional configuration spaces. This algorithm is proven to be asymptotically optimal and is shown to converge to an optimal solution faster than its state-of-the-art counterparts, chiefly PRM* and RRT*. The FMT* algorithm performs a “lazy” dynamic programming recursion on a predetermined number of probabilistically-drawn samples to grow a tree of paths, which moves steadily outward in cost-to-arrive space. As such, this algorithm combines features of both single-query algorithms (chiefly RRT) and multiple-query algorithms (chiefly PRM), and is reminiscent of the Fast Marching Method for the solution of Eikonal equations. As a departure from previous analysis approaches that are based on the notion of almost sure convergence, the FMT* algorithm is analyzed under the notion of convergence in probability: the extra mathematical flexibility of this approach allows for convergence rate bounds, the first in the field of optimal sampling-based motion planning. Specifically, for a certain selection of tuning parameters and configuration spaces, we obtain a convergence rate bound of order $O(n^{-1/d+\rho})$, where $n$ is the number of sampled points, $d$ is the dimension of the configuration space, and $\rho$ is an arbitrarily small constant. We go on to demonstrate asymptotic optimality for a number of variations on FMT*, namely when the configuration space is sampled non-uniformly, when the cost is not arc length, and when connections are made based on the number of nearest neighbors instead of a fixed connection radius. 
Numerical experiments over a range of dimensions and obstacle configurations confirm our theoretical and heuristic arguments by showing that FMT*, for a given execution time, returns substantially better solutions than either PRM* or RRT*, especially in high-dimensional configuration spaces and in scenarios where collision-checking is expensive. PMID:27003958
1988-12-01
argument schema based on the one developed by Toulmin et al. (1984). In Toulmin’s schema (Figure 4-2), a claim, or 3 conclusion whose merits we are seeking...probability judgment. Cognitive Science, 1985, 9, 309-339. Toulmin, S., Rieke, R., and Janik, A. An introduction to reasoning (2nd Edition). NY
The Influence of Teachers' Schema in Teaching Reading on Students' Understanding
ERIC Educational Resources Information Center
Basmalah, Putri
2013-01-01
This paper discusses teachers' schema in teaching reading. Based on the articles the writer reviewed, some teachers succeed in teaching reading and others fail. One reason for failure is that they did not apply the complete set of activities (pre-reading, while-reading and post-reading activities) in teaching…
1980-02-01
ADOAA82 342 OKLAHOMA UNIV NORMAN COLL OF EDUCATION F/B 5/9 TASK ANALYSIS SCHEMA BASED ON COGNITIVE STYLE AND SUPPLANTATION--ETC(U) FEB GO F B AUSBURN...separately-perceived fragments) 6. Tasks requiring use of kinesthetic or tactile stimuli: a. Visual/haptic (preference for kinesthetic stimuli; ability to transform kinesthetic stimuli into visual images; ability to learn directly from tactile or kinesthetic impressions) b. Field independence/de
Applications of Derandomization Theory in Coding
NASA Astrophysics Data System (ADS)
Cheraghchi, Mahdi
2011-07-01
Randomized techniques play a fundamental role in theoretical computer science and discrete mathematics, in particular for the design of efficient algorithms and construction of combinatorial objects. The basic goal in derandomization theory is to eliminate or reduce the need for randomness in such randomized constructions. In this thesis, we explore some applications of the fundamental notions in derandomization theory to problems outside the core of theoretical computer science, and in particular, certain problems related to coding theory. First, we consider the wiretap channel problem which involves a communication system in which an intruder can eavesdrop a limited portion of the transmissions, and construct efficient and information-theoretically optimal communication protocols for this model. Then we consider the combinatorial group testing problem. In this classical problem, one aims to determine a set of defective items within a large population by asking a number of queries, where each query reveals whether a defective item is present within a specified group of items. We use randomness condensers to explicitly construct optimal, or nearly optimal, group testing schemes for a setting where the query outcomes can be highly unreliable, as well as the threshold model where a query returns positive if the number of defectives pass a certain threshold. Finally, we design ensembles of error-correcting codes that achieve the information-theoretic capacity of a large class of communication channels, and then use the obtained ensembles for construction of explicit capacity achieving codes. [This is a shortened version of the actual abstract in the thesis.]
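The query model of group testing described above can be sketched in a few lines. This is a toy illustration with hand-picked pools, noiseless outcomes and a naive elimination decoder; it is not the condenser-based construction of the thesis, and all names are hypothetical.

```python
from typing import List, Set


def run_queries(pools: List[Set[int]], defectives: Set[int],
                threshold: int = 1) -> List[bool]:
    """Outcome of each query: positive when the pool contains at least
    `threshold` defective items (threshold=1 is classical group testing)."""
    return [len(pool & defectives) >= threshold for pool in pools]


def eliminate(pools: List[Set[int]], outcomes: List[bool],
              n_items: int) -> Set[int]:
    """Naive decoder for the classical (threshold=1, noiseless) model:
    every item that appears in a negative pool is cleared, and the
    remaining items are reported as possible defectives."""
    cleared: Set[int] = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared |= pool
    return set(range(n_items)) - cleared


if __name__ == "__main__":
    pools = [{0, 1, 2}, {2, 3, 4}, {0, 3, 5}, {1, 4, 5}]
    outcomes = run_queries(pools, defectives={2})
    print(eliminate(pools, outcomes, n_items=6))
```

The elimination step is only sound when outcomes are reliable; handling unreliable outcomes, as the thesis does, requires a more careful design and decoder.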
Motor-based bodily self is selectively impaired in eating disorders.
Campione, Giovanna Cristina; Mansi, Gianluigi; Fumagalli, Alessandra; Fumagalli, Beatrice; Sottocornola, Simona; Molteni, Massimo; Micali, Nadia
2017-01-01
Body representation disturbances in body schema (i.e. unconscious sensorimotor body representations for action) have been frequently reported in eating disorders. Recently, it has been proposed that body schema relies on adequate functioning of the motor system, which is strongly implicated in discriminating between one's own and someone else's body. The present study aimed to investigate the motor-based bodily self in eating disorders and controls, in order to examine the role of the motor system in body representation disturbances at the body schema level. Female outpatients diagnosed with eating disorders (N = 15) and healthy controls (N = 18) underwent a hand laterality task, in which their own (self-stimuli) and someone else's hands (other-stimuli) were displayed at different orientations. Participants had to mentally rotate their own hand in order to provide a laterality judgement. Group differences in motor-based bodily self-recognition (i.e. whether a general advantage occurred when implicitly processing self- vs. other-stimuli) were evaluated by analyzing response times and accuracy by means of mixed ANOVAs. Patients with eating disorders did not show a temporal advantage when mentally rotating self-stimuli compared to other-stimuli, as opposed to controls (F(1, 31) = 5.6, p = 0.02; eating disorders-other = 1092 ± 256 msec, eating disorders-self = 1097 ± 254 msec; healthy controls-other = 1239 ± 233 msec, healthy controls-self = 1192 ± 232 msec). This study provides initial indication that high-level motor functions might be compromised as part of body schema disturbances in eating disorders. Further, larger investigations are required to test motor system abnormalities in the context of body schema disturbance in eating disorders.
Early maladaptive schemas in adult patients with attention deficit hyperactivity disorder.
Philipsen, Alexandra; Lam, Alexandra P; Breit, Sigrid; Lücke, Caroline; Müller, Helge H; Matthies, Swantje
2017-06-01
The main purpose of this study was to examine whether adult patients with attention deficit hyperactivity disorder (ADHD) demonstrate sets of dysfunctional cognitive beliefs and behavioural tendencies according to Jeffrey Young's schema-focused therapy model. Sets of dysfunctional beliefs (maladaptive schemas) were assessed with the Young Schema Questionnaire (YSQ-S2) in 78 adult ADHD patients and 80 control subjects. Patients with ADHD scored significantly higher than the control group on almost all maladaptive schemas. The 'Failure', 'Defectiveness/Shame', 'Subjugation' and 'Emotional Deprivation' schemas were most pronounced in adult ADHD patients, while only 'Vulnerability to Harm or Illness' did not differ between the two groups. The schemas which were most pronounced in adult patients with ADHD correspond well with their learning histories and core symptoms. By demonstrating the existence of early maladaptive schemas in adults suffering from ADHD, this study suggests that schema theory may usefully be applied to adult ADHD therapy.
Mobile medical visual information retrieval.
Depeursinge, Adrien; Duc, Samuel; Eggel, Ivan; Müller, Henning
2012-01-01
In this paper, we propose mobile access to peer-reviewed medical information based on textual search and content-based visual image retrieval. Web-based interfaces designed for limited screen space were developed to query via web services a medical information retrieval engine optimizing the amount of data to be transferred in wireless form. Visual and textual retrieval engines with state-of-the-art performance were integrated. Results obtained show a good usability of the software. Future use in clinical environments has the potential of increasing quality of patient care through bedside access to the medical literature in context.
DOGMA: A Disk-Oriented Graph Matching Algorithm for RDF Databases
NASA Astrophysics Data System (ADS)
Bröcheler, Matthias; Pugliese, Andrea; Subrahmanian, V. S.
RDF is an increasingly important paradigm for the representation of information on the Web. As RDF databases increase in size to approach tens of millions of triples, and as sophisticated graph matching queries expressible in languages like SPARQL become increasingly important, scalability becomes an issue. To date, there is no graph-based indexing method for RDF data where the index was designed in a way that makes it disk-resident. There is therefore a growing need for indexes that can operate efficiently when the index itself resides on disk. In this paper, we first propose the DOGMA index for fast subgraph matching on disk and then develop a basic algorithm to answer queries over this index. This algorithm is then significantly sped up via an optimized algorithm that uses efficient (but correct) pruning strategies when combined with two different extensions of the index. We have implemented a preliminary system and tested it against four existing RDF database systems developed by others. Our experiments show that our algorithm performs very well compared to these systems, with orders of magnitude improvements for complex graph queries.
Targeted exploration and analysis of large cross-platform human transcriptomic compendia
Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.
2016-01-01
We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801
Personalized query suggestion based on user behavior
NASA Astrophysics Data System (ADS)
Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui
Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantical similarity and co-occurrence which indicates the behavior information from other users in web search. Regarding the current user’s preference to a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparse problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in a better performance than both plain approaches.
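The linear combination of short-term and long-term search behavior mentioned in this abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the weight `lam`, the per-query preference scores, and the function names `combine_preference` and `rerank` are hypothetical, and the BPMF step used in the paper to handle data sparsity is not shown.

```python
def combine_preference(short_term, long_term, lam=0.5):
    """Linear mix of two preference maps: lam * short-term + (1 - lam) * long-term."""
    queries = set(short_term) | set(long_term)
    return {
        q: lam * short_term.get(q, 0.0) + (1.0 - lam) * long_term.get(q, 0.0)
        for q in queries
    }


def rerank(candidates, preference):
    """Order candidate suggestions by combined preference, highest first."""
    return sorted(candidates, key=lambda q: preference.get(q, 0.0), reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for two candidate suggestions.
    short_term = {"jaguar car": 0.2, "jaguar animal": 0.8}
    long_term = {"jaguar car": 0.9, "jaguar animal": 0.1}
    mixed = combine_preference(short_term, long_term, lam=0.5)
    print(rerank(["jaguar car", "jaguar animal"], mixed))
```

Setting `lam` to 1.0 or 0.0 recovers the two "plain" strategies (short-term only, long-term only) that the abstract compares against the merged one.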
Simpson, Susan G.; Pietrabissa, Giada; Rossi, Alessandro; Seychell, Tahnee; Manzoni, Gian Mauro; Munro, Calum; Nesci, Julian B.; Castelnuovo, Gianluca
2018-01-01
Objective: The aim of this study was to examine the psychometric properties and factorial structure of the Schema Mode Inventory for Eating Disorders (SMI-ED) in a disordered eating population. Method: 573 participants with disordered eating patterns as measured by the Eating Disorder Examination Questionnaire (EDE-Q) completed the 190-item adapted version of the Schema Mode Inventory (SMI). The new SMI-ED was developed by clinicians/researchers specializing in the treatment of eating disorders, through combining items from the original SMI with a set of additional questions specifically representative of the eating disorder population. Psychometric testing included Confirmatory Factor Analysis (CFA) and internal consistency (Cronbach's α). Multivariate Analyses of Covariance (MANCOVA) was also run to test statistical differences between the EDE-Q subscales on the SMI-ED modes, while controlling for possible confounding variables. Results: Factorial analysis confirmed an acceptable 16-related-factors solution for the SMI-ED, thus providing preliminary evidence for the adequate validity of the new measure based on internal structure. Concurrent validity was also established through moderate to high correlations on the modes most relevant to eating disorders with EDE-Q subscales. This study represents the first step in creating a psychometrically sound instrument for measuring schema modes in eating disorders, and provides greater insight into the relevant schema modes within this population. Conclusion: This research represents an important preliminary step toward understanding and labeling the schema mode model for this clinical group. Findings from the psychometric evaluation of SMI-ED suggest that this is a useful tool which may further assist in the measurement and conceptualization of schema modes in this population. PMID:29740379
A Natural Language Interface Concordant with a Knowledge Base.
Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young
2016-01-01
The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.
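The match-or-reject step described in this abstract can be sketched as follows. This is a rough illustration under stated assumptions: the similarity score from Python's `difflib.SequenceMatcher` stands in for whatever matching confidence the paper uses, and the threshold value, the function name `translate`, and the example expressions and queries are all hypothetical.

```python
from difflib import SequenceMatcher


def translate(question, expression_to_query, threshold=0.8):
    """Return the formal query of the best-matching expression, or None to reject."""
    best_query, best_score = None, 0.0
    for expression, query in expression_to_query.items():
        # String similarity is a stand-in (assumption) for the paper's confidence measure.
        score = SequenceMatcher(None, question.lower(), expression.lower()).ratio()
        if score > best_score:
            best_query, best_score = query, score
    return best_query if best_score >= threshold else None


if __name__ == "__main__":
    # Hypothetical expressions collected in advance, each mapped to one formal query.
    collected = {
        "who wrote hamlet": "SELECT ?a WHERE { :Hamlet :author ?a }",
        "where was mozart born": "SELECT ?p WHERE { :Mozart :birthPlace ?p }",
    }
    print(translate("Who wrote Hamlet", collected))
    print(translate("What is the weather today", collected))
```

Because every stored expression maps to exactly one pre-generated query, any answer the system gives is guaranteed to be answerable by the knowledge base; questions below the threshold are rejected rather than guessed.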
DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.
Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen
2017-09-25
One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ), which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679
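The combination of skyline processing with a Z-order curve named in the DISPAQ abstract can be sketched generically. This is not the paper's Z-Skyline algorithm, whose details the abstract does not give. It assumes each candidate area is a tuple of non-negative integer scores where larger is better, and the names `dominates`, `z_order`, and `z_skyline` are hypothetical.

```python
def dominates(p, q):
    """p dominates q if it is at least as good everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def z_order(point, bits=16):
    """Morton key: interleave the low `bits` bits of each non-negative integer coordinate."""
    key = 0
    dims = len(point)
    for bit in range(bits):
        for d, value in enumerate(point):
            key |= ((value >> bit) & 1) << (bit * dims + d)
    return key


def z_skyline(points, bits=16):
    """Scan points in descending Z-order, keeping those no kept point dominates.

    A dominating point always has a strictly larger Morton key, so it is
    visited first and kept points never need to be removed later.
    """
    skyline = []
    for p in sorted(set(points), key=lambda pt: z_order(pt, bits), reverse=True):
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline


if __name__ == "__main__":
    # Hypothetical (profit score, demand score) pairs for candidate areas.
    areas = [(5, 1), (3, 3), (1, 5), (2, 2), (4, 1), (3, 3)]
    print(z_skyline(areas))
```

Sorting by the Z-order key means each point is compared only against the skyline found so far, which is one way a space-filling curve can cut the number of dominance tests.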
Luo, Yuan; Szolovits, Peter
2016-01-01
In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithm. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when being applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions. PMID:27478379
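The stand-off annotation model and the stabbing query mentioned in this abstract can be illustrated with a textbook augmented interval tree. This is the basic structure that the abstract analyzes, not the authors' proposed algorithm, and it makes no claim to their logarithmic bounds. The half-open `(start, end, label)` tuple layout and the names `Node`, `build`, and `stab` are assumptions for illustration.

```python
class Node:
    """Tree node holding one annotation plus the largest end offset in its subtree."""

    def __init__(self, ann, left, right):
        self.ann, self.left, self.right = ann, left, right
        self.max_end = max(
            [ann[1]] + [child.max_end for child in (left, right) if child is not None]
        )


def build(annotations):
    """Build a balanced tree from annotations sorted by start offset."""
    if not annotations:
        return None
    mid = len(annotations) // 2
    return Node(annotations[mid], build(annotations[:mid]), build(annotations[mid + 1:]))


def stab(node, pos, out=None):
    """Collect every annotation whose half-open span [start, end) covers pos."""
    if out is None:
        out = []
    if node is None or node.max_end <= pos:
        return out  # nothing in this subtree extends past pos
    stab(node.left, pos, out)
    start, end, _label = node.ann
    if start <= pos < end:
        out.append(node.ann)
    if start <= pos:
        stab(node.right, pos, out)  # right subtree only holds later starts
    return out


if __name__ == "__main__":
    # Hypothetical annotations over a clinical note, stored apart from the text.
    annotations = [(0, 5, "Problem"), (3, 8, "Anatomy"), (10, 12, "Drug")]
    tree = build(sorted(annotations))
    print(stab(tree, 4))
```

The annotations reference the text only through character offsets, which is what makes retrieval under position constraints an interval query problem.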
An Energy-Efficient Approach to Enhance Virtual Sensors Provisioning in Sensor Clouds Environments.
Lemos, Marcus Vinícius de S; Filho, Raimir Holanda; Rabêlo, Ricardo de Andrade L; de Carvalho, Carlos Giovanni N; Mendes, Douglas Lopes de S; Costa, Valney da Gama
2018-02-26
Virtual sensors provisioning is a central issue for sensors cloud middleware since it is responsible for selecting physical nodes, usually from Wireless Sensor Networks (WSN) of different owners, to handle user's queries or applications. Recent works perform provisioning by clustering sensor nodes based on the correlation measurements and then selecting as few nodes as possible to preserve WSN energy. However, such works consider only homogeneous nodes (same set of sensors). Therefore, those works are not entirely appropriate for sensor clouds, which in most cases comprises heterogeneous sensor nodes. In this paper, we propose ACxSIMv2, an approach to enhance the provisioning task by considering heterogeneous environments. Two main algorithms form ACxSIMv2. The first one, ACASIMv1, creates multi-dimensional clusters of sensor nodes, taking into account the measurements correlations instead of the physical distance between nodes like most works on literature. Then, the second algorithm, ACOSIMv2, based on an Ant Colony Optimization system, selects an optimal set of sensors nodes from to respond user's queries while attending all parameters and preserving the overall energy consumption. Results from initial experiments show that the approach reduces significantly the sensor cloud energy consumption compared to traditional works, providing a solution to be considered in sensor cloud scenarios. PMID:29495406
Characterizing Thematized Derivative Schema by the Underlying Emergent Structures
ERIC Educational Resources Information Center
Garcia, Mercedes; Llinares, Salvador; Sanchez-Matamoros, Gloria
2011-01-01
This paper reports on different underlying structures of the derivative schema of three undergraduate students that were considered to be at the trans level of development of the derivative schema (action-process-object-schema). The derivative schema is characterized in terms of the students' ability to explicitly transfer the relationship between…
The role of economics in the QUERI program: QUERI Series.
Smith, Mark W; Barnett, Paul G
2008-04-22
The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics. PMID:18430199
Query Health: standards-based, cross-platform population health surveillance.
Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N
2014-01-01
Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. Published by the BMJ Publishing Group Limited. PMID:24699371
LETTER TO THE EDITOR: Optimization of partial search
NASA Astrophysics Data System (ADS)
Korepin, Vladimir E.
2005-11-01
A quantum Grover search algorithm can find a target item in a database faster than any classical algorithm. One can trade accuracy for speed and find a part of the database (a block) containing the target item even faster; this is partial search. A partial search algorithm was recently suggested by Grover and Radhakrishnan. Here we optimize it. Efficiency of the search algorithm is measured by the number of queries to the oracle. The author suggests a new version of the Grover-Radhakrishnan algorithm which uses a minimal number of such queries. The algorithm can run on the same hardware that is used for the usual Grover algorithm.
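Since the abstract measures efficiency by the number of oracle queries, a short calculation of the baseline it improves on may help. The sketch below covers only the standard full Grover search, using the textbook relations that one iteration costs one oracle query, that the rotation angle satisfies sin θ = 1/√N, and that k iterations succeed with probability sin²((2k+1)θ). The reduced query count of the optimized partial search is not reproduced here, and the function names are hypothetical.

```python
import math


def grover_iterations(n_items):
    """Number of Grover iterations (oracle queries) for one target among n_items."""
    theta = math.asin(1.0 / math.sqrt(n_items))
    return math.floor(math.pi / (4.0 * theta))


def success_probability(n_items, iterations):
    """Probability of measuring the target after the given number of iterations."""
    theta = math.asin(1.0 / math.sqrt(n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2


if __name__ == "__main__":
    for n in (4, 1024, 1_000_000):
        k = grover_iterations(n)
        print(n, k, round(success_probability(n, k), 6))
```

The iteration count grows like (π/4)√N, compared with about N/2 expected lookups for a classical scan; partial search gives up the exact position within a block to reduce this count further.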
NASA Astrophysics Data System (ADS)
Delgado, Francisco; Saha, Abhijit; Chandrasekharan, Srinivasan; Cook, Kem; Petry, Catherine; Ridgway, Stephen
2014-08-01
The Operations Simulator for the Large Synoptic Survey Telescope (LSST; http://www.lsst.org) allows the planning of LSST observations that obey explicit science driven observing specifications, patterns, schema, and priorities, while optimizing against the constraints placed by design-specific opto-mechanical system performance of the telescope facility, site specific conditions as well as additional scheduled and unscheduled downtime. It has a detailed model to simulate the external conditions with real weather history data from the site, a fully parameterized kinematic model for the internal conditions of the telescope, camera and dome, and serves as a prototype for an automatic scheduler for the real time survey operations with LSST. The Simulator is a critical tool that has been key since very early in the project, to help validate the design parameters of the observatory against the science requirements and the goals from specific science programs. A simulation run records the characteristics of all observations (e.g., epoch, sky position, seeing, sky brightness) in a MySQL database, which can be queried for any desired purpose. Derivative information digests of the observing history are made with an analysis package called Simulation Survey Tools for Analysis and Reporting (SSTAR). Merit functions and metrics have been designed to examine how suitable a specific simulation run is for several different science applications. Software to efficiently compare the efficacy of different survey strategies for a wide variety of science applications using such a growing set of metrics is under development. 
A recent restructuring of the code allows us to a) use "look-ahead" strategies that avoid cadence sequences that cannot be completed due to observing constraints; and b) examine alternate optimization strategies, so that the most efficient scheduling algorithm(s) can be identified and used: even few-percent efficiency gains will create substantive scientific opportunity. The enhanced simulator is being used to assess the feasibility of desired observing cadences, study the impact of changing science program priorities and assist with performance margin investigations of the LSST system.
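To illustrate the kind of query a recorded observing history supports, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the MySQL database mentioned above. The table name, column names, and rows are invented for illustration and are not the simulator's actual schema or output.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of made-up observations (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation ("
        " epoch_mjd REAL, ra_deg REAL, dec_deg REAL,"
        " band TEXT, seeing_arcsec REAL, sky_brightness REAL)"
    )
    rows = [
        (60000.10, 10.0, -30.0, "r", 0.70, 21.1),
        (60000.12, 10.0, -30.0, "g", 0.95, 22.0),
        (60001.08, 55.5, -12.0, "r", 0.65, 21.3),
        (60001.11, 55.5, -12.0, "r", 1.20, 20.8),
    ]
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def visits_per_band(conn: sqlite3.Connection, max_seeing: float):
    """Count visits per filter band whose seeing is no worse than `max_seeing`."""
    cur = conn.execute(
        "SELECT band, COUNT(*) FROM observation"
        " WHERE seeing_arcsec <= ? GROUP BY band ORDER BY band",
        (max_seeing,),
    )
    return cur.fetchall()
```

A per-band visit count under a seeing cut is one simple example of a metric that could be compared across simulation runs.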
Driver head pose tracking with thermal camera
NASA Astrophysics Data System (ADS)
Bole, S.; Fournier, C.; Lavergne, C.; Druart, G.; Lépine, T.
2016-09-01
Head pose can be seen as a coarse estimation of gaze direction. In the automotive industry, knowledge about gaze direction could optimize Human-Machine Interface (HMI) and Advanced Driver Assistance Systems (ADAS). Pose estimation systems are often camera-based when applications have to be contactless. In this paper, we explore uncooled thermal imagery (8-14μm) for its intrinsic night vision capabilities and for its invariance to lighting variations. Two methods are implemented and compared; both are aided by a 3D model of the head. The 3D model, mapped with thermal texture, makes it possible to synthesize a base of 2D projected models, differently oriented and labeled in yaw and pitch. The first method is based on keypoints. Keypoints of the models are matched with those of the query image. These sets of matches, together with the 3D shape of the model, allow the 3D pose to be estimated. The second method is a global appearance approach. Among all 2D models of the base, the algorithm searches for the one closest to the query image according to a weighted least squares difference.
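The global appearance approach can be sketched as a nearest-template search under a weighted squared difference. This is a minimal illustration, assuming images are flattened to equal-length lists of intensities and that a per-pixel weight vector is given; the abstract does not say how the weights are chosen, and the function names are hypothetical.

```python
def weighted_sq_difference(query, template, weights):
    """Weighted sum of squared pixel differences between two flattened images."""
    return sum(w * (q - t) ** 2 for q, t, w in zip(query, template, weights))


def closest_pose(query, templates, weights):
    """Return the (yaw, pitch) label of the template closest to the query image.

    `templates` maps (yaw, pitch) labels to flattened 2D projected models.
    """
    return min(
        templates,
        key=lambda pose: weighted_sq_difference(query, templates[pose], weights),
    )
```

For example, with templates labeled (0, 0), (30, 0), and (0, 15), the function returns the label whose synthesized view differs least from the query image, which serves as the coarse pose estimate.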
Artemis: Integrating Scientific Data on the Grid (Preprint)
2004-07-01
Theseus execution engine [Barish and Knoblock 03] to efficiently execute the generated datalog program. The Theseus execution engine has a wide...variety of operations to query databases, web sources, and web services. Theseus also contains a wide variety of relational operations, such as...selection, union, or projection. Furthermore, Theseus optimizes the execution of an integration plan by querying several data sources in parallel and
The missing link? Testing a schema account of unitization.
Tibon, Roni; Greve, Andrea; Henson, Richard
2018-05-09
Unitization refers to the creation of a new unit from previously distinct items. The concept of unitization has been used to explain how novel pairings between items can be remembered without requiring recollection, by virtue of new, item-like representations that enable familiarity-based retrieval. We tested an alternative account of unitization - a schema account - which suggests that associations between items can be rapidly assimilated into a schema. We used a common operationalization of "unitization" as the difference between two unrelated words being linked by a definition, relative to two words being linked by a sentence, during an initial study phase. During the following relearning phase, a studied word was re-paired with a new word, either related or unrelated to the original associate from study. In a final test phase, memory for the relearned associations was tested. We hypothesized that, if unitized representations act like schemas, then we would observe some generalization to related words, such that memory would be better in the definition than sentence condition for related words, but not for unrelated words. Contrary to the schema hypothesis, evidence favored the null hypothesis of no difference between definition and sentence conditions for related words (Experiment 1), even when each cue was associated with multiple associates, indicating that the associations can be generalized (Experiment 2), or when the schematic information was explicitly re-activated during Relearning (Experiment 3). These results suggest that unitized associations do not generalize to accommodate new information, and therefore provide evidence against the schema account.
A Semantic Analysis of XML Schema Matching for B2B Systems Integration
ERIC Educational Resources Information Center
Kim, Jaewook
2011-01-01
One of the most critical steps to integrating heterogeneous e-Business applications using different XML schemas is schema matching, which is known to be costly and error-prone. Many automatic schema matching approaches have been proposed, but the challenge is still daunting because of the complexity of schemas and immaturity of technologies in…
CytometryML: a markup language for analytical cytology
NASA Astrophysics Data System (ADS)
Leif, Robert C.; Leif, Stephanie H.; Leif, Suzanne B.
2003-06-01
Cytometry Markup Language, CytometryML, is a proposed new analytical cytology data standard. CytometryML is a set of XML schemas for encoding both flow cytometry and digital microscopy text based data types. CytometryML schemas reference both DICOM (Digital Imaging and Communications in Medicine) codes and FCS keywords. These schemas provide representations for the keywords in FCS 3.0 and will soon include DICOM microscopic image data. Flow Cytometry Standard (FCS) list-mode has been mapped to the DICOM Waveform Information Object. A preliminary version of a list mode binary data type, which does not presently exist in DICOM, has been designed. This binary type is required to enhance the storage and transmission of flow cytometry and digital microscopy data. Index files based on Waveform indices will be used to rapidly locate the cells present in individual subsets. DICOM has the advantage of employing standard file types, TIF and JPEG, for Digital Microscopy. Using an XML schema based representation means that standard commercial software packages such as Excel and MathCad can be used to analyze, display, and store analytical cytometry data. Furthermore, by providing one standard for both DICOM data and analytical cytology data, it eliminates the need to create and maintain special purpose interfaces for analytical cytology data thereby integrating the data into the larger DICOM and other clinical communities. A draft version of CytometryML is available at www.newportinstruments.com.
Schema-Based Analysis of Gendered Self-Disclosure in Persian: Writing for Dating Context
ERIC Educational Resources Information Center
Khodadady, Ebrahim; Mehr, Somayeh Javadi
2012-01-01
This paper reports a textual analysis of letters written by 21 male and 21 female participants in Persian. Each writer wrote two letters, one to a dating service and another one to a hypothetical person chosen and introduced by the center. Therefore, a total of 84 letters were collected from the participants. Schema theory was used to find the…
ERIC Educational Resources Information Center
Mudrikah, Achmad
2016-01-01
The research has shown a model of learning activities that can be used to stimulate reflective abstraction in students. Reflective abstraction, a method of constructing knowledge in the Action-Process-Object-Schema theory, is expected to occur when students are in learning activities and to encourage students to make the process of…
ERIC Educational Resources Information Center
Berger, Carole; Donnadieu, Sophie
2006-01-01
This research explores the way in which young children (5 years of age) and adults use perceptual and conceptual cues for categorizing objects processed by vision or by audition. Three experiments were carried out using forced-choice categorization tasks that allowed responses based on taxonomic relations (e.g., vehicles) or on schema category…
Visualizing whole-brain DTI tractography with GPU-based Tuboids and LoD management.
Petrovic, Vid; Fallon, James; Kuester, Falko
2007-01-01
Diffusion Tensor Imaging (DTI) of the human brain, coupled with tractography techniques, enables the extraction of large collections of three-dimensional tract pathways per subject. These pathways and pathway bundles represent the connectivity between different brain regions and are critical for the understanding of brain related diseases. A flexible and efficient GPU-based rendering technique for DTI tractography data is presented that addresses common performance bottlenecks and image-quality issues, allowing interactive render rates to be achieved on commodity hardware. An occlusion query-based pathway LoD management system for streamlines/streamtubes/tuboids is introduced that optimizes input geometry, vertex processing, and fragment processing loads, and helps reduce overdraw. The tuboid, a fully-shaded streamtube impostor constructed entirely on the GPU from streamline vertices, is also introduced. Unlike full streamtubes and other impostor constructs, tuboids require little to no preprocessing or extra space over the original streamline data. The supported fragment processing levels of detail range from texture-based draft shading to full raycast normal computation, Phong shading, environment mapping, and curvature-correct text labeling. The presented text labeling technique for tuboids provides adaptive, aesthetically pleasing labels that appear attached to the surface of the tubes. Furthermore, an occlusion query aggregating and scheduling scheme for tuboids is described that reduces the query overhead. Results for a tractography dataset are presented, and demonstrate that LoD-managed tuboids offer benefits over traditional streamtubes both in performance and appearance.
Querying and Ranking XML Documents.
ERIC Educational Resources Information Center
Schlieder, Torsten; Meuss, Holger
2002-01-01
Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…
A Visual Interface for Querying Heterogeneous Phylogenetic Databases.
Jamil, Hasan M
2017-01-01
Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely driven by tools or form-based interfaces. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying on the computed results, making it a truly powerful querying architecture.
de Haas, Billie; Hutter, Inge
2018-05-08
Teachers can feel uncomfortable teaching sexuality education when the content conflicts with their cultural values and beliefs. However, more research is required to understand how to resolve conflicts between teachers' values and beliefs and those implicit in comprehensive approaches to sexuality education. This study uses cultural schema theory to identify teachers' cultural schemas of teaching sexuality education and the internal conflicts arising between them. In-depth interviews were conducted with 40 secondary school teachers in Kampala, the capital city of Uganda. Embedded in a context of morality, conflicting cultural schemas of sexuality education and young people's sexual citizenship in traditional and present-day Ugandan society were found: young people are both innocent and sexually active; sexuality education both encourages and prevents sexual activity; and teachers need to teach sexuality education, but it is considered immoral for them to do so. In countries such as Uganda, supportive school regulations and a mandate from society could help teachers feel more comfortable adopting comprehensive approaches to sexuality education.
Stress affects the neural ensemble for integrating new information and prior knowledge.
Vogel, Susanne; Kluen, Lisa Marieke; Fernández, Guillén; Schwabe, Lars
2018-06-01
Prior knowledge, represented as a schema, facilitates memory encoding. This schema-related learning is assumed to rely on the medial prefrontal cortex (mPFC) that rapidly integrates new information into the schema, whereas schema-incongruent or novel information is encoded by the hippocampus. Stress is a powerful modulator of prefrontal and hippocampal functioning and first studies suggest a stress-induced deficit of schema-related learning. However, the underlying neural mechanism is currently unknown. To investigate the neural basis of a stress-induced schema-related learning impairment, participants first acquired a schema. One day later, they underwent a stress induction or a control procedure before learning schema-related and novel information in the MRI scanner. In line with previous studies, learning schema-related compared to novel information activated the mPFC, angular gyrus, and precuneus. Stress, however, affected the neural ensemble activated during learning. Whereas the control group distinguished between sets of brain regions for related and novel information, stressed individuals engaged the hippocampus even when a relevant schema was present. Additionally, stressed participants displayed aberrant functional connectivity between brain regions involved in schema processing when encoding novel information. The failure to segregate functional connectivity patterns depending on the presence of prior knowledge was linked to impaired performance after stress. Our results show that stress affects the neural ensemble underlying the efficient use of schemas during learning. These findings may have relevant implications for clinical and educational settings. Copyright © 2018 Elsevier Inc. All rights reserved.
Voulgarelis, Dimitrios; Velayudhan, Ajoy; Smith, Frank
2017-01-01
Agent-based models provide a formidable tool for exploring the complex and emergent behaviour of biological systems and yield accurate results, but with the drawback of needing a lot of computational power and time for subsequent analysis. On the other hand, equation-based models can more easily be used for complex analysis on a much shorter timescale. This paper formulates a model based on ordinary and stochastic differential equations to capture the behaviour of an existing agent-based model of tumour cell reprogramming and applies it to optimization of possible treatment as well as dosage sensitivity analysis. For certain values of the parameter space a close match between the equation-based and agent-based models is achieved. The need for division of labour between the two approaches is explored. Copyright© Bentham Science Publishers.
York, Valerie K; Brannon, Laura A; Miller, Megan M
2012-01-01
We investigated whether a thoroughly personalized message (tailored to a person's "Big Five" personality traits) or a message matched to an alternate form of self-schema (ideal self-schema) would be more influential than a self-schema matched message (that has been found to be effective) at marketing responsible drinking. We expected the more thoroughly personalized Big Five matched message to be more effective than the self-schema matched message. However, neither the Big Five message nor the ideal self-schema message was more effective than the actual self-schema message. Therefore, research examining self-schema matching should be pursued rather than more complex Big Five matching.
Information Retrieval Using UMLS-based Structured Queries
Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith
2001-01-01
During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.
Research on presentation and query service of geo-spatial data based on ontology
NASA Astrophysics Data System (ADS)
Li, Hong-wei; Li, Qin-chao; Cai, Chang
2008-10-01
The paper analyzes the deficiencies in the presentation and query of geo-spatial data in current GIS; discusses the advantages of ontology for formalizing geo-spatial data and presenting semantic granularity; takes the land-use classification system as an example to construct a domain ontology, described in OWL; realizes grade-level and category presentation of land-use data through vertical and horizontal navigation; and then discusses ontology-based query modes for geo-spatial data, including queries based on types and grade levels, queries based on instances and spatial relations, and synthetic queries based on types and instances. These methods enrich the query modes of current GIS and are a useful attempt. The paper points out that the key to ontology-based presentation and query of spatial data is to construct a domain ontology that correctly reflects geo-concepts and their spatial relations and to achieve a fine formal description of it.
Paek, Hye-Jin; Hove, Thomas
2018-05-01
This study examines the roles that the media effects and persuasion ethics schemas play in people's responses to an antismoking ad in South Korea. An online experiment was conducted with 347 adults. The media effects schema was manipulated with news stories on an antismoking campaign's effectiveness, while the persuasion ethics schema was measured and median-split. Analysis of Variance (ANOVA) tests were performed for issue attitudes (Iatt), attitude toward the ad (Aad), and behavioral intention (BI). Results show significant main effects of the media effects schema on the three dependent variables. People in the weak media effects condition had significantly lower Iatt, Aad, and BI than those in either the strong media effects condition or the control condition. This pattern was more pronounced among smokers. While there was no significant main effect of the persuasion ethics schema on any of the dependent variables, a significant interaction effect for persuasion ethics schema and smoking status was found on behavioral intention (BI). Nonsmokers' BI was significantly higher than smokers' in the low-persuasion ethics schema condition, but it was not significant in the high-persuasion ethics schema condition.
NASA Astrophysics Data System (ADS)
Gembong, S.; Suwarsono, S. T.; Prabowo
2018-03-01
Schema in the current study refers to a set of action, process, object and other schemas already possessed to build an individual’s ways of thinking to solve a given problem. The current study aims to investigate the schemas built among elementary school students in solving problems related to operations of addition of fractions. The analyses of the schema building were done qualitatively on the basis of the analytical framework of the APOS theory (Action, Process, Object, and Schema). Findings show that the schemas built by students of high and middle ability indicate the following. In the action stage, students were able to add two fractions by drawing a picture or by a procedural method. In the process stage, they could add two and three fractions. In the object stage, they could explain the steps of adding two fractions and rewrite a fraction as a sum of fractions. In the last stage, schema, they could add fractions by relating them to another schema they already possessed, i.e. the least common multiple. Students of high and middle mathematical ability showed that their schema building in solving problems related to addition of fractions worked in line with the framework of the APOS theory. Those of low mathematical ability, however, showed that their schema on each stage did not work properly.
Rezaei, Mehdi; Ghazanfari, Firoozeh; Rezaee, Fatemeh
2016-12-30
The present investigation was designed to examine disconnection and rejection (DR) schemas, negative emotional schemas (NESs) and experimental avoidance (EA) as mediating variables of the relationship between childhood trauma (CT) and depression. Specifically, we examined the mediating role of NESs and EA between DR schemas and depression. The study sample consisted of 439 female college students (M age =22.47; SD=6.0), of whom 88 met the criteria for current major depressive disorder (MDD) and 351 had a history of MDD in the last 12 months. Subjects were assessed with the Structured Clinical Interview for DSM-IV (SCID) and completed the Childhood Trauma Questionnaire (CTQ), the Early Maladaptive Schemas Questionnaire (SQ-SF), the Leahy Emotional Schemas Scale (LESS), the Acceptance and Action Questionnaire (AAQ-II), and the Beck Depression Inventory-II (BDI-II). The findings showed that DR schemas mediated the relationship between CT and depression, but CT did not predict depression through NESs and EA. NESs mediated the relationship between DR schemas and depression, as did EA. In general, results suggest that interventions for depressed women may need to target changing DR schemas and NESs and reducing EA. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
InterMine Webservices for Phytozome (Rev2)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carlson, Joseph; Goodstein, David; Rokhsar, Dan
2014-07-10
A datawarehousing framework for information provides a useful infrastructure for providers and users of genomic data. For providers, the infrastructure gives them a consistent mechanism for extracting raw data. For users, the web services supported by the software allow them to make complex, and often unique, queries of the data. Previously, phytozome.net used BioMart to provide the infrastructure. As the complexity, scale and diversity of the dataset have grown, we decided to implement an InterMine web service on our servers. This change was largely motivated by the ability to have a more complex table structure and richer web reporting mechanism than BioMart. For InterMine to achieve its more complex database schema it requires an XML description of the data and an appropriate loader. Unlimited one-to-many and many-to-many relationships between the tables can be enabled in the schema. We have implemented support for: 1.) Genomes and annotations for the data in Phytozome. This set is the 48 organisms currently stored in a back end CHADO datastore. The data loaders are modified versions of the CHADO data adapters from FlyMine. 2.) Interproscan results from all proteins in the Phytozome database. 3.) Clusters of proteins grouped hierarchically by similarity. 4.) Cufflinks results from tissue-specific RNA-Seq data of Phytozome organisms. 5.) Diversity data (GATK and SnpEFF results) from a set of individual organisms. The last two datatypes are new in this implementation of our web services. We anticipate that the scale of these data will increase considerably in the near future.
Stanton, Amelia M.; Meston, Cindy M.
2017-01-01
This is the first study to examine language use and sexual self-schemas in natural language data extracted from posts to a large online forum. Recently, two studies applied advanced text analysis techniques to examine differences in language use and sexual self-schemas between women with and without a history of childhood sexual abuse. The aim of the current study was to test the ecological validity of the differences in language use and sexual self-schema themes that emerged between these two groups of women in the laboratory. Archival natural language data were extracted from a social media website and analyzed using LIWC2015, a computerized text analysis program, and other word counting approaches. The differences in both language use and sexual self-schema themes that manifested in recent laboratory research were replicated and validated in the large online sample. To our knowledge, these results provide the first empirical examination of sexual cognitions as they occur in the real world. These results also suggest that natural language analysis of text extracted from social media sites may be a potentially viable precursor or alternative to laboratory measurement of sexual trauma phenomena, as well as clinical phenomena, more generally. PMID:28570129
Louis, John P; Wood, Alex M; Lockwood, George; Ho, Moon-Ho Ringo; Ferguson, Eamonn
2018-04-19
Negative schemas have been widely recognized as being linked to psychopathology and mental health, and they are central to the Schema Therapy (ST) model. This study is the first to report on the psychometric properties of the Young Positive Schema Questionnaire (YPSQ). In a combined community sample (Manila, Philippines, n = 559; Bangalore, India, n = 350; Singapore, n = 628), we identified a 56-item, 14-factor solution for the YPSQ. Multigroup confirmatory factor analysis supported the 14-factor model using data from two other independent samples: an Eastern sample from Kuala Lumpur, Malaysia (n = 229) and a Western sample from the United States (n = 214). Construct validity was demonstrated with the Young Schema Questionnaire 3 Short Form (YSQ-S3) that measures negative schemas, and divergent validity was demonstrated for 11 of the YPSQ subscales with their respective negative schema counterparts. Convergent validity of the 14 subscales of YPSQ was demonstrated with measures of personality dispositions, emotional distress, well-being, trait gratitude, and humor styles. Positive schemas also showed incremental validity over and above negative schemas for these same measures, thus demonstrating that both positive and negative schemas are separate constructs that relate in unique ways to mental health. Implications for using both the YPSQ and the YSQ-S3 scales in tandem in ST as well as cultural nuances from the use of Asian samples were discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Active learning based segmentation of Crohns disease from abdominal MRI.
Mahapatra, Dwarikanath; Vos, Franciscus M; Buhmann, Joachim M
2016-05-01
This paper proposes a novel active learning (AL) framework, and combines it with semi supervised learning (SSL) for segmenting Crohns disease (CD) tissues from abdominal magnetic resonance (MR) images. Robust fully supervised learning (FSL) based classifiers require lots of labeled data of different disease severities. Obtaining such data is time consuming and requires considerable expertise. SSL methods use a few labeled samples, and leverage the information from many unlabeled samples to train an accurate classifier. AL queries labels of most informative samples and maximizes gain from the labeling effort. Our primary contribution is in designing a query strategy that combines novel context information with classification uncertainty and feature similarity. Combining SSL and AL gives a robust segmentation method that: (1) optimally uses few labeled samples and many unlabeled samples; and (2) requires lower training time. Experimental results show our method achieves higher segmentation accuracy than FSL methods with fewer samples and reduced training effort. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
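The idea that active learning queries labels for the most informative samples can be sketched with plain uncertainty sampling. This is a deliberately simpler stand-in: the paper's query strategy also combines context information and feature similarity, which are not modeled here. The function names and the entropy criterion are assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(predicted_probs, budget):
    """Return indices of the `budget` unlabeled samples with the highest predictive entropy."""
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: entropy(predicted_probs[i]),
        reverse=True,
    )
    return ranked[:budget]
```

In a full loop, the selected samples would be labeled by an expert, added to the training set, and the classifier retrained before the next selection round.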
NASA Astrophysics Data System (ADS)
Arenas, Marcelo; Gutierrez, Claudio; Pérez, Jorge
The goal of this paper is to give an overview of the basics of the theory of RDF databases. We provide a formal definition of RDF that includes the features that distinguish this model from other graph data models. We then move into the fundamental issue of querying RDF data. We start by considering the RDF query language SPARQL, which has been a W3C Recommendation since January 2008. We provide an algebraic syntax and a compositional semantics for this language, study the complexity of the evaluation problem for different fragments of SPARQL, and consider the problem of optimizing the evaluation of SPARQL queries, showing that a natural fragment of this language has some good properties in this respect. We furthermore study the expressive power of SPARQL, by comparing it with some well-known query languages such as relational algebra. We conclude by considering the issue of querying RDF data in the presence of RDFS vocabulary. In particular, we present a recently proposed extension of SPARQL with navigational capabilities.
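To make the notion of evaluating a query over RDF triples concrete, the sketch below evaluates a conjunction of triple patterns (a basic graph pattern) by naive nested-loop matching. It is an illustration only, not the algebraic semantics developed in the paper: it ignores OPTIONAL, UNION, FILTER, blank nodes, and RDFS vocabulary, and it represents terms as plain strings with variables prefixed by "?". All names are hypothetical.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None if impossible."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate_bgp(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions
```

Each pattern is joined with the solutions found so far, so the cost grows with the number of patterns and triples; choosing a good join order is one instance of the optimization problem the abstract mentions.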
CE-SAM: a conversational interface for ISR mission support
NASA Astrophysics Data System (ADS)
Pizzocaro, Diego; Parizas, Christos; Preece, Alun; Braines, Dave; Mott, David; Bakdash, Jonathan Z.
2013-05-01
There is considerable interest in natural language conversational interfaces. These allow for complex user interactions with systems, such as fulfilling information requirements in dynamic environments, without requiring extensive training or a technical background (e.g. in formal query languages or schemas). To leverage the advantages of conversational interactions we propose CE-SAM (Controlled English Sensor Assignment to Missions), a system that guides users through refining and satisfying their information needs in the context of Intelligence, Surveillance, and Reconnaissance (ISR) operations. The rapidly-increasing availability of sensing assets and other information sources poses substantial challenges to effective ISR resource management. In a coalition context, the problem is even more complex, because assets may be "owned" by different partners. We show how CE-SAM allows a user to refine and relate their ISR information needs to pre-existing concepts in an ISR knowledge base, via conversational interaction implemented on a tablet device. The knowledge base is represented using Controlled English (CE) - a form of controlled natural language that is both human-readable and machine processable (i.e. can be used to implement automated reasoning). Users interact with the CE-SAM conversational interface using natural language, which the system converts to CE for feeding-back to the user for confirmation (e.g. to reduce misunderstanding). We show that this process not only allows users to access the assets that can support their mission needs, but also assists them in extending the CE knowledge base with new concepts.
A Queueing Approach to Optimal Resource Replication in Wireless Sensor Networks
2009-04-29
network (an energy-centric approach) or to ensure the proportion of query failures does not exceed a predetermined threshold (a failure-centric ...replication strategies in wireless sensor networks. The model can be used to minimize either the total transmission rate of the network (an energy-centric ...approach) or to ensure the proportion of query failures does not exceed a predetermined threshold (a failure-centric approach). The model explicitly
Optimization of Extended Relational Database Systems
1986-07-23
control functions are integrated into a single system in a homogeneous way. As a first example, consider previous work in supporting various semantic...sizes are reduced and, consequently, the number of materializations that will be needed is also lower. For example, in the above query tuple...retrieve (EMP.name) where EMP.hobbies.instrument = 'violin' When the various entries in the hobbies field are materialized, only those queries that
Query Language for Location-Based Services: A Model Checking Approach
NASA Astrophysics Data System (ADS)
Hoareau, Christian; Satoh, Ichiro
We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logic-based formulas. Our approach is unique among existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and discussed.
Guided Iterative Substructure Search (GI-SSS) - A New Trick for an Old Dog.
Weskamp, Nils
2016-07-01
Substructure search (SSS) is a fundamental technique supported by various chemical information systems. Many users apply it in an iterative manner: they modify their queries to shape the composition of the retrieved hit sets according to their needs. We propose and evaluate two heuristic extensions of SSS aimed at simplifying these iterative query modifications by collecting additional information during query processing and visualizing this information in an intuitive way. This gives the user convenient feedback on how certain changes to the query would affect the retrieved hit set and reduces the number of trial-and-error cycles needed to generate an optimal search result. The proposed heuristics are simple, yet surprisingly effective and can be easily added to existing SSS implementations. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Neural correlates of post-conventional moral reasoning: a voxel-based morphometry study.
Prehn, Kristin; Korczykowski, Marc; Rao, Hengyi; Fang, Zhuo; Detre, John A; Robertson, Diana C
2015-01-01
Going back to Kohlberg, moral development research affirms that people progress through different stages of moral reasoning as cognitive abilities mature. Individuals at a lower level of moral reasoning judge moral issues mainly based on self-interest (personal interests schema) or based on adherence to laws and rules (maintaining norms schema), whereas individuals at the post-conventional level judge moral issues based on deeper principles and shared ideals. However, the extent to which moral development is reflected in structural brain architecture remains unknown. To investigate this question, we used voxel-based morphometry and examined the brain structure in a sample of 67 Master of Business Administration (MBA) students. Subjects completed the Defining Issues Test (DIT-2) which measures moral development in terms of cognitive schema preference. Results demonstrate that subjects at the post-conventional level of moral reasoning were characterized by increased gray matter volume in the ventromedial prefrontal cortex and subgenual anterior cingulate cortex, compared with subjects at a lower level of moral reasoning. Our findings support an important role for both cognitive and emotional processes in moral reasoning and provide first evidence for individual differences in brain structure according to the stages of moral reasoning first proposed by Kohlberg decades ago.
Stress leads to aberrant hippocampal involvement when processing schema-related information.
Vogel, Susanne; Kluen, Lisa Marieke; Fernández, Guillén; Schwabe, Lars
2018-01-01
Prior knowledge, represented as a mental schema, has critical impact on how we organize, interpret, and process incoming information. Recent findings indicate that the use of an existing schema is coordinated by the medial prefrontal cortex (mPFC), communicating with parietal areas. The hippocampus, however, is crucial for encoding schema-unrelated information but not for schema-related information. A recent study indicated that stress mediators may affect schema-related memory, but the underlying neural mechanisms are currently unknown. Here, we thus tested the impact of acute stress on neural processing of schema-related information. We exposed healthy participants to a stress or control manipulation before they processed, in the MRI scanner, words related or unrelated to a preexisting schema activated by a specific cue. Participants' memory for the presented material was tested 3-5 d after encoding. Overall, the processing of schema-related information activated the mPFC, the precuneus, and the angular gyrus. Stress resulted in aberrant hippocampal activity and connectivity while participants processed schema-related information. This aberrant engagement of the hippocampus was linked to altered subsequent memory. These findings suggest that stress may interfere with the efficient use of prior knowledge during encoding and may have important practical implications, in particular for educational settings. © 2018 Vogel et al.; Published by Cold Spring Harbor Laboratory Press.
Hybrid Quantum-Classical Approach to Quantum Optimal Control.
Li, Jun; Yang, Xiaodong; Peng, Xinhua; Sun, Chang-Pu
2017-04-14
A central challenge in quantum computing is to identify more computational problems for which utilization of quantum resources can offer significant speedup. Here, we propose a hybrid quantum-classical scheme to tackle the quantum optimal control problem. We show that the most computationally demanding part of gradient-based algorithms, namely, computing the fitness function and its gradient for a control input, can be accomplished by the process of evolution and measurement on a quantum simulator. By posing queries to and receiving answers from the quantum simulator, classical computing devices update the control parameters until an optimal control solution is found. To demonstrate the quantum-classical scheme in experiment, we use a seven-qubit nuclear magnetic resonance system, on which we have succeeded in optimizing state preparation without involving classical computation of the large Hilbert space evolution.
O'Leary, Kevin J; Devisetty, Vikram K; Patel, Amitkumar R; Malkenson, David; Sama, Pradeep; Thompson, William K; Landler, Matthew P; Barnard, Cynthia; Williams, Mark V
2013-02-01
Research supports medical record review using screening triggers as the optimal method to detect hospital adverse events (AE), yet the method is labour-intensive. This study compared a traditional trigger tool with an enterprise data warehouse (EDW) based screening method to detect AEs. We created 51 automated queries based on 33 traditional triggers from prior research, and then applied them to 250 randomly selected medical patients hospitalised between 1 September 2009 and 31 August 2010. Two physicians each abstracted records from half the patients using a traditional trigger tool and then performed targeted abstractions for patients with positive EDW queries in the complementary half of the sample. A third physician confirmed presence of AEs and assessed preventability and severity. Traditional trigger tool and EDW based screening identified 54 (22%) and 53 (21%) patients with one or more AE. Overall, 140 (56%) patients had one or more positive EDW screens (total 366 positive screens). Of the 137 AEs detected by at least one method, 86 (63%) were detected by a traditional trigger tool, 97 (71%) by EDW based screening and 46 (34%) by both methods. Of the 11 total preventable AEs, 6 (55%) were detected by traditional trigger tool, 7 (64%) by EDW based screening and 2 (18%) by both methods. Of the 43 total serious AEs, 28 (65%) were detected by traditional trigger tool, 29 (67%) by EDW based screening and 14 (33%) by both. We found relatively poor agreement between traditional trigger tool and EDW based screening with only approximately a third of all AEs detected by both methods. A combination of complementary methods is the optimal approach to detecting AEs among hospitalised patients.
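The overlap figures reported in this abstract can be checked with inclusion-exclusion: the number of events found by at least one method equals the sum of the two per-method counts minus the count found by both. The short sketch below only re-derives the abstract's own numbers; the function names are illustrative.

```python
# Sanity check of the reported counts using inclusion-exclusion.

def detected_by_either(trigger_tool, edw, both):
    """Events found by at least one of the two screening methods."""
    return trigger_tool + edw - both


def share(part, total):
    """Percentage rounded to the nearest whole number, as in the abstract."""
    return round(100 * part / total)


all_aes = detected_by_either(86, 97, 46)      # all adverse events
preventable = detected_by_either(6, 7, 2)     # preventable adverse events
serious = detected_by_either(28, 29, 14)      # serious adverse events
```

With the abstract's figures this gives 137, 11 and 43 events respectively, matching the stated totals, and 46 of 137 is about 34%, the "approximately a third" detected by both methods.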
KAYA TEZEL, Fulya; TUTAREL KIŞLAK, Şennur; BOYSAN, Murat
2015-01-01
Introduction: Cognitive theories of psychopathology have generally proposed that early experiences of childhood abuse and neglect may result in the development of early maladaptive self-schemas. Maladaptive core schemas are central in the development and maintenance of psychological symptoms in a schema-focused approach. Psychosocial dysfunction in individuals with psychological problems has been consistently found to be associated with symptom severity. However, to date, linkages between psychosocial functioning, early traumatic experiences and core schemas have received little attention. The aim of the present study was to explore the relations among maladaptive interpersonal styles, negative experiences in childhood and core self-schemas in non-clinical adults. Methods: A total of 300 adults (58% women) participated in the study. The participants completed a socio-demographic questionnaire, Young Schema Questionnaire, Childhood Trauma Questionnaire and Interpersonal Style Scale. Results: Hierarchical regression analyses revealed that the Disconnection and Rejection and Impaired Limits schema domains were significant antecedents of maladaptive interpersonal styles after controlling for demographic characteristics and childhood abuse and neglect. Associations of child sexual abuse with Emotionally Avoidant, Manipulative and Abusive interpersonal styles were mediated by early maladaptive schemas. Early maladaptive schemas mediated the relations of emotional abuse with Emotionally Avoidant and Avoidant interpersonal styles as well as the relations of physical abuse with Avoidant and Abusive interpersonal styles. Conclusion: Interpersonal styles in adulthood are significantly associated with childhood traumatic experiences. Significant relations between early traumatic experiences and maladaptive interpersonal styles are mediated by early maladaptive schemas. PMID:28360715
Kaya Tezel, Fulya; Tutarel Kişlak, Şennur; Boysan, Murat
2015-09-01
Cognitive theories of psychopathology have generally proposed that early experiences of childhood abuse and neglect may result in the development of early maladaptive self-schemas. Maladaptive core schemas are central in the development and maintenance of psychological symptoms in a schema-focused approach. Psychosocial dysfunction in individuals with psychological problems has been consistently found to be associated with symptom severity. However, to date, linkages between psychosocial functioning, early traumatic experiences and core schemas have received little attention. The aim of the present study was to explore the relations among maladaptive interpersonal styles, negative experiences in childhood and core self-schemas in non-clinical adults. A total of 300 adults (58% women) participated in the study. The participants completed a socio-demographic questionnaire, Young Schema Questionnaire, Childhood Trauma Questionnaire and Interpersonal Style Scale. Hierarchical regression analyses revealed that the Disconnection and Rejection and Impaired Limits schema domains were significant antecedents of maladaptive interpersonal styles after controlling for demographic characteristics and childhood abuse and neglect. Associations of child sexual abuse with Emotionally Avoidant, Manipulative and Abusive interpersonal styles were mediated by early maladaptive schemas. Early maladaptive schemas mediated the relations of emotional abuse with Emotionally Avoidant and Avoidant interpersonal styles as well as the relations of physical abuse with Avoidant and Abusive interpersonal styles. Interpersonal styles in adulthood are significantly associated with childhood traumatic experiences. Significant relations between early traumatic experiences and maladaptive interpersonal styles are mediated by early maladaptive schemas.
Benchmarking infrastructure for mutation text mining
2014-01-01
Background: Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results: We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion: We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Benchmarking infrastructure for mutation text mining.
Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo
2014-02-25
Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
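The abstract does not list which performance metrics are computed. Assuming they include the usual precision, recall and F1 over matched annotations, the calculation can be sketched as below. The infrastructure described here computes its metrics with SPARQL over RDF annotations; this Python stand-in uses plain sets instead, and the `(document, mutation)` tuples are hypothetical examples rather than entries from the actual corpus.

```python
# Illustrative stand-in for metric computation; the real system uses SPARQL queries.

def precision_recall_f1(gold, predicted):
    """Compare predicted annotations with gold-standard annotations by exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (document id, normalized mutation mention).
GOLD = {("doc1", "A123T"), ("doc1", "G45D"), ("doc2", "R7K"), ("doc3", "L9P")}
PREDICTED = {("doc1", "A123T"), ("doc2", "R7K"), ("doc2", "V8M"), ("doc3", "Q2H")}
```

Exact matching on a normalized key is only one possible matching criterion; a benchmark may define looser or stricter matches.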
A Probabilistic Feature Map-Based Localization System Using a Monocular Camera.
Kim, Hyungjin; Lee, Donghwa; Oh, Taekjun; Choi, Hyun-Taek; Myung, Hyun
2015-08-31
Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image data sets are provided through the Internet, many studies on estimating a location with a pre-built image-based 3D map have been conducted. Most research groups use numerous image data sets that contain sufficient features. In contrast, this paper focuses on image-based localization in the case of insufficient images and features. A more accurate localization method is proposed based on a probabilistic map using 3D-to-2D matching correspondences between a map and a query image. The probabilistic feature map is generated in advance by probabilistic modeling of the sensor system as well as the uncertainties of camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm is optimized from the initial pose by minimizing Mahalanobis distance errors between features from the query image and the map to improve accuracy. To verify that the localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in a simulation and real environments.
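The cost minimized in the refinement step is described as a sum of Mahalanobis distance errors. For a single 2D image feature, the squared Mahalanobis distance between the observed position and the position predicted from the map is e^T S^-1 e, where e is the residual and S its 2x2 covariance. The sketch below shows only that distance and its sum over features, assuming 2D residuals with known covariances; it does not implement the paper's pose optimization, and all names are illustrative.

```python
# Minimal sketch: squared Mahalanobis distance for 2D feature residuals.

def mahalanobis_sq_2d(observed, predicted, cov):
    """Return e^T * inverse(cov) * e for the residual e = observed - predicted."""
    ex = observed[0] - predicted[0]
    ey = observed[1] - predicted[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # inverse(cov) = 1/det * [[d, -b], [-c, a]]
    return (ex * (d * ex - b * ey) + ey * (-c * ex + a * ey)) / det


def total_cost(matches):
    """Sum the squared distances over (observed, predicted, covariance) triples."""
    return sum(mahalanobis_sq_2d(o, p, s) for o, p, s in matches)
```

A feature with a large covariance contributes less to the total than an equally displaced feature with a small covariance, which is how a probabilistic map can down-weight uncertain features.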
A Probabilistic Feature Map-Based Localization System Using a Monocular Camera
Kim, Hyungjin; Lee, Donghwa; Oh, Taekjun; Choi, Hyun-Taek; Myung, Hyun
2015-01-01
Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image data sets are provided through the Internet, many studies on estimating a location with a pre-built image-based 3D map have been conducted. Most research groups use numerous image data sets that contain sufficient features. In contrast, this paper focuses on image-based localization in the case of insufficient images and features. A more accurate localization method is proposed based on a probabilistic map using 3D-to-2D matching correspondences between a map and a query image. The probabilistic feature map is generated in advance by probabilistic modeling of the sensor system as well as the uncertainties of camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm is optimized from the initial pose by minimizing Mahalanobis distance errors between features from the query image and the map to improve accuracy. To verify that the localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in a simulation and real environments. PMID:26404284
Conceptual Developments in Schema Theory.
ERIC Educational Resources Information Center
Bigenho, Frederick W., Jr.
The conceptual development of schema theory, the way an individual organizes knowledge, is discussed, reviewing a range of perspectives regarding schema. Schema has been defined as the interfacing of incoming information with prior knowledge, clustered in networks. These networks comprise a superordinate concept and supporting information. The…
Optimal image alignment with random projections of manifolds: algorithm and geometric analysis.
Kokiopoulou, Effrosyni; Kressner, Daniel; Frossard, Pascal
2011-06-01
This paper addresses the problem of image alignment based on random measurements. Image alignment consists of estimating the relative transformation between a query image and a reference image. We consider the specific problem where the query image is provided in compressed form in terms of linear measurements captured by a vision sensor. We cast the alignment problem as a manifold distance minimization problem in the linear subspace defined by the measurements. The transformation manifold that represents synthesis of shift, rotation, and isotropic scaling of the reference image can be given in closed form when the reference pattern is sparsely represented over a parametric dictionary. We show that the objective function can then be decomposed as the difference of two convex functions (DC) in the particular case where the dictionary is built on Gaussian functions. Thus, the optimization problem becomes a DC program, which in turn can be solved globally by a cutting plane method. The quality of the solution is typically affected by the number of random measurements and the condition number of the manifold that describes the transformations of the reference image. We show that the curvature, which is closely related to the condition number, remains bounded in our image alignment problem, which means that the relative transformation between two images can be determined optimally in a reduced subspace.
Big Geo Data Services: From More Bytes to More Barrels
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Baumann, Peter
2016-04-01
The data deluge is affecting the oil and gas industry just as much as many other industries. However, aside from the sheer volume there is the challenge of data variety, such as regular and irregular grids, multi-dimensional space/time grids, point clouds, and TINs and other meshes. A uniform conceptualization for modelling and serving them could save substantial effort, such as the proverbial "department of reformatting". The notion of a coverage actually can accomplish this. Its abstract model in ISO 19123 together with the concrete, interoperable OGC Coverage Implementation Schema (CIS), which is currently under adoption as ISO 19123-2, provides a common platform for representing any n-D grid type, point clouds, and general meshes. This is paired with the OGC Web Coverage Service (WCS) together with its datacube analytics language, the OGC Web Coverage Processing Service (WCPS). The OGC WCS Core Reference Implementation, rasdaman, relies on Array Database technology, i.e. a NewSQL/NoSQL approach. It supports the grid part of coverages, with installations of 100+ TB known and single queries parallelized across 1,000+ cloud nodes. Recent research attempts to address the point cloud and mesh part through a unified query model. The Holy Grail envisioned is that these approaches can be merged into a single service interface at some time. We present both grid and point cloud / mesh approaches and discuss status, implementation, standardization, and research perspectives, including a live demo.
Design and Implementation of the CEBAF Element Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theodore Larrieu, Christopher Slominski, Michele Joyce
2011-10-01
With inauguration of the CEBAF Element Database (CED) in Fall 2010, Jefferson Lab computer scientists have taken a first step toward the eventual goal of a model-driven accelerator. Once fully populated, the database will be the primary repository of information used for everything from generating lattice decks to booting front-end computers to building controls screens. A particular requirement influencing the CED design is that it must provide consistent access to not only present, but also future, and eventually past, configurations of the CEBAF accelerator. To accomplish this, an introspective database schema was designed that allows new elements, element types, and element properties to be defined on-the-fly without changing table structure. When used in conjunction with the Oracle Workspace Manager, it allows users to seamlessly query data from any time in the database history with the exact same tools as they use for querying the present configuration. Users can also check-out workspaces and use them as staging areas for upcoming machine configurations. All access to the CED is through a well-documented API that is translated automatically from original C++ into native libraries for script languages such as Perl, PHP, and Tcl, making access to the CED easy and ubiquitous. Notice: Authored by Jefferson Science Associates, LLC under U.S. DOE Contract No. DE-AC05-06OR23177. The U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce this manuscript for U.S. Government purposes.
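One common way to let element types and properties be defined without altering table structure is to store the definitions themselves as rows. The sketch below illustrates that general idea with Python's built-in `sqlite3` module. It is not the CED's actual schema or API; the table names, the "Quad" element type and its properties are hypothetical, and the versioning provided by Oracle Workspace Manager is not modelled.

```python
import sqlite3

# Hypothetical schema: types, properties, elements and values are all rows.
SCHEMA = """
CREATE TABLE element_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE property (id INTEGER PRIMARY KEY,
                       type_id INTEGER REFERENCES element_type(id), name TEXT);
CREATE TABLE element (id INTEGER PRIMARY KEY,
                      type_id INTEGER REFERENCES element_type(id), name TEXT UNIQUE);
CREATE TABLE property_value (element_id INTEGER REFERENCES element(id),
                             property_id INTEGER REFERENCES property(id), val TEXT);
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def define_property(conn, type_name, prop_name):
    """Register a property (and its type, if new) by inserting rows only."""
    conn.execute("INSERT OR IGNORE INTO element_type(name) VALUES (?)", (type_name,))
    type_id = conn.execute(
        "SELECT id FROM element_type WHERE name = ?", (type_name,)).fetchone()[0]
    conn.execute("INSERT INTO property(type_id, name) VALUES (?, ?)", (type_id, prop_name))


def add_element(conn, type_name, element_name, **props):
    type_id = conn.execute(
        "SELECT id FROM element_type WHERE name = ?", (type_name,)).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO element(type_id, name) VALUES (?, ?)", (type_id, element_name))
    element_id = cur.lastrowid
    for prop_name, val in props.items():
        row = conn.execute(
            "SELECT id FROM property WHERE type_id = ? AND name = ?",
            (type_id, prop_name)).fetchone()
        if row is None:
            raise KeyError(f"{prop_name!r} is not defined for type {type_name!r}")
        conn.execute(
            "INSERT INTO property_value(element_id, property_id, val) VALUES (?, ?, ?)",
            (element_id, row[0], str(val)))


def properties_of(conn, element_name):
    rows = conn.execute(
        "SELECT p.name, v.val FROM property_value v "
        "JOIN property p ON p.id = v.property_id "
        "JOIN element e ON e.id = v.element_id WHERE e.name = ?", (element_name,))
    return dict(rows.fetchall())
```

Adding a new property later is another call to `define_property`, with no `ALTER TABLE`. The trade-off of this layout is that values are stored as text and type checking moves into application code.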
Common Data Model for Neuroscience Data and Data Model Exchange
Gardner, Daniel; Knuth, Kevin H.; Abato, Michael; Erde, Steven M.; White, Thomas; DeBellis, Robert; Gardner, Esther P.
2001-01-01
Objective: Generalizing the data models underlying two prototype neurophysiology databases, the authors describe and propose the Common Data Model (CDM) as a framework for federating a broad spectrum of disparate neuroscience information resources. Design: Each component of the CDM derives from one of five superclasses—data, site, method, model, and reference—or from relations defined between them. A hierarchic attribute-value scheme for metadata enables interoperability with variable tree depth to serve specific intra- or broad inter-domain queries. To mediate data exchange between disparate systems, the authors propose a set of XML-derived schema for describing not only data sets but data models. These include biophysical description markup language (BDML), which mediates interoperability between data resources by providing a meta-description for the CDM. Results: The set of superclasses potentially spans data needs of contemporary neuroscience. Data elements abstracted from neurophysiology time series and histogram data represent data sets that differ in dimension and concordance. Site elements transcend neurons to describe subcellular compartments, circuits, regions, or slices; non-neuroanatomic sites include sequences to patients. Methods and models are highly domain-dependent. Conclusions: True federation of data resources requires explicit public description, in a metalanguage, of the contents, query methods, data formats, and data models of each data resource. Any data model that can be derived from the defined superclasses is potentially conformant and interoperability can be enabled by recognition of BDML-described compatibilities. Such metadescriptions can buffer technologic changes. PMID:11141510
Comprehensive Optimal Manpower and Personnel Analytic Simulation System (COMPASS)
2009-10-01
4 The EDB consists of 4 major components (some of which are re-usable): 1. Metadata Editor (MDE): Also considered a leaf node, the metadata...end-user queries via the QB. The EDB supports multiple instances of the MDE, although currently, only a single instance is recommended. 2 Query...the MSB is a central collection of web services, responsible for the authentication and authorization of users, maintenance of the EDB metadata
LSD: Large Survey Database framework
NASA Astrophysics Data System (ADS)
Juric, Mario
2012-09-01
The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than 10^2 nodes, and can be made to function in "shared nothing" architectures.
NASA Astrophysics Data System (ADS)
Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.
2015-07-01
Existing spatiotemporal databases support spatiotemporal aggregation queries over massive moving-object datasets. Because of the large amounts of data and the single-thread processing method, the query speed cannot meet application requirements. In addition, query efficiency is more sensitive to spatial variation than to temporal variation. In this paper, we propose a spatiotemporal aggregation query method using a multi-thread parallel technique based on regional division and implement it on the server. Concretely, we divide the spatiotemporal domain into several spatiotemporal cubes, compute the spatiotemporal aggregation on all cubes using multi-thread parallel processing, and then integrate the query results. Tests and analysis on real datasets show that this method improves the query speed significantly.
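The divide, aggregate and integrate steps described above can be sketched as follows. This is a minimal illustration, assuming points given as `(x, y, t)` tuples, a count aggregate, and a fixed cube size; the function names and parameters are hypothetical and the paper's server-side implementation is not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: partition points into spatiotemporal cubes, aggregate per cube in
# parallel threads, then merge the partial results.

def cube_key(point, cell, dt):
    x, y, t = point
    return (int(x // cell), int(y // cell), int(t // dt))


def partition(points, cell, dt):
    cubes = {}
    for p in points:
        cubes.setdefault(cube_key(p, cell, dt), []).append(p)
    return cubes


def count_in_query(points, query):
    """Count points inside the half-open box ((x0, x1), (y0, y1), (t0, t1))."""
    (x0, x1), (y0, y1), (t0, t1) = query
    return sum(1 for x, y, t in points
               if x0 <= x < x1 and y0 <= y < y1 and t0 <= t < t1)


def parallel_count(points, query, cell=10.0, dt=60.0, workers=4):
    cubes = partition(points, cell, dt)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda pts: count_in_query(pts, query), cubes.values())
        return sum(partials)  # integrate the per-cube results
```

In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so this shows the structure of the method rather than its performance; a process pool or a native implementation would be needed for real gains.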
Cloud Computing and Its Applications in GIS
NASA Astrophysics Data System (ADS)
Kang, Cao
2011-12-01
Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibilities of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud-based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. 
Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes them incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one pixel deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide and layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories. 
As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of Quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)
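The data compression characteristic of quadtrees mentioned in this abstract can be illustrated with a minimal sketch. It assumes a square raster whose side is a power of two; the function names and the nested-tuple node representation are hypothetical and are not taken from the dissertation's storage schema.

```python
from typing import List, Tuple, Union

# A node is either a single cell value (a uniform block) or a 4-tuple of
# child nodes ordered (NW, NE, SW, SE). This representation is an
# illustrative assumption, not the dissertation's actual schema.
Node = Union[int, Tuple["Node", "Node", "Node", "Node"]]


def build_quadtree(raster: List[List[int]], row: int = 0, col: int = 0,
                   size: int = 0) -> Node:
    """Recursively compress a square raster whose side is a power of two."""
    if size == 0:
        size = len(raster)
    first = raster[row][col]
    # A block whose cells all share one value collapses into a single leaf.
    uniform = all(
        raster[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first
    half = size // 2
    return (
        build_quadtree(raster, row, col, half),                # NW
        build_quadtree(raster, row, col + half, half),         # NE
        build_quadtree(raster, row + half, col, half),         # SW
        build_quadtree(raster, row + half, col + half, half),  # SE
    )


def count_leaves(node: Node) -> int:
    """Number of stored values after compression."""
    if isinstance(node, tuple):
        return sum(count_leaves(child) for child in node)
    return 1
```

For a 4 x 4 raster in which three quadrants are uniform, the tree stores 7 values instead of 16; the saving grows with the size of the homogeneous regions, which is why the approach suits categorical raster layers better than noisy continuous ones.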
Ritter, Alison; Hull, Philip; Berends, Lynda; Chalmers, Jenny; Lancaster, Kari
2016-09-01
The aim of this study was to establish a conceptual schema for government purchasing of alcohol and other drug treatment in Australia which could encompass the diversity and variety in purchasing arrangements, and facilitate better decision-making by purchasers. There is a limited evidence base on purchasing arrangements in alcohol and drug treatment despite the clear impact of purchasing arrangements on both treatment processes and treatment outcomes. The relevant health and social welfare literature on purchasing arrangements was reviewed; data were collected from Australian purchasers and providers of treatment giving detailed descriptions of the array of purchasing arrangements. Combined analysis of the literature and the Australian purchasing data resulted in a draft schema which was then reviewed by an expert committee and subsequently finalised. The conceptual schema presented here was purpose-built for alcohol and other drug treatment, with its overlap between health and social welfare services. It has three dimensions: 1. The ways in which providers are chosen; 2. The ways in which services are paid for; and 3. How price is managed. Distinguishing the methods for choosing providers (such as competitive or individually negotiated processes) from the way in which organisations are paid for their provision of treatment (such as via a block grant or payment for activity) provides conceptual clarity and enables closer analysis of each mechanism. Governments can improve health and wellbeing by making informed decisions about the way they purchase and fund alcohol and other drug treatment. Research comparing different purchasing arrangements can provide a vital evidence base to inform funders; however, a first step is to accurately and consistently categorise current approaches against a typology or conceptual schema. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries
NASA Astrophysics Data System (ADS)
Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.
2017-10-01
Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get a response in real time. Spatial indexes are usually used in this situation. In this paper, based on the k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in-memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of remote sensing image metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region queries, but also has good scalability. Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.
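The abstract does not spell out the two steps of its query algorithm. A common reading, assumed here, is filter-and-refine: a cheap bounding-box test selects candidates, then an exact geometric test confirms them. The sketch below is a single-machine illustration with hypothetical function names; it does not reproduce the DKD-Tree, and degenerate touching cases are not handled rigorously.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]                    # vertices in order, not closed
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bounding_box(poly: Polygon) -> Box:
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray-casting test: count edge crossings to the right of p."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True when the two segments properly cross each other."""
    def orient(a: Point, b: Point, c: Point) -> float:
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (orient(q1, q2, p1) * orient(q1, q2, p2) < 0
            and orient(p1, p2, q1) * orient(p1, p2, q2) < 0)


def polygon_intersects_box(poly: Polygon, box: Box) -> bool:
    """Exact (refinement) test between a simple polygon and a rectangle."""
    min_x, min_y, max_x, max_y = box
    corners = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]
    if any(min_x <= x <= max_x and min_y <= y <= max_y for x, y in poly):
        return True
    if any(point_in_polygon(c, poly) for c in corners):
        return True
    box_edges = list(zip(corners, corners[1:] + corners[:1]))
    poly_edges = zip(poly, poly[1:] + poly[:1])
    return any(segments_cross(a, b, c, d)
               for a, b in poly_edges for c, d in box_edges)


def region_query(polygons: List[Polygon], box: Box) -> List[int]:
    # Step 1 (filter): keep polygons whose bounding box overlaps the region.
    candidates = [i for i, p in enumerate(polygons)
                  if boxes_overlap(bounding_box(p), box)]
    # Step 2 (refine): run the exact test only on the surviving candidates.
    return [i for i in candidates if polygon_intersects_box(polygons[i], box)]
```

The filter step is where an index such as a k-d tree pays off, since it replaces the linear scan over bounding boxes; the refinement step is the expensive part and is the natural unit to distribute across workers.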
ECG Rhythm Analysis with Expert and Learner-Generated Schemas in Novice Learners
ERIC Educational Resources Information Center
Blissett, Sarah; Cavalcanti, Rodrigo; Sibbald, Matthew
2015-01-01
Although instruction using expert-generated schemas is associated with higher diagnostic performance, implementation is resource intensive. Learner-generated schemas are an alternative, but may be limited by increases in cognitive load. We compared expert- and learner-generated schemas for learning ECG rhythm interpretation on diagnostic accuracy,…
Thai University Student Schemas and Anxiety Symptomatology
ERIC Educational Resources Information Center
Rhein, Douglas; Sukawatana, Parisa
2015-01-01
This study explores how early maladaptive schemas (EMSs) contribute to the development of anxiety symptomologies among college undergraduates (N = 110). The study was conducted by assessing the correlations between 18 schemas derived from Young's model of Early Maladaptive Schemas (EMSs) and anxiety symptoms using Zung Self-Rating Anxiety Scale…
Unifying Access to National Hydrologic Data Repositories via Web Services
NASA Astrophysics Data System (ADS)
Valentine, D. W.; Jennings, B.; Zaslavsky, I.; Maidment, D. R.
2006-12-01
The CUAHSI hydrologic information system (HIS) is designed to be a live, multiscale web portal system for accessing, querying, visualizing, and publishing distributed hydrologic observation data and models for any location or region in the United States. The HIS design follows the principles of open service oriented architecture, i.e. system components are represented as web services with well defined standard service APIs. WaterOneFlow web services are the main component of the design. The currently available services have been completely re-written compared to the previous version, and provide programmatic access to USGS NWIS (stream flow, groundwater and water quality repositories), DAYMET daily observations, NASA MODIS, and Unidata NAM streams, with several additional web service wrappers being added (EPA STORET, NCDC and others). Different repositories of hydrologic data use different vocabularies, and support different types of query access. Resolving semantic and structural heterogeneities across different hydrologic observation archives and distilling a generic set of service signatures is one of the main scalability challenges in this project, and a requirement in our web service design. To accomplish the uniformity of the web services API, data repositories are modeled following the CUAHSI Observation Data Model. The web service responses are document-based, and use an XML schema to express the semantics in a standard format. Access to station metadata is provided via the web service methods GetSites, GetSiteInfo and GetVariableInfo. The methods form the foundation of the CUAHSI HIS discovery interface and may execute over locally-stored metadata or request the information from remote repositories directly. Observation values are retrieved via a generic GetValues method which is executed against national data repositories. The service is implemented in ASP.Net, and other providers are implementing WaterOneFlow services in Java. 
Reference implementation of WaterOneFlow web services is available. More information about the ongoing development of CUAHSI HIS is available from http://www.cuahsi.org/his/.
A Hybrid Spatio-Temporal Data Indexing Method for Trajectory Databases
Ke, Shengnan; Gong, Jun; Li, Songnian; Zhu, Qing; Liu, Xintao; Zhang, Yeting
2014-01-01
In recent years, there has been tremendous growth in the field of indoor and outdoor positioning sensors continuously producing huge volumes of trajectory data that has been used in many fields such as location-based services or location intelligence. Trajectory data is growing massively and is semantically complicated, which poses a great challenge for spatio-temporal data indexing. This paper proposes a spatio-temporal data indexing method, named HBSTR-tree, which is a hybrid index structure comprising a spatio-temporal R-tree, a B*-tree and a hash table. To improve the index generation efficiency, rather than directly inserting trajectory points, we group consecutive trajectory points as nodes according to their spatio-temporal semantics and then insert them into the spatio-temporal R-tree as leaf nodes. The hash table is used to manage the latest leaf nodes to reduce the frequency of insertion. A new spatio-temporal interval criterion and a new node-choosing sub-algorithm are also proposed to optimize spatio-temporal R-tree structures. In addition, a B*-tree sub-index of leaf nodes is built to query the trajectories of targeted objects efficiently. Furthermore, a database storage scheme based on a NoSQL-type DBMS is also proposed for the purpose of cloud storage. Experimental results show that HBSTR-tree outperforms TB*-tree in some aspects such as generation efficiency, query performance and query type. PMID:25051028
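The idea of grouping consecutive trajectory points into leaf nodes before insertion can be sketched as follows. The abstract does not define its grouping criterion, so the two thresholds used here (maximum spatial extent and maximum duration) and all names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

TrajPoint = Tuple[float, float, float]  # (x, y, t)


@dataclass
class LeafNode:
    """A run of consecutive points that would be inserted as one leaf."""
    points: List[TrajPoint]

    @property
    def mbr(self) -> Tuple[float, float, float, float, float, float]:
        # Spatio-temporal bounding box: (min_x, min_y, min_t, max_x, max_y, max_t)
        xs, ys, ts = zip(*self.points)
        return (min(xs), min(ys), min(ts), max(xs), max(ys), max(ts))


def group_points(points: List[TrajPoint], max_extent: float,
                 max_duration: float) -> List[LeafNode]:
    """Split a trajectory into nodes whose bounding box stays small.

    A new node starts when adding the next point would make the node wider
    than max_extent on either axis or longer than max_duration in time.
    Both thresholds are hypothetical stand-ins for the paper's criterion.
    """
    nodes: List[LeafNode] = []
    current: List[TrajPoint] = []
    for p in points:
        trial = current + [p]
        xs, ys, ts = zip(*trial)
        too_wide = (max(xs) - min(xs) > max_extent
                    or max(ys) - min(ys) > max_extent)
        too_long = max(ts) - min(ts) > max_duration
        if current and (too_wide or too_long):
            nodes.append(LeafNode(current))
            current = [p]
        else:
            current = trial
    if current:
        nodes.append(LeafNode(current))
    return nodes
```

Inserting one node per group instead of one entry per point reduces the number of R-tree insertions roughly by the average group size, which is the efficiency argument the abstract makes.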
A hybrid spatio-temporal data indexing method for trajectory databases.
Ke, Shengnan; Gong, Jun; Li, Songnian; Zhu, Qing; Liu, Xintao; Zhang, Yeting
2014-07-21
In recent years, there has been tremendous growth in the field of indoor and outdoor positioning sensors continuously producing huge volumes of trajectory data that has been used in many fields such as location-based services or location intelligence. Trajectory data is growing massively and is semantically complicated, which poses a great challenge for spatio-temporal data indexing. This paper proposes a spatio-temporal data indexing method, named HBSTR-tree, which is a hybrid index structure comprising a spatio-temporal R-tree, a B*-tree and a hash table. To improve the index generation efficiency, rather than directly inserting trajectory points, we group consecutive trajectory points as nodes according to their spatio-temporal semantics and then insert them into the spatio-temporal R-tree as leaf nodes. The hash table is used to manage the latest leaf nodes to reduce the frequency of insertion. A new spatio-temporal interval criterion and a new node-choosing sub-algorithm are also proposed to optimize spatio-temporal R-tree structures. In addition, a B*-tree sub-index of leaf nodes is built to query the trajectories of targeted objects efficiently. Furthermore, a database storage scheme based on a NoSQL-type DBMS is also proposed for the purpose of cloud storage. Experimental results show that HBSTR-tree outperforms TB*-tree in some aspects such as generation efficiency, query performance and query type.
LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.
Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias
2018-03-01
In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries from a query profile, such as research field, geographic location, co-authorship, or affiliation, require user registration and public accessibility of that profile, which conflicts with privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records that are stored in the databases that are going to be queried or, for a general purpose query suggestion, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluation of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. The LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. 
In comparison, the information content similarity for query suggestions made by human experts is 0.90. The software is available either as a tool set to build and train dedicated query suggestion services or as an already trained general purpose RESTful web service. The service uses open interfaces so that it can be seamlessly embedded into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
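How distributed word vectors can drive query suggestion may be easier to see in a toy example. The sketch below ranks vocabulary terms by cosine similarity to the query keyword. It is not the LAILAPS-QSM API: the function names are hypothetical, and the three-dimensional vectors are invented purely for illustration, whereas trained vectors would have hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest(keyword: str, vectors: Dict[str, Sequence[float]],
            top_n: int = 3) -> List[str]:
    """Return the top_n terms whose vectors are closest to the keyword's."""
    query_vec = vectors[keyword]
    scored = [(cosine(query_vec, vec), term)
              for term, vec in vectors.items() if term != keyword]
    scored.sort(reverse=True)
    return [term for _, term in scored[:top_n]]


# Invented toy vectors; real ones would be learned from text records.
TOY_VECTORS: Dict[str, Sequence[float]] = {
    "drought": [0.9, 0.1, 0.0],
    "water stress": [0.8, 0.2, 0.1],
    "salinity": [0.6, 0.4, 0.0],
    "kinase": [0.0, 0.1, 0.9],
}
```

With these toy vectors, `suggest("drought", TOY_VECTORS, top_n=2)` ranks "water stress" ahead of "salinity" and leaves out "kinase", which points in an unrelated direction.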
The semantics of Chemical Markup Language (CML) for computational chemistry : CompChem.
Phadungsukanan, Weerapong; Kraft, Markus; Townsend, Joe A; Murray-Rust, Peter
2012-08-07
This paper introduces a subdomain chemistry format for storing computational chemistry data called CompChem. It has been developed based on the design, concepts and methodologies of Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of ab initio quantum chemistry calculations of individual molecules to be stored. These calculations include, for example, single point energy calculation, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications.
The semantics of Chemical Markup Language (CML) for computational chemistry : CompChem
2012-01-01
This paper introduces a subdomain chemistry format for storing computational chemistry data called CompChem. It has been developed based on the design, concepts and methodologies of Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of ab initio quantum chemistry calculations of individual molecules to be stored. These calculations include, for example, single point energy calculation, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications. PMID:22870956
Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment
Anwar, Nadia; Hunt, Ela
2009-01-01
Background: TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. Results: We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db shows very significant information retrieval quality improvement. The wrapper can be accessed at the URL The methodology we developed is scalable and can be applied to new data, as those become available in the future. Conclusion: Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection. PMID:19426482
An Information Retrieval and Recommendation System for Astronomical Observatories
NASA Astrophysics Data System (ADS)
Mukund, Nikhil; Thakur, Saurabh; Abraham, Sheelu; Aniyan, A. K.; Mitra, Sanjit; Sajeeth Philip, Ninan; Vaghmare, Kaustubh; Acharjya, D. P.
2018-03-01
We present a machine-learning-based information retrieval system for astronomical observatories that tries to address user-defined queries related to an instrument. In the modern instrumentation scenario where heterogeneous systems and talents are simultaneously at work, the ability to supply people with the right information helps speed up the tasks for detector operation, maintenance, and upgrades. The proposed method analyzes existing documented efforts at the site to intelligently group information related to a query and to present it online to the user. The user in response can probe the suggested content and explore previously developed solutions or probable ways to address the present situation optimally. We demonstrate natural language-processing-backed knowledge rediscovery by making use of the open source logbook data from the Laser Interferometer Gravitational-Wave Observatory (LIGO). We implement and test a web application that incorporates the above idea for LIGO Livingston, LIGO Hanford, and Virgo observatories.
Del Fiol, Guilherme; Michelson, Matthew; Iorio, Alfonso; Cotoi, Chris; Haynes, R Brian
2018-06-25
A major barrier to the practice of evidence-based medicine is efficiently finding scientifically sound studies on a given clinical topic. The objective was to investigate a deep learning approach to retrieving scientifically sound treatment studies from the biomedical literature. We trained a Convolutional Neural Network using a noisy dataset of 403,216 PubMed citations with title and abstract as features. The deep learning model was compared with state-of-the-art search filters, such as PubMed's Clinical Query Broad treatment filter, McMaster's textword search strategy (no Medical Subject Heading, MeSH, terms), and Clinical Query Balanced treatment filter. A previously annotated dataset (Clinical Hedges) was used as the gold standard. The deep learning model obtained significantly lower recall than the Clinical Queries Broad treatment filter (96.9% vs 98.4%; P<.001); and equivalent recall to McMaster's textword search (96.9% vs 97.1%; P=.57) and Clinical Queries Balanced filter (96.9% vs 97.0%; P=.63). Deep learning obtained significantly higher precision than the Clinical Queries Broad filter (34.6% vs 22.4%; P<.001) and McMaster's textword search (34.6% vs 11.8%; P<.001), but was significantly lower than the Clinical Queries Balanced filter (34.6% vs 40.9%; P<.001). Deep learning performed well compared to state-of-the-art search filters, especially when citations were not indexed. Unlike previous machine learning approaches, the proposed deep learning model does not require feature engineering, or time-sensitive or proprietary features, such as MeSH terms and bibliometrics. Deep learning is a promising approach to identifying reports of scientifically rigorous clinical research. Further work is needed to optimize the deep learning model and to assess generalizability to other areas, such as diagnosis, etiology, and prognosis. ©Guilherme Del Fiol, Matthew Michelson, Alfonso Iorio, Chris Cotoi, R Brian Haynes. 
Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 25.06.2018.
SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.
Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan
2014-08-15
Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.
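Converting a visual query graph into a textual query, as the abstract describes, amounts to serialising the graph's edges as triple patterns. The sketch below shows one way this could look; it is not SPARQLGraph's implementation, the function name is hypothetical, and the `ex:` prefix with its `http://example.org/` namespace is a placeholder rather than a real biological vocabulary.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object) as SPARQL terms


def graph_to_sparql(edges: List[Edge], select_vars: List[str],
                    prefixes: Optional[Dict[str, str]] = None,
                    limit: Optional[int] = None) -> str:
    """Serialise a query graph as a SPARQL SELECT query string."""
    lines = [f"PREFIX {name}: <{iri}>"
             for name, iri in (prefixes or {}).items()]
    lines.append("SELECT " + " ".join(select_vars))
    lines.append("WHERE {")
    # Each edge of the visual graph becomes one triple pattern.
    lines.extend(f"  {s} {p} {o} ." for s, p, o in edges)
    lines.append("}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Hypothetical two-edge graph: a protein node typed and linked to a gene node.
EXAMPLE_EDGES: List[Edge] = [
    ("?protein", "a", "ex:Protein"),
    ("?protein", "ex:encodedBy", "?gene"),
]
```

Nodes dragged onto the canvas map to variables or constants, and each connecting arrow maps to a predicate, so the user never has to write the `WHERE` clause by hand.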
ProBiS-CHARMMing: Web Interface for Prediction and Optimization of Ligands in Protein Binding Sites.
Konc, Janez; Miller, Benjamin T; Štular, Tanja; Lešnik, Samo; Woodcock, H Lee; Brooks, Bernard R; Janežič, Dušanka
2015-11-23
Proteins often exist only as apo structures (unligated) in the Protein Data Bank, with their corresponding holo structures (with ligands) unavailable. However, apoproteins may not represent the amino-acid residue arrangement upon ligand binding well, which is especially problematic for molecular docking. We developed the ProBiS-CHARMMing web interface by connecting the ProBiS ( http://probis.cmm.ki.si ) and CHARMMing ( http://www.charmming.org ) web servers into one functional unit that enables prediction of protein-ligand complexes and allows for their geometry optimization and interaction energy calculation. The ProBiS web server predicts ligands (small compounds, proteins, nucleic acids, and single-atom ligands) that may bind to a query protein. This is achieved by comparing its surface structure against a nonredundant database of protein structures and finding those that have binding sites similar to that of the query protein. Existing ligands found in the similar binding sites are then transposed to the query according to predictions from ProBiS. The CHARMMing web server enables, among other things, minimization and potential energy calculation for a wide variety of biomolecular systems, and it is used here to optimize the geometry of the predicted protein-ligand complex structures using the CHARMM force field and to calculate their interaction energies with the corresponding query proteins. We show how ProBiS-CHARMMing can be used to predict ligands and their poses for a particular binding site, and minimize the predicted protein-ligand complexes to obtain representations of holoproteins. The ProBiS-CHARMMing web interface is freely available for academic users at http://probis.nih.gov.
ERIC Educational Resources Information Center
Mahalik, James R.; Morrison, Jay A.
2006-01-01
Cognitive therapists may be able to help fathers increase their involvement with their children by identifying and changing restrictive masculine schemas that interfere with men's parenting roles. In this paper, we (a) discuss the development of restrictive masculine schemas, (b) explain how these schemas may affect men's involvement in fathering…
Plant, Katherine L; Stanton, Neville A
2013-01-01
Schema Theory is intuitively appealing although it has not always received positive press; critics of the approach argue that the concept is too ambiguous and vague and there are inherent difficulties associated with measuring schemata. As such, the term schema can be met with scepticism and wariness. The purpose of this paper is to address the criticisms that have been levelled at Schema Theory by demonstrating how Schema Theory has been utilised in Ergonomics research, particularly in the key areas of situation awareness, naturalistic decision making and error. The future of Schema Theory is also discussed in light of its potential roles as a unifying theory in Ergonomics and in contributing to our understanding of distributed cognition. We conclude that Schema Theory has made a positive contribution to Ergonomics and with continued refinement of methods to infer and represent schemata it is likely that this trend will continue. This paper reviews the contribution that Schema Theory has made to Ergonomics research. The criticisms of the theory are addressed using examples from the areas of situation awareness, decision making and error.
Chapman, Wendy W.; Dowling, John N.
2006-01-01
Evaluating automated indexing applications requires comparing automatically indexed terms against manual reference standard annotations. However, there are no standard guidelines for determining which words from a textual document to include in manual annotations, and the vague task can result in substantial variation among manual indexers. We applied grounded theory to emergency department reports to create an annotation schema representing syntactic and semantic variables that could be annotated when indexing clinical conditions. We describe the annotation schema, which includes variables representing medical concepts (e.g., symptom, demographics), linguistic form (e.g., noun, adjective), and modifier types (e.g., anatomic location, severity). We measured the schema’s quality and found: (1) the schema was comprehensive enough to be applied to 20 unseen reports without changes to the schema; (2) agreement between author annotators applying the schema was high, with an F measure of 93%; and (3) an error analysis showed that the authors made complementary errors when applying the schema, demonstrating that the schema incorporates both linguistic and medical expertise. PMID:16230050
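Agreement between two annotators expressed as an F measure can be computed by treating one annotator's set as the reference. The sketch below assumes annotations are compared by exact match on a (text, variable type) pair; that matching rule, the names, and the example annotations are illustrative assumptions, not the study's protocol or data.

```python
from typing import Set, Tuple

Annotation = Tuple[str, str]  # (annotated text, variable type)


def agreement(reference: Set[Annotation],
              other: Set[Annotation]) -> Tuple[float, float, float]:
    """Return (precision, recall, F measure) of `other` against `reference`."""
    matched = len(reference & other)
    precision = matched / len(other) if other else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented example annotations from two hypothetical annotators.
ANNOTATOR_A: Set[Annotation] = {
    ("chest pain", "symptom"), ("fever", "symptom"),
    ("male", "demographics"), ("cough", "symptom"),
}
ANNOTATOR_B: Set[Annotation] = {
    ("chest pain", "symptom"), ("fever", "symptom"),
    ("male", "demographics"), ("severe", "severity"),
}
```

Because the F measure is symmetric in precision and recall, swapping which annotator serves as the reference leaves the score unchanged, which makes it a convenient agreement statistic when neither annotator is a gold standard.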
Hablo Inglés y Español: Cultural Self-Schemas as a Function of Language.
Rodríguez-Arauz, Gloriana; Ramírez-Esparza, Nairán; Pérez-Brena, Norma; Boyd, Ryan L
2017-01-01
Research has demonstrated that bilingual individuals experience a "double personality," which allows them to shift their self-schemas when they are primed with different language modes. In this study, we examine whether self-schemas change in Mexican-American (N = 193) bilinguals living in the U.S. when they provide open-ended personality self-descriptions in both English and Spanish. We used the Meaning Extraction Helper (MEH) software to extract the most salient self-schemas that influence individuals' self-defining process. Following a qualitative-inductive approach, words were extracted from the open-ended essays and organized into semantic clusters, which were analyzed qualitatively and named. The results show that as expected, language primed bilinguals to think about different self-schemas. In Spanish, their Mexican self-schemas were more salient; whereas, in English their U.S. American self-schemas were more salient. Similarities of self-schemas across languages were assessed using a quantitative approach. Language differences and similarities in theme definition and implications for self-identity of bilinguals are discussed.
Hablo Inglés y Español: Cultural Self-Schemas as a Function of Language
Rodríguez-Arauz, Gloriana; Ramírez-Esparza, Nairán; Pérez-Brena, Norma; Boyd, Ryan L.
2017-01-01
Research has demonstrated that bilingual individuals experience a “double personality,” which allows them to shift their self-schemas when they are primed with different language modes. In this study, we examine whether self-schemas change in Mexican-American (N = 193) bilinguals living in the U.S. when they provide open-ended personality self-descriptions in both English and Spanish. We used the Meaning Extraction Helper (MEH) software to extract the most salient self-schemas that influence individuals' self-defining process. Following a qualitative-inductive approach, words were extracted from the open-ended essays and organized into semantic clusters, which were analyzed qualitatively and named. The results show that as expected, language primed bilinguals to think about different self-schemas. In Spanish, their Mexican self-schemas were more salient; whereas, in English their U.S. American self-schemas were more salient. Similarities of self-schemas across languages were assessed using a quantitative approach. Language differences and similarities in theme definition and implications for self-identity of bilinguals are discussed. PMID:28611719
Bleda, Marta; Tarraga, Joaquin; de Maria, Alejandro; Salavert, Francisco; Garcia-Alonso, Luz; Celma, Matilde; Martin, Ainoha; Dopazo, Joaquin; Medina, Ignacio
2012-07-01
During the past years, the advances in high-throughput technologies have produced an unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today, there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from being optimal. Some of the most common problems are that the information is spread out in many small databases; frequently there are different standards among repositories and some databases are no longer supported or they contain too specific and unconnected information. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments or to access and query this information in a programmatic way. CellBase provides a solution to the growing necessity of integration by easing the access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted on our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.
Earth science big data at users' fingertips: the EarthServer Science Gateway Mobile
NASA Astrophysics Data System (ADS)
Barbera, Roberto; Bruno, Riccardo; Calanducci, Antonio; Fargetta, Marco; Pappalardo, Marco; Rundo, Francesco
2014-05-01
The EarthServer project (www.earthserver.eu), funded by the European Commission under its Seventh Framework Program, aims at establishing open access and ad-hoc analytics on extreme-size Earth Science data, based on and extending leading-edge Array Database technology. The core idea is to use database query languages as client/server interface to achieve barrier-free "mix & match" access to multi-source, any-size, multi-dimensional space-time data -- in short: "Big Earth Data Analytics" -- based on the open standards of the Open Geospatial Consortium Web Coverage Processing Service (OGC WCPS) and the W3C XQuery. EarthServer combines both, thereby achieving a tight data/metadata integration. Further, the rasdaman Array Database System (www.rasdaman.com) is extended with further space-time coverage data types. On the server side, highly effective optimizations, such as parallel and distributed query processing, ensure scalability to Exabyte volumes. In this contribution we will report on the EarthServer Science Gateway Mobile, an app for both iOS and Android-based devices that allows users to seamlessly access some of the EarthServer applications using SAML-based federated authentication and fine-grained authorisation mechanisms.
Baby Schema in Infant Faces Induces Cuteness Perception and Motivation for Caretaking in Adults.
Glocker, Melanie L; Langleben, Daniel D; Ruparel, Kosha; Loughead, James W; Gur, Ruben C; Sachser, Norbert
2009-03-01
Ethologist Konrad Lorenz proposed that baby schema ('Kindchenschema') is a set of infantile physical features such as the large head, round face and big eyes that is perceived as cute and motivates caretaking behavior in other individuals, with the evolutionary function of enhancing offspring survival. Previous work on this fundamental concept was restricted to schematic baby representations or correlative approaches. Here, we experimentally tested the effects of baby schema on the perception of cuteness and the motivation for caretaking using photographs of infant faces. Employing quantitative techniques, we parametrically manipulated the baby schema content to produce infant faces with high (e.g. round face and high forehead) and low (e.g. narrow face and low forehead) baby schema features that retained all the characteristics of a photographic portrait. Undergraduate students (n = 122) rated these infants' cuteness and their motivation to take care of them. The high baby schema infants were rated as cuter and elicited stronger motivation for caretaking than the unmanipulated and the low baby schema infants. This is the first experimental proof of the baby schema effects in actual infant faces. Our findings indicate that the baby schema response is a critical function of human social cognition that may be the basis of caregiving and have implications for infant-caretaker interactions.
Soygüt, Gonca; Cakir, Zehra
2009-01-01
The first aim of this study was to examine the relationships between perceived parenting styles and interpersonal schemas. The second aim was to investigate the mediating role of interpersonal schemas between perceived parenting styles and psychological symptoms. University students (N=94), aged 17-26 and attending different faculties and classes, completed the Interpersonal Schema Questionnaire, the Young Parenting Inventory, and the Symptom Check List-90. A series of regression analyses revealed that perceived parenting styles have predictive power for a number of interpersonal schemas. Further analyses pointed to a mediating role of the Hostility situation of interpersonal schemas between psychological symptoms and normative, belittling/criticizing, and pessimistic/worried parenting styles on the mother forms (Sobel z = 1.94-2.08, p < .01), and normative, belittling/criticizing, emotionally depriving, pessimistic/worried, punitive, and restricted/emotionally inhibited parenting styles (Sobel z = 2.20-2.86, p < .05-.01) on the father forms of the scales. In sum, regression analyses showed the predictive power of perceived parenting styles for interpersonal schemas, and a mediating role of interpersonal schemas between perceived parenting styles and psychological symptoms was also observed. Excluding pessimistic/anxious parenting styles, perceived parenting styles of mothers and fathers differed in their relation to psychological symptoms. Overall, we believe that, although schemas and parenting styles have some universal features in their impact on psychological health, further research is necessary to address their implications and possible paternal differences in our collectivistic cultural context.
Chen, Po-Hao; Loehfelm, Thomas W; Kamer, Aaron P; Lemmon, Andrew B; Cook, Tessa S; Kohli, Marc D
2016-12-01
The residency review committee of the Accreditation Council of Graduate Medical Education (ACGME) collects data on resident exam volume and sets minimum requirements. However, this data is not made readily available, and the ACGME does not share their tools or methodology. It is therefore difficult to assess the integrity of the data and determine if it truly reflects relevant aspects of the resident experience. This manuscript describes our experience creating a multi-institutional case log, incorporating data from three American diagnostic radiology residency programs. Each of the three sites independently established automated query pipelines from the various radiology information systems in their respective hospital groups, thereby creating a resident-specific database. Then, the three institutional resident case log databases were aggregated into a single centralized database schema. Three hundred thirty residents and 2,905,923 radiologic examinations over a 4-year span were catalogued using 11 ACGME categories. Our experience highlights big data challenges including internal data heterogeneity and external data discrepancies faced by informatics researchers.
Optimal Chunking of Large Multidimensional Arrays for Data Warehousing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Otoo, Ekow J.; Rotem, Doron
2008-02-15
Very large multidimensional arrays are commonly used in data-intensive scientific computations as well as in on-line analytical processing applications, referred to as MOLAP. The storage organization of such arrays on disk is done by partitioning the large global array into fixed-size sub-arrays called chunks or tiles that form the units of data transfer between disk and memory. Typical queries involve the retrieval of sub-arrays in a manner that accesses all chunks that overlap the query results. An important metric of storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" The problem of optimal chunking was first introduced by Sarawagi and Stonebraker, who gave an approximate solution. In this paper we develop exact mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic and real-life workloads, show that our solutions are consistently within 2.0 percent of the true number of chunks retrieved for any number of dimensions. In contrast, the approximate solution of Sarawagi and Stonebraker can deviate considerably from the true result with increasing number of dimensions and may also lead to suboptimal chunk shapes.
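The metric described above can be made concrete with a small Python sketch. It assumes a simplified model that is not taken from the paper: a query box of shape q lands at a uniformly random integer offset relative to the chunk grid, so along one axis it overlaps 1 + (q - 1)/c chunks on average for chunk side c. The function names are illustrative, and the exhaustive search stands in for the paper's steepest descent and geometric programming methods, which are not reproduced here.

```python
from itertools import product
from math import prod


def expected_chunks(query_shape, chunk_shape):
    """Expected number of chunks overlapped by a query box placed at a
    uniformly random grid offset: prod(1 + (q_i - 1) / c_i)."""
    return prod(1 + (q - 1) / c for q, c in zip(query_shape, chunk_shape))


def brute_force_expected(query_shape, chunk_shape):
    """Same quantity, obtained by enumerating every offset within one chunk."""
    offsets = list(product(*(range(c) for c in chunk_shape)))
    total = 0
    for start in offsets:
        n = 1
        for s, q, c in zip(start, query_shape, chunk_shape):
            # Chunks touched along this axis by cells s .. s+q-1.
            n *= (s + q - 1) // c - s // c + 1
        total += n
    return total / len(offsets)


def best_chunk_shape(workload, volume, max_side):
    """Exhaustively pick the integer chunk shape with the given volume that
    minimizes the mean expected chunk count over a list of query shapes."""
    dims = len(workload[0])
    best = None
    for shape in product(range(1, max_side + 1), repeat=dims):
        if prod(shape) != volume:
            continue
        cost = sum(expected_chunks(q, shape) for q in workload) / len(workload)
        if best is None or cost < best[0]:
            best = (cost, shape)
    return best
```

For a workload of long, thin queries such as `(8, 1)`, the search favors chunks elongated along the first axis, which matches the intuition that chunk shape should follow the dominant query shape.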
EarthServer: a Summary of Achievements in Technology, Services, and Standards
NASA Astrophysics Data System (ADS)
Baumann, Peter
2015-04-01
Big Data in the Earth sciences, the Tera- to Exabyte archives, mostly are made up from coverage data, according to ISO and OGC defined as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3-D x/y/t image timeseries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier transform of satellite images. As network bandwidth limits prohibit transfer of such Big Data, it is indispensable to devise protocols allowing clients to task flexible and fast processing on the server. The transatlantic EarthServer initiative, running from 2011 through 2014, has united 11 partners to establish Big Earth Data Analytics. A key ingredient has been flexibility for users to ask whatever they want, not impeded and complicated by system internals. The EarthServer answer to this is to use high-level, standards-based query languages which unify data and metadata search in a simple, yet powerful way. A second key ingredient is scalability. Without any doubt, scalability ultimately can only be achieved through parallelization. In the past, parallelizing code has been done at compile time and usually with manual intervention. The EarthServer approach is to perform a semantic-based dynamic distribution of query fragments based on network optimization and further criteria. The EarthServer platform is comprised of rasdaman, the pioneer and leading Array DBMS built for any-size multi-dimensional raster data, being extended with support for irregular grids and general meshes; in-situ retrieval (evaluation of database queries on existing archive structures, avoiding data import and, hence, duplication); and the aforementioned distributed query processing. Additionally, Web clients for multi-dimensional data visualization are being established. 
Client/server interfaces are strictly based on OGC and W3C standards, in particular the Web Coverage Processing Service (WCPS), which defines a high-level coverage query language. Reviewers have attested that "With no doubt the project has been shaping the Big Earth Data landscape through the standardization activities within OGC, ISO and beyond". We present the project approach, its outcomes and impact on standardization and Big Data technology, and vistas for the future.
Harris, Daniel R.; Henderson, Darren W.; Kavuluru, Ramakanth; Stromberg, Arnold J.; Johnson, Todd R.
2015-01-01
We present a custom, Boolean query generator utilizing common-table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our work into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart (CRC) cell of i2b2. In our experiments, sixteen different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated non-trivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M=2.03, SD=6.64 vs. M=75.82, SD=238.88 seconds). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used for this provides a means of profiling clinical data warehouse performance. PMID:25192572
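To illustrate the general idea of mapping a Boolean query onto common-table expressions, the Python sketch below builds one CTE per concept and combines them with INTERSECT and EXCEPT. The table `observation`, its columns, and the function `boolean_query_sql` are assumptions made for this example; they are not the i2b2 schema or the authors' generator, and date, frequency, and same-encounter constraints are not modeled.

```python
import sqlite3


def boolean_query_sql(include_all, exclude_any):
    """Return (sql, params) selecting patients who have every concept in
    include_all (must be non-empty) and none of the concepts in exclude_any.
    Each concept becomes one named CTE over a hypothetical observation table."""
    ctes, params = [], []
    for i, concept in enumerate(include_all + exclude_any):
        ctes.append(
            f"c{i} AS (SELECT DISTINCT patient_id FROM observation WHERE concept = ?)"
        )
        params.append(concept)
    n_inc = len(include_all)
    body = " INTERSECT ".join(f"SELECT patient_id FROM c{i}" for i in range(n_inc))
    for j in range(len(exclude_any)):
        body += f" EXCEPT SELECT patient_id FROM c{n_inc + j}"
    return f"WITH {', '.join(ctes)} {body} ORDER BY patient_id", params


def demo():
    # Tiny in-memory data set, invented for illustration only.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE observation (patient_id INTEGER, concept TEXT)")
    con.executemany(
        "INSERT INTO observation VALUES (?, ?)",
        [
            (1, "diabetes"), (1, "hypertension"),
            (2, "diabetes"), (2, "hypertension"), (2, "statin"),
            (3, "diabetes"),
        ],
    )
    sql, params = boolean_query_sql(["diabetes", "hypertension"], ["statin"])
    return [row[0] for row in con.execute(sql, params)]
```

Naming each concept's patient set once lets the database reuse it across the set operations, which is one plausible reason a CTE-based formulation can behave more predictably than deeply nested subqueries; the timings reported above come from the authors' experiments, not from this sketch.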
High dimensional biological data retrieval optimization with NoSQL technology.
Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike
2014-01-01
High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. 
We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.
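A key-value layout of the kind described can be sketched with a small in-memory stand-in for a sorted key-value table. The composite row key used here (study, patient, probe) is an assumption chosen for illustration, not the key design reported by the authors, and `TinyKeyValueStore` is a toy class rather than the HBase client API.

```python
from bisect import bisect_left


class TinyKeyValueStore:
    """In-memory stand-in for a sorted key-value table (not HBase itself)."""

    def __init__(self):
        self._keys = []      # keys kept in sorted order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._values[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1


def row_key(study, patient, probe):
    # Assumed composite key: all probes of one patient sort next to each other,
    # so one patient's expression profile is a single contiguous range scan.
    return f"{study}|{patient}|{probe}"
```

With keys ordered this way, fetching one patient's full profile is a prefix scan instead of a join or a filtered scan over a long relational table, which is the kind of access pattern a key-value model is meant to speed up.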
High dimensional biological data retrieval optimization with NoSQL technology
2014-01-01
Background: High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results: In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. Conclusions: The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. 
We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data. PMID:25435347
Self-Schemas, Depression, and the Processing of Personal Information in Children.
ERIC Educational Resources Information Center
Hammen, Constance; Zupan, Brian A.
1984-01-01
Investigates the applicability of the self-as-schema model to children and examines the extent of negative self-schemas in relatively depressed children among 61 elementary school students; most of the students were between 8 and 12 years old. Results were consistent with the self-as-schema hypotheses, and mood congruent content-specific recall…
Neural Correlates of Post-Conventional Moral Reasoning: A Voxel-Based Morphometry Study
Prehn, Kristin; Korczykowski, Marc; Rao, Hengyi; Fang, Zhuo; Detre, John A.; Robertson, Diana C.
2015-01-01
Going back to Kohlberg, moral development research affirms that people progress through different stages of moral reasoning as cognitive abilities mature. Individuals at a lower level of moral reasoning judge moral issues mainly based on self-interest (personal interests schema) or based on adherence to laws and rules (maintaining norms schema), whereas individuals at the post-conventional level judge moral issues based on deeper principles and shared ideals. However, the extent to which moral development is reflected in structural brain architecture remains unknown. To investigate this question, we used voxel-based morphometry and examined the brain structure in a sample of 67 Master of Business Administration (MBA) students. Subjects completed the Defining Issues Test (DIT-2) which measures moral development in terms of cognitive schema preference. Results demonstrate that subjects at the post-conventional level of moral reasoning were characterized by increased gray matter volume in the ventromedial prefrontal cortex and subgenual anterior cingulate cortex, compared with subjects at a lower level of moral reasoning. Our findings support an important role for both cognitive and emotional processes in moral reasoning and provide first evidence for individual differences in brain structure according to the stages of moral reasoning first proposed by Kohlberg decades ago. PMID:26039547
Akce, Abdullah; Norton, James J S; Bretl, Timothy
2015-09-01
This paper presents a brain-computer interface for text entry using steady-state visually evoked potentials (SSVEP). Like other SSVEP-based spellers, ours identifies the desired input character by posing questions (or queries) to users through a visual interface. Each query defines a mapping from possible characters to steady-state stimuli. The user responds by attending to one of these stimuli. Unlike other SSVEP-based spellers, ours chooses from a much larger pool of possible queries, on the order of ten thousand instead of ten. The larger query pool allows our speller to adapt more effectively to the inherent structure of what is being typed and to the input performance of the user, both of which make certain queries provide more information than others. In particular, our speller chooses queries from this pool that maximize the amount of information to be received per unit of time, a measure of mutual information that we call information gain rate. To validate our interface, we compared it with two other state-of-the-art SSVEP-based spellers, which were re-implemented to use the same input mechanism. Results showed that our interface, with the larger query pool, allowed users to spell multiple-word texts nearly twice as fast as they could with the compared spellers.
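The query-selection criterion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the channel matrix, the fixed query duration, and the function names are hypothetical, and the score is the standard mutual information I(X;Y) = H(Y) - H(Y|X) divided by the time the query takes.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def information_gain_rate(prior, query, channel, duration):
    """Mutual information I(X;Y) per second for one query.

    prior:    dict character -> prior probability
    query:    dict character -> index of the stimulus it is mapped to
    channel:  channel[s][y] = P(detected response y | user attends stimulus s)
    duration: seconds needed to pose the query and read the response
    """
    n_out = len(channel[0])
    p_y = [0.0] * n_out
    h_y_given_x = 0.0
    for ch, p in prior.items():
        row = channel[query[ch]]
        for y in range(n_out):
            p_y[y] += p * row[y]
        h_y_given_x += p * entropy(row)
    return (entropy(p_y) - h_y_given_x) / duration


def best_query(prior, candidates, channel, duration):
    """Pick the candidate query with the highest information gain rate."""
    return max(
        candidates,
        key=lambda q: information_gain_rate(prior, q, channel, duration),
    )
```

With a uniform prior and a noiseless two-stimulus channel, a query that splits the characters evenly yields one bit per response, while an uneven split yields less, so the balanced query is preferred; a skewed prior or a noisy channel shifts which split is best.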
Multi-Case Knowledge-Based IMRT Treatment Planning in Head and Neck Cancer
NASA Astrophysics Data System (ADS)
Grzetic, Shelby Mariah
Head and neck cancer (HNC) IMRT treatment planning is a challenging process that relies heavily on the planner's experience. Previously, we used the single, best match from a library of manually planned cases to semi-automatically generate IMRT plans for a new patient. The current multi-case Knowledge Based Radiation Therapy (MC-KBRT) study utilized different matching cases for each of six individual organs-at-risk (OARs), then combined those six cases to create the new treatment plan. From a database of 103 patient plans created by experienced planners, MC-KBRT plans were created for 40 (17 unilateral and 23 bilateral) HNC "query" patients. For each case, 2D beam's-eye-view images were used to find similar geometric "match" patients separately for each of 6 OARs. Dose distributions for each OAR from the 6 matching cases were combined and then warped to suit the query case's geometry. The dose-volume constraints were used to create the new query treatment plan without the need for human decision-making throughout the IMRT optimization. The optimized MC-KBRT plans were compared against the clinically approved plans and Version 1 (previous KBRT using only one matching case with dose warping) using the dose metrics: mean, median, and maximum (brainstem and cord+5mm) doses. Compared to Version 1, MC-KBRT had no significant reduction of the dose to any of the OARs in either unilateral or bilateral cases. Compared to the manually planned unilateral cases, there was significant reduction of the oral cavity mean/median dose (>2Gy) at the expense of the contralateral parotid. Compared to the manually planned bilateral cases, reduction of dose was significant in the ipsilateral parotid, larynx, and oral cavity (>3Gy mean/median) while maintaining PTV coverage. MC-KBRT planning in head and neck cancer generates IMRT plans with better dose sparing than manually created plans. 
MC-KBRT using multiple case matches does not show significant dose reduction compared to using a single match case with dose warping.
Architecture for knowledge-based and federated search of online clinical evidence.
Coiera, Enrico; Walther, Martin; Nguyen, Ken; Lovell, Nigel H
2005-10-24
It is increasingly difficult for clinicians to keep up-to-date with the rapidly growing biomedical literature. Online evidence retrieval methods are now seen as a core tool to support evidence-based health practice. However, standard search engine technology is not designed to manage the many different types of evidence sources that are available or to handle the very different information needs of various clinical groups, who often work in widely different settings. The objectives of this paper are (1) to describe the design considerations and system architecture of a wrapper-mediator approach to federated search system design, including the use of knowledge-based, meta-search filters, and (2) to analyze the implications of system design choices on performance measurements. A trial was performed to evaluate the technical performance of a federated evidence retrieval system, which provided access to eight distinct online resources, including e-journals, PubMed, and electronic guidelines. The Quick Clinical system architecture utilized a universal query language to reformulate queries internally and utilized meta-search filters to optimize search strategies across resources. We recruited 227 family physicians from across Australia who used the system to retrieve evidence in a routine clinical setting over a 4-week period. The total search time for a query was recorded, along with the duration of individual queries sent to different online resources. Clinicians performed 1662 searches over the trial. The average search duration was 4.9 +/- 3.2 s (N = 1662 searches). Mean search duration to the individual sources was between 0.05 s and 4.55 s. Average system time (i.e., system overhead) was 0.12 s. The relatively small system overhead compared to the average time it takes to perform a search for an individual source shows that the system achieves a good trade-off between performance and reliability. 
Furthermore, despite the additional effort required to incorporate the capabilities of each individual source (to improve the quality of search results), system maintenance requires only a small additional overhead.
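The wrapper-mediator pattern and the timing measurements described above can be sketched in Python. This is an illustrative outline only: the function `federated_search` and the wrapper callables are hypothetical, and the universal query language and meta-search filters of Quick Clinical are not modeled.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, wrappers):
    """Send one query to every source wrapper in parallel.

    wrappers: non-empty dict mapping source name -> callable(query) that
              returns a list of hits (each wrapper hides one source's API).
    Returns (hits_by_source, seconds_by_source, total_seconds).
    """
    def timed(name):
        start = time.perf_counter()
        hits = wrappers[name](query)
        return name, hits, time.perf_counter() - start

    t0 = time.perf_counter()
    hits_by_source, seconds_by_source = {}, {}
    with ThreadPoolExecutor(max_workers=len(wrappers)) as pool:
        # Iterating the dict yields source names; results keep that order.
        for name, hits, secs in pool.map(timed, wrappers):
            hits_by_source[name] = hits
            seconds_by_source[name] = secs
    return hits_by_source, seconds_by_source, time.perf_counter() - t0
```

Because the sources are queried concurrently, the total time is governed by the slowest source plus the mediator's own overhead, which is the quantity the trial reports as system time.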
DOIDB: Reusing DataCite's search software as metadata portal for GFZ Data Services
NASA Astrophysics Data System (ADS)
Elger, K.; Ulbricht, D.; Bertelmann, R.
2016-12-01
GFZ Data Services is the central service point for the publication of research data at the Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences (GFZ). It provides data publishing services to scientists of GFZ, associated projects, and associated institutions. The publishing services aim to make research data and physical samples visible and citable by assigning persistent identifiers (DOI, IGSN) and by complementing existing IT infrastructure. To integrate several research domains, a modular software stack made of free software components has been created to manage data and metadata as well as to register persistent identifiers [1]. The pivotal component for the registration of DOIs is the DOIDB. It has been derived from three software components provided by DataCite [2] that moderate the registration of DOIs and the deposition of metadata, allow the dissemination of metadata, and provide a user interface to navigate and discover datasets. The DOIDB acts as a proxy to the DataCite infrastructure and, in addition to the DataCite metadata schema, it allows metadata to be deposited and disseminated following the schemas ISO19139 and NASA GCMD DIF. The search component has been modified to meet the requirements of a geosciences metadata portal. In particular, the search component has been altered to make use of Apache Solr's capability to index and query spatial coordinates. Furthermore, the user interface has been adjusted to provide a first impression of the data by showing a map, summary information and subjects. DOIDB and its components are available on GitHub [3]. We present a software solution for registration of DOIs that allows the integration of existing data systems, keeps track of registered DOIs, and provides a metadata portal to discover datasets [4].
[1] Ulbricht, D.; Elger, K.; Bertelmann, R.; Klump, J. panMetaDocs, eSciDoc, and DOIDB—An Infrastructure for the Curation and Publication of File-Based Datasets for GFZ Data Services. ISPRS Int. J. Geo-Inf. 2016, 5, 25. http://doi.org/10.3390/ijgi5030025
[2] https://github.com/datacite
[3] https://github.com/ulbricht/search/tree/doidb , https://github.com/ulbricht/mds/tree/doidb , https://github.com/ulbricht/oaip/tree/doidb
[4] http://doidb.wdc-terra.org
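As an illustration of querying spatial coordinates in Apache Solr, the Python sketch below builds request parameters with a `geofilt` filter, which restricts hits to a circle around a point. The field name `geo_location` and the helper `solr_spatial_params` are hypothetical; the actual field names and filter types used by DOIDB are not stated in the abstract.

```python
from urllib.parse import urlencode


def solr_spatial_params(text, lat, lon, radius_km, field="geo_location"):
    """Query parameters for a Solr select request restricted to a circle.

    `field` is a hypothetical name for a spatial field in the index;
    the geofilt distance `d` is given in kilometers.
    """
    return {
        "q": text or "*:*",
        "fq": f"{{!geofilt sfield={field} pt={lat},{lon} d={radius_km}}}",
        "wt": "json",
    }


def solr_query_string(text, lat, lon, radius_km, field="geo_location"):
    """URL-encoded query string to append to a core's /select endpoint."""
    return urlencode(solr_spatial_params(text, lat, lon, radius_km, field))
```

The query string would be appended to the select handler of whichever Solr core holds the metadata index; host and core name are left out because they depend on the installation.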
Research on Historic Bim of Built Heritage in Taiwan - a Case Study of Huangxi Academy
NASA Astrophysics Data System (ADS)
Lu, Y. C.; Shih, T. Y.; Yen, Y. N.
2018-05-01
Digital archiving technology for conserving cultural heritage is an important subject nowadays. The Taiwanese Ministry of Culture continues to try to converge the concept and technology of conservation towards international conventions. However, the products from these different technologies are not yet integrated due to the lack of research and development in this field. There is currently no effective schema in HBIM for Taiwanese cultural heritage. The aim of this research is to establish an HBIM schema for Chinese built heritage in Taiwan. The proposed method starts from the perspective of the components of built heritage buildings, up to the investigation of the important properties of the components through important international charters and Taiwanese laws of cultural heritage conservation. Afterwards, object-oriented class diagram and ontology from the scale of components were defined to clarify the concept and increase the interoperability. A historical database was then established for the historical information of components and to bring it into the concept of BIM in order to build a 3D model of heritage objects which can be used for visualization. An integration platform was developed for the users to browse and manipulate the database and 3D model simultaneously. In addition, this research also evaluated the feasibility of this method using the study case at the Huangxi academy located in Taiwan. The conclusion showed that class diagram could help the establishment of database and even its application for different Chinese built heritage objects. The establishment of ontology helped to convey knowledge and increase interoperability. In comparison to traditional documentation methods, the querying result of the platform was more accurate and less prone to human error.
A Magnetic Petrology Database for Satellite Magnetic Anomaly Interpretations
NASA Astrophysics Data System (ADS)
Nazarova, K.; Wasilewski, P.; Didenko, A.; Genshaft, Y.; Pashkevich, I.
2002-05-01
A Magnetic Petrology Database (MPDB) is now being compiled at NASA/Goddard Space Flight Center in cooperation with Russian and Ukrainian institutions. The purpose of this database is to provide the geomagnetic community with a comprehensive and user-friendly method of accessing magnetic petrology data via the Internet for more realistic interpretation of satellite magnetic anomalies. Magnetic petrology data have been accumulated at NASA/Goddard Space Flight Center, the United Institute of Physics of the Earth (Russia) and the Institute of Geophysics (Ukraine) over several decades and now comprise many thousands of records in our archives. The MPDB was, and continues to be, in great demand, especially since the recent launch into near-Earth orbit of the mini-constellation of three satellites, Oersted (in 1999), Champ (in 2000), and SAC-C (in 2000), which will provide lithospheric magnetic maps with better spatial and amplitude resolution (about 1 nT). The MPDB is focused on lower crustal and upper mantle rocks and will include data on mantle xenoliths, serpentinized ultramafic rocks, granulites, iron quartzites and rocks from Archean-Proterozoic metamorphic sequences from all around the world. A substantial amount of data comes from the area of the unique Kursk Magnetic Anomaly and the Kola Deep Borehole (which recovered 12 km of continental crust). A prototype MPDB can be found on the Geodynamics Branch web server of Goddard Space Flight Center at http://core2.gsfc.nasa.gov/terr_mag/magnpetr.html. The MPDB employs a searchable relational design and consists of 7 interrelated tables. The database schema is shown at http://core2.gsfc.nasa.gov/terr_mag/doc.html. A MySQL database server was used to implement the MPDB. SQL (Structured Query Language) is used to query the database. To present query results on the Web and for Web programming, we used the PHP scripting language and CGI scripts. 
The prototype MPDB is designed to search the database by major satellite magnetic anomaly, tectonic structure, geographical location, rock type, magnetic properties, chemistry and reference; see http://core2.gsfc.nasa.gov/terr_mag/query1.html. The database output is available as an HTML structured table, a text file, and a downloadable file. This database will be very useful for studies of lithospheric satellite magnetic anomalies on the Earth and other terrestrial planets.
Andersen, Barbara L.; Fowler, Jeffrey M.; Maxwell, G. Larry
2008-01-01
Gynecologic cancer patients are at high risk for emotional distress and sexual dysfunction. The present study tested sexual self schema as an individual difference variable that might be useful in identifying those at risk for unfavorable outcomes. First, we tested schema as a predictor of sexual outcomes, including body change stress. Second, we examined schema as a contributor to broader quality of life outcomes, specifically as a moderator of the relationship between sexual satisfaction and psychological status (depressive symptoms and quality of life). A cross-sectional design was used. Gynecologic cancer survivors (N = 175) 2-10 years post treatment were assessed during routine follow up. In regression analyses controlling for sociodemographic variables, patients' physical symptoms/signs as evaluated by nurses, health status, and extent of partner sexual difficulties, sexual self schema accounted for significant variance in the prediction of current sexual behavior, responsiveness, and satisfaction. Moreover, schema moderated the relationship between sexual satisfaction and psychological outcomes, suggesting that a positive sexual self schema might “buffer” patients from depressive symptoms when their sexual satisfaction is low. Furthermore, the combination of a negative sexual self schema and low sexual satisfaction might heighten survivors' risk for psychological distress, including depressive symptomatology. These data support the consideration of sexual self schema as a predictor of sexual morbidity among gynecologic cancer survivors. PMID:18418707
Early Maladaptive Schemas and Aggression in Men Seeking Residential Substance Use Treatment
Shorey, Ryan C.; Elmquist, Joanna; Anderson, Scott; Stuart, Gregory L.
2015-01-01
Social-cognitive theories of aggression postulate that individuals who perpetrate aggression are likely to have high levels of maladaptive cognitive schemas that increase risk for aggression. Indeed, recent research has begun to examine whether early maladaptive schemas may increase the risk for aggression. However, no known research has examined this among individuals in substance use treatment, despite aggression and early maladaptive schemas being more prevalent among individuals with a substance use disorder than the general population. Toward this end, we examined the relationship between early maladaptive schemas and aggression in men in a residential substance use treatment facility (N = 106). Utilizing pre-existing patient records, results demonstrated unique associations between early maladaptive schema domains and aggression depending on the type of aggression and schema domain examined, even after controlling for substance use, antisocial personality, age, and education. The Impaired Limits domain was positively associated with verbal aggression, aggressive attitude, and overall aggression, whereas the Disconnection and Rejection domain was positively associated with physical aggression. These findings are consistent with social-cognitive models of aggression and advance our understanding of how early maladaptive schemas may influence aggression. The implications of these findings for future research are discussed. PMID:25897180
Model-based query language for analyzing clinical processes.
Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris
2013-01-01
Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used in full scope. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using an MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is readily understandable even to non-IT professionals. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.
Calvete, Esther; Gámez-Guadix, Manuel; Fernández-Gonzalez, Liria; Orue, Izaskun; Borrajo, Erika
2018-07-01
This study examined whether exposure to family violence, both in the form of direct victimization and witnessing violence, predicted dating violence victimization in adolescents through maladaptive schemas. A sample of 933 adolescents (445 boys and 488 girls), aged between 13 and 18 (M = 15.10), participated in a three-year longitudinal study. They completed measures of exposure to family violence, maladaptive schemas of disconnection/rejection, and dating violence victimization. The findings indicate that witnessing family violence predicts the increase of dating violence victimization over time, through the mediation of maladaptive schemas in girls, but not in boys. Direct victimization in the family predicts dating violence victimization directly, without the mediation of schemas. In addition, maladaptive schemas contribute to the perpetuation of dating violence victimization over time. These findings provide new opportunities for preventive interventions, as maladaptive schemas can be modified. Copyright © 2018 Elsevier Ltd. All rights reserved.
Partitioning an object-oriented terminology schema.
Gu, H; Perl, Y; Halper, M; Geller, J; Kuo, F; Cimino, J J
2001-07-01
Controlled medical terminologies are increasingly becoming strategic components of various healthcare enterprises. However, the typical medical terminology can be difficult to exploit due to its extensive size and high density. The schema of a medical terminology offered by an object-oriented representation is a valuable tool in providing an abstract view of the terminology, enhancing comprehensibility and making it more usable. However, schemas themselves can be large and unwieldy. We present a methodology for partitioning a medical terminology schema into manageably sized fragments that promote increased comprehension. Our methodology has a refinement process for the subclass hierarchy of the terminology schema. The methodology is carried out by a medical domain expert in conjunction with a computer. The expert is guided by a set of three modeling rules, which guarantee that the resulting partitioned schema consists of a forest of trees. This makes it easier to understand and consequently use the medical terminology. The application of our methodology to the schema of the Medical Entities Dictionary (MED) is presented.
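The forest-of-trees property mentioned in this abstract can be illustrated with a minimal Python sketch. It does not reproduce the authors' three modeling rules or their partitioning methodology, which the abstract does not spell out. It only checks, under the assumption that a schema's subclass hierarchy is given as a hypothetical `parents` dictionary (class name to list of direct superclasses), whether that hierarchy is a forest: every class has at most one parent and no class is its own ancestor.

```python
def is_forest(parents):
    """Return True if the subclass hierarchy is a forest of trees.

    `parents` is a hypothetical representation (not taken from the paper):
    a dict mapping each class name to a list of its direct superclasses.
    """
    # In a forest, every node has at most one parent.
    if any(len(supers) > 1 for supers in parents.values()):
        return False

    # Walk upward from every class; revisiting a class means a cycle.
    for start in parents:
        seen = {start}
        node = start
        while parents.get(node):
            node = parents[node][0]
            if node in seen:
                return False
            seen.add(node)
    return True


# Illustrative class names only.
tree_like = {"Entity": [], "Drug": ["Entity"], "Test": ["Entity"]}
multi_parent = {"Entity": [], "Event": [], "Order": ["Entity", "Event"]}
```

Here `is_forest(tree_like)` holds, while `multi_parent` fails the check because `Order` has two superclasses; a partitioning step would have to resolve such multiple inheritance before the result could be a forest.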
Relationship of negative self-schemas and attachment styles with appearance schemas.
Ledoux, Tracey; Winterowd, Carrie; Richardson, Tamara; Clark, Julie Dorton
2010-06-01
The purpose was to test, among women, the relationship between negative self-schemas and styles of attachment with men and women and two types of appearance investment (Self-evaluative and Motivational Salience). Predominantly Caucasian undergraduate women (N=194) completed a modified version of the Relationship Questionnaire, the Young Schema Questionnaire-Short Form, and the Appearance Schemas Inventory-Revised. Linear multiple regression analyses were conducted with Motivational Salience and Self-evaluative Salience of appearance serving as dependent variables and relevant demographic variables, negative self-schemas, and styles of attachment to men serving as independent variables. Styles of attachment to women were not entered into these regression models because Pearson correlations indicated they were not related to either dependent variable. Self-evaluative Salience of appearance was related to impaired autonomy and performance negative self-schema and the preoccupation style of attachment with men, while Motivational Salience of appearance was related only to the preoccupation style of attachment with men. 2010 Elsevier Ltd. All rights reserved.
Neural mechanisms of mental schema: a triplet of delta, low beta/spindle and ripple oscillations.
Ohki, Takefumi; Takei, Yuichi
2018-02-06
Schemas are higher-level knowledge structures that integrate and organise lower-level representations. As internal templates, schemas are formed according to how events are perceived, interpreted and remembered. Although these higher-level units are assumed to play a fundamental role in our daily life from an early age, the neuronal basis and mechanisms of schema formation and use remain largely unknown. It is important to elucidate how the brain constructs and maintains these higher-level units. In order to examine the possible neural underpinnings of schema, we recapitulate previous work and discuss their findings related to schemas as the brain template. We specifically focused on low beta/spindle oscillations, which are assumed to be the key components of schemas, and propose that the brain template is implemented with a triplet of neural oscillations, that is delta, low beta/spindle and ripple oscillations. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Quantum algorithms on Walsh transform and Hamming distance for Boolean functions
NASA Astrophysics Data System (ADS)
Xie, Zhengwei; Qiu, Daowen; Cai, Guangya
2018-06-01
Walsh spectrum or Walsh transform is an alternative description of Boolean functions. In this paper, we explore quantum algorithms to approximate the absolute value of Walsh transform $W_f$ at a single point $z_0$ (i.e., $|W_f(z_0)|$) for n-variable Boolean functions with probability at least $8/\pi^2$ using the number of $O(1/(|W_f(z_0)|\varepsilon))$ queries, promised that the accuracy is $\varepsilon$, while the best known classical algorithm requires $O(2^n)$ queries. The Hamming distance between Boolean functions is used to study the linearity testing and other important problems. We take advantage of Walsh transform to calculate the Hamming distance between two n-variable Boolean functions f and g using O(1) queries in some cases. Then, we exploit another quantum algorithm which converts computing Hamming distance between two Boolean functions to quantum amplitude estimation (i.e., approximate counting). If $\mathrm{Ham}(f,g)=t\neq 0$, we can approximately compute $\mathrm{Ham}(f,g)$ with probability at least 2/3 by combining our algorithm and the Approx-Count$(f,\varepsilon)$ algorithm using the expected number of $\Theta\left(\sqrt{N/(\lfloor \varepsilon t\rfloor +1)}+\sqrt{t(N-t)}/(\lfloor \varepsilon t\rfloor +1)\right)$ queries, promised that the accuracy is $\varepsilon$. Moreover, our algorithm is optimal, while the exact query complexity for the above problem is $\Theta(N)$ and the query complexity with the accuracy $\varepsilon$ is $O\left(\frac{1}{\varepsilon^2}\,\frac{N}{t+1}\right)$ in classical algorithm, where $N=2^n$. Finally, we present three exact quantum query algorithms for two promise problems on Hamming distance using O(1) queries, while any classical deterministic algorithm solving the problem uses $\Omega(2^n)$ queries.
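For orientation, the classical quantities discussed in this abstract can be sketched in a few lines of Python. This is a brute-force classical illustration, not the quantum algorithms of the paper. It assumes the unnormalized convention W_f(z) = sum over x of (-1)^(f(x) XOR z·x), which may differ from the paper's normalization by a factor of 2^n; the function names are hypothetical. Under that convention the Hamming distance follows from the Walsh transform of f XOR g at the zero point: Ham(f, g) = (2^n - W_{f XOR g}(0)) / 2.

```python
from itertools import product


def walsh_transform(f, n):
    """Brute-force Walsh spectrum of an n-variable Boolean function.

    Assumed (unnormalized) convention:
        W_f(z) = sum over x in {0,1}^n of (-1)^(f(x) XOR z.x)
    `f` takes a tuple of n bits and returns 0 or 1.
    """
    points = list(product((0, 1), repeat=n))
    spectrum = {}
    for z in points:
        total = 0
        for x in points:
            dot = sum(zi & xi for zi, xi in zip(z, x)) % 2
            total += (-1) ** (f(x) ^ dot)
        spectrum[z] = total
    return spectrum


def hamming_distance(f, g, n):
    """Count the inputs on which f and g disagree (2^n evaluations each)."""
    return sum(f(x) != g(x) for x in product((0, 1), repeat=n))


def hamming_via_walsh(f, g, n):
    """Hamming distance from the Walsh transform of f XOR g at z = 0."""
    def h(x):
        return f(x) ^ g(x)

    zero = (0,) * n
    return (2 ** n - walsh_transform(h, n)[zero]) // 2


def f_and(x):
    return x[0] & x[1]


def g_xor(x):
    return x[0] ^ x[1]
```

For the two-variable functions `f_and` and `g_xor`, both `hamming_distance(f_and, g_xor, 2)` and `hamming_via_walsh(f_and, g_xor, 2)` give 3. Each classical routine touches all 2^n inputs, which is the cost the abstract's quantum query bounds are compared against.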
Role of Father–Child Relational Quality in Early Maladaptive Schemas
Monirpoor, Nader; Gholamyzarch, Morteza; Tamaddonfard, Mohsen; Khoosfi, Helen; Ganjali, Ali Reza
2012-01-01
Background: Primary maladaptive schemas, which are the basis of high-risk behavior and psychological disorders, result from childhood experiences with significant objects, such as fathers, in different developmental phases. Objectives: This endeavor examined the role of the father in predicting these schemas. Patients and Methods: A total of 345 Islamic Azad University students (Qom Branch) who were chosen through convenience sampling completed the Young Schema Questionnaire, the Parental Bonding Instrument, and the Parent–Child Relationship Survey. Results: A multivariate regression analysis indicated that a number of aspects of the father–child relationship, including care, emotional interaction, positive affection, the effective relationship, and excessive support, predict particular schemas. Conclusions: Therefore, these findings suggested that psychotherapists examine the different aspects of the father–child relationship when restructuring schemas. PMID:24971232
Shorey, Ryan C.; Anderson, Scott; Stuart, Gregory L.
2014-01-01
Individuals with substance use disorders are more likely to have antisocial and borderline personality disorder than non-substance abusers. Recently, research has examined the relations between early maladaptive schemas and personality disorders, as early maladaptive schemas are believed to underlie personality disorders. However, there is a dearth of research on the relations between early maladaptive schemas and personality disorders among individuals seeking treatment for substance abuse. The current study examined the relations among early maladaptive schemas and antisocial and borderline personality within a sample of men seeking substance abuse treatment (n = 98). Results demonstrated that early maladaptive schema domains were associated with antisocial and borderline personality symptoms. Implications of these findings for substance use treatment and research are discussed. PMID:23650153