deep-web query interfaces: Topics by Science.gov

Sample records for deep-web query interfaces

On Building a Search Interface Discovery System

NASA Astrophysics Data System (ADS)

Shestakov, Denis

A huge portion of the Web known as the deep Web is accessible via search interfaces to myriads of databases on the Web. While relatively good approaches for querying the contents of web databases have been recently proposed, one cannot fully utilize them having most search interfaces unlocated. Thus, the automatic recognition of search interfaces to online databases is crucial for any application accessing the deep Web. This paper describes the architecture of the I-Crawler, a system for finding and classifying search interfaces. The I-Crawler is intentionally designed to be used in the deep web characterization surveys and for constructing directories of deep web resources.
Stratification-Based Outlier Detection over the Deep Web.

PubMed

Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

2016-01-01

For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web.
Stratification-Based Outlier Detection over the Deep Web

PubMed Central

Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S.; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

2016-01-01

For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web. PMID:27313603
A Framework for Transparently Accessing Deep Web Sources

ERIC Educational Resources Information Center

Dragut, Eduard Constantin

2010-01-01

An increasing number of Web sites expose their content via query interfaces, many of them offering the same type of products/services (e.g., flight tickets, car rental/purchasing). They constitute the so-called "Deep Web". Accessing the content on the Deep Web has been a long-standing challenge for the database community. For a user interested in…
Hybrid Schema Matching for Deep Web

NASA Astrophysics Data System (ADS)

Chen, Kerui; Zuo, Wanli; He, Fengling; Chen, Yongheng

Schema matching is the process of identifying semantic mappings, or correspondences, between two or more schemas. Schema matching is a first step and critical part of data integration. For schema matching of deep web, most researches only interested in query interface, while rarely pay attention to abundant schema information contained in query result pages. This paper proposed a mixed schema matching technique, which combines attributes that appeared in query structures and query results of different data sources, and mines the matched schemas inside. Experimental results prove the effectiveness of this method for improving the accuracy of schema matching.
Semantic Annotations and Querying of Web Data Sources

NASA Astrophysics Data System (ADS)

Hornung, Thomas; May, Wolfgang

A large part of the Web, actually holding a significant portion of the useful information throughout the Web, consists of views on hidden databases, provided by numerous heterogeneous interfaces that are partly human-oriented via Web forms ("Deep Web"), and partly based on Web Services (only machine accessible). In this paper we present an approach for annotating these sources in a way that makes them citizens of the Semantic Web. We illustrate how queries can be stated in terms of the ontology, and how the annotations are used to selected and access appropriate sources and to answer the queries.
Clinic expert information extraction based on domain model and block importance model.

PubMed

Zhang, Yuanpeng; Wang, Li; Qian, Danmin; Geng, Xingyun; Yao, Dengfu; Dong, Jiancheng

2015-11-01

To extract expert clinic information from the Deep Web, there are two challenges to face. The first one is to make a judgment on forms. A novel method based on a domain model, which is a tree structure constructed by the attributes of query interfaces is proposed. With this model, query interfaces can be classified to a domain and filled in with domain keywords. Another challenge is to extract information from response Web pages indexed by query interfaces. To filter the noisy information on a Web page, a block importance model is proposed, both content and spatial features are taken into account in this model. The experimental results indicate that the domain model yields a precision 4.89% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method. Copyright © 2015 Elsevier Ltd. All rights reserved.
An advanced web query interface for biological databases

PubMed Central

Latendresse, Mario; Karp, Peter D.

2010-01-01

Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
ESTminer: a Web interface for mining EST contig and cluster databases.

PubMed

Huang, Yecheng; Pumphrey, Janie; Gingle, Alan R

2005-03-01

ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp agingle@uga.edu.
PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization.

PubMed

Dogrusoz, U; Erson, E Z; Giral, E; Demir, E; Babur, O; Cetintas, A; Colak, R

2006-02-01

Patikaweb provides a Web interface for retrieving and analyzing biological pathways in the Patika database, which contains data integrated from various prominent public pathway databases. It features a user-friendly interface, dynamic visualization and automated layout, advanced graph-theoretic queries for extracting biologically important phenomena, local persistence capability and exporting facilities to various pathway exchange formats.
The Research on Automatic Construction of Domain Model Based on Deep Web Query Interfaces

NASA Astrophysics Data System (ADS)

JianPing, Gu

The integration of services is transparent, meaning that users no longer face the millions of Web services, do not care about the required data stored, but do not need to learn how to obtain these data. In this paper, we analyze the uncertainty of schema matching, and then propose a series of similarity measures. To reduce the cost of execution, we propose the type-based optimization method and schema matching pruning method of numeric data. Based on above analysis, we propose the uncertain schema matching method. The experiments prove the effectiveness and efficiency of our method.
Mashups over the Deep Web

NASA Astrophysics Data System (ADS)

Hornung, Thomas; Simon, Kai; Lausen, Georg

Combining information from different Web sources often results in a tedious and repetitive process, e.g. even simple information requests might require to iterate over a result list of one Web query and use each single result as input for a subsequent query. One approach for this chained queries are data-centric mashups, which allow to visually model the data flow as a graph, where the nodes represent the data source and the edges the data flow.
Web-based Hyper Suprime-Cam Data Providing System

NASA Astrophysics Data System (ADS)

Koike, M.; Furusawa, H.; Takata, T.; Price, P.; Okura, Y.; Yamada, Y.; Yamanoi, H.; Yasuda, N.; Bickerton, S.; Katayama, N.; Mineo, S.; Lupton, R.; Bosch, J.; Loomis, C.

2014-05-01

We describe a web-based user interface to retrieve Hyper Suprime-Cam data products, including images and. Users can access data directly from a graphical user interface or by writing a database SQL query. The system provides raw images, reduced images and stacked images (from multiple individual exposures), with previews available. Catalog queries can be executed in preview or queue mode, allowing for both exploratory and comprehensive investigations.
Providing Web Interfaces to the NSF EarthScope USArray Transportable Array

NASA Astrophysics Data System (ADS)

Vernon, Frank; Newman, Robert; Lindquist, Kent

2010-05-01

Since April 2004 the EarthScope USArray seismic network has grown to over 850 broadband stations that stream multi-channel data in near real-time to the Array Network Facility in San Diego. Providing secure, yet open, access to real-time and archived data for a broad range of audiences is best served by a series of platform agnostic low-latency web-based applications. We present a framework of tools that mediate between the world wide web and Boulder Real Time Technologies Antelope Environmental Monitoring System data acquisition and archival software. These tools provide comprehensive information to audiences ranging from network operators and geoscience researchers, to funding agencies and the general public. This ranges from network-wide to station-specific metadata, state-of-health metrics, event detection rates, archival data and dynamic report generation over a station's two year life span. Leveraging open source web-site development frameworks for both the server side (Perl, Python and PHP) and client-side (Flickr, Google Maps/Earth and jQuery) facilitates the development of a robust extensible architecture that can be tailored on a per-user basis, with rapid prototyping and development that adheres to web-standards. Typical seismic data warehouses allow online users to query and download data collected from regional networks, without the scientist directly visually assessing data coverage and/or quality. Using a suite of web-based protocols, we have recently developed an online seismic waveform interface that directly queries and displays data from a relational database through a web-browser. Using the Python interface to Datascope and the Python-based Twisted network package on the server side, and the jQuery Javascript framework on the client side to send and receive asynchronous waveform queries, we display broadband seismic data using the HTML Canvas element that is globally accessible by anyone using a modern web-browser. We are currently creating additional interface tools to create a rich-client interface for accessing and displaying seismic data that can be deployed to any system running the Antelope Real Time System. The software is freely available from the Antelope contributed code Git repository (http://www.antelopeusersgroup.org).
Advanced Query and Data Mining Capabilities for MaROS

NASA Technical Reports Server (NTRS)

Wang, Paul; Wallick, Michael N.; Allard, Daniel A.; Gladden, Roy E.; Hy, Franklin H.

2013-01-01

The Mars Relay Operational Service (MaROS) comprises a number of tools to coordinate, plan, and visualize various aspects of the Mars Relay network. These levels include a Web-based user interface, a back-end "ReSTlet" built in Java, and databases that store the data as it is received from the network. As part of MaROS, the innovators have developed and implemented a feature set that operates on several levels of the software architecture. This new feature is an advanced querying capability through either the Web-based user interface, or through a back-end REST interface to access all of the data gathered from the network. This software is not meant to replace the REST interface, but to augment and expand the range of available data. The current REST interface provides specific data that is used by the MaROS Web application to display and visualize the information; however, the returned information from the REST interface has typically been pre-processed to return only a subset of the entire information within the repository, particularly only the information that is of interest to the GUI (graphical user interface). The new, advanced query and data mining capabilities allow users to retrieve the raw data and/or to perform their own data processing. The query language used to access the repository is a restricted subset of the structured query language (SQL) that can be built safely from the Web user interface, or entered as freeform SQL by a user. The results are returned in a CSV (Comma Separated Values) format for easy exporting to third party tools and applications that can be used for data mining or user-defined visualization and interpretation. This is the first time that a service is capable of providing access to all cross-project relay data from a single Web resource. Because MaROS contains the data for a variety of missions from the Mars network, which span both NASA and ESA, the software also establishes an access control list (ACL) on each data record in the database repository to enforce user access permissions through a multilayered approach.
Developing A Web-based User Interface for Semantic Information Retrieval

NASA Technical Reports Server (NTRS)

Berrios, Daniel C.; Keller, Richard M.

2003-01-01

While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still uncle ar. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting s emantic query generation for Semanticorganizer, a tool used by scient ists and engineers at NASA to construct networks of knowledge and dat a. Through this interface users can select node types, node attribute s and node links to build ad-hoc semantic queries for searching the S emanticOrganizer network.
Keeping Dublin Core Simple: Cross-Domain Discovery or Resource Description?; First Steps in an Information Commerce Economy: Digital Rights Management in the Emerging E-Book Environment; Interoperability: Digital Rights Management and the Emerging EBook Environment; Searching the Deep Web: Direct Query Engine Applications at the Department of Energy.

ERIC Educational Resources Information Center

Lagoze, Carl; Neylon, Eamonn; Mooney, Stephen; Warnick, Walter L.; Scott, R. L.; Spence, Karen J.; Johnson, Lorrie A.; Allen, Valerie S.; Lederman, Abe

2001-01-01

Includes four articles that discuss Dublin Core metadata, digital rights management and electronic books, including interoperability; and directed query engines, a type of search engine designed to access resources on the deep Web that is being used at the Department of Energy. (LRW)
Virtual Observatory Interfaces to the Chandra Data Archive

NASA Astrophysics Data System (ADS)

Tibbetts, M.; Harbo, P.; Van Stone, D.; Zografou, P.

2014-05-01

The Chandra Data Archive (CDA) plays a central role in the operation of the Chandra X-ray Center (CXC) by providing access to Chandra data. Proprietary interfaces have been the backbone of the CDA throughout the Chandra mission. While these interfaces continue to provide the depth and breadth of mission specific access Chandra users expect, the CXC has been adding Virtual Observatory (VO) interfaces to the Chandra proposal catalog and observation catalog. VO interfaces provide standards-based access to Chandra data through simple positional queries or more complex queries using the Astronomical Data Query Language. Recent development at the CDA has generalized our existing VO services to create a suite of services that can be configured to provide VO interfaces to any dataset. This approach uses a thin web service layer for the individual VO interfaces, a middle-tier query component which is shared among the VO interfaces for parsing, scheduling, and executing queries, and existing web services for file and data access. The CXC VO services provide Simple Cone Search (SCS), Simple Image Access (SIA), and Table Access Protocol (TAP) implementations for both the Chandra proposal and observation catalogs within the existing archive architecture. Our work with the Chandra proposal and observation catalogs, as well as additional datasets beyond the CDA, illustrates how we can provide configurable VO services to extend core archive functionality.
Digging Deeper: The Deep Web.

ERIC Educational Resources Information Center

Turner, Laura

2001-01-01

Focuses on the Deep Web, defined as Web content in searchable databases of the type that can be found only by direct query. Discusses the problems of indexing; inability to find information not indexed in the search engine's database; and metasearch engines. Describes 10 sites created to access online databases or directly search them. Lists ways…
Named Entity Recognition in a Hungarian NL Based QA System

NASA Astrophysics Data System (ADS)

Tikkl, Domonkos; Szidarovszky, P. Ferenc; Kardkovacs, Zsolt T.; Magyar, Gábor

In WoW project our purpose is to create a complex search interface with the following features: search in the deep web content of contracted partners' databases, processing Hungarian natural language (NL) questions and transforming them to SQL queries for database access, image search supported by a visual thesaurus that describes in a structural form the visual content of images (also in Hungarian). This paper primarily focuses on a particular problem of question processing task: the entity recognition. Before going into details we give a short overview of the project's aims.

Entrez Neuron RDFa: a pragmatic semantic web application for data integration in neuroscience research.

PubMed

Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi

2009-01-01

The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present "Entrez Neuron", a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the 'HCLS knowledgebase' developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrate how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup.
CMCC Data Distribution Centre

NASA Astrophysics Data System (ADS)

Aloisio, Giovanni; Fiore, Sandro; Negro, A.

2010-05-01

The CMCC Data Distribution Centre (DDC) is the primary entry point (web gateway) to the CMCC. It is a Data Grid Portal providing a ubiquitous and pervasive way to ease data publishing, climate metadata search, datasets discovery, metadata annotation, data access, data aggregation, sub-setting, etc. The grid portal security model includes the use of HTTPS protocol for secure communication with the client (based on X509v3 certificates that must be loaded into the browser) and secure cookies to establish and maintain user sessions. The CMCC DDC is now in a pre-production phase and it is currently used only by internal users (CMCC researchers and climate scientists). The most important component already available in the CMCC DDC is the Search Engine which allows users to perform, through web interfaces, distributed search and discovery activities by introducing one or more of the following search criteria: horizontal extent (which can be specified by interacting with a geographic map), vertical extent, temporal extent, keywords, topics, creation date, etc. By means of this page the user submits the first step of the query process on the metadata DB, then, she can choose one or more datasets retrieving and displaying the complete XML metadata description (from the browser). This way, the second step of the query process is carried out by accessing to a specific XML document of the metadata DB. Finally, through the web interface, the user can access to and download (partially or totally) the data stored on the storage device accessing to OPeNDAP servers and to other available grid storage interfaces. Requests concerning datasets stored in deep storage will be served asynchronously.
VisGets: coordinated visualizations for web-based information exploration and discovery.

PubMed

Dörk, Marian; Carpendale, Sheelagh; Collins, Christopher; Williamson, Carey

2008-01-01

In common Web-based search interfaces, it can be difficult to formulate queries that simultaneously combine temporal, spatial, and topical data filters. We investigate how coordinated visualizations can enhance search and exploration of information on the World Wide Web by easing the formulation of these types of queries. Drawing from visual information seeking and exploratory search, we introduce VisGets--interactive query visualizations of Web-based information that operate with online information within a Web browser. VisGets provide the information seeker with visual overviews of Web resources and offer a way to visually filter the data. Our goal is to facilitate the construction of dynamic search queries that combine filters from more than one data dimension. We present a prototype information exploration system featuring three linked VisGets (temporal, spatial, and topical), and used it to visually explore news items from online RSS feeds.
Entrez Neuron RDFa: a pragmatic Semantic Web application for data integration in neuroscience research

PubMed Central

Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi

2013-01-01

The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present “Entrez Neuron”, a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the ‘HCLS knowledgebase’ developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrates how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup. PMID:19745321
The Deep Impact Network Experiment Operations Center Monitor and Control System

NASA Technical Reports Server (NTRS)

Wang, Shin-Ywan (Cindy); Torgerson, J. Leigh; Schoolcraft, Joshua; Brenman, Yan

2009-01-01

The Interplanetary Overlay Network (ION) software at JPL is an implementation of Delay/Disruption Tolerant Networking (DTN) which has been proposed as an interplanetary protocol to support space communication. The JPL Deep Impact Network (DINET) is a technology development experiment intended to increase the technical readiness of the JPL implemented ION suite. The DINET Experiment Operations Center (EOC) developed by JPL's Protocol Technology Lab (PTL) was critical in accomplishing the experiment. EOC, containing all end nodes of simulated spaces and one administrative node, exercised publish and subscribe functions for payload data among all end nodes to verify the effectiveness of data exchange over ION protocol stacks. A Monitor and Control System was created and installed on the administrative node as a multi-tiered internet-based Web application to support the Deep Impact Network Experiment by allowing monitoring and analysis of the data delivery and statistics from ION. This Monitor and Control System includes the capability of receiving protocol status messages, classifying and storing status messages into a database from the ION simulation network, and providing web interfaces for viewing the live results in addition to interactive database queries.
A new reference implementation of the PSICQUIC web service.

PubMed

del-Toro, Noemi; Dumousseau, Marine; Orchard, Sandra; Jimenez, Rafael C; Galeota, Eugenia; Launay, Guillaume; Goll, Johannes; Breuer, Karin; Ono, Keiichiro; Salwinski, Lukasz; Hermjakob, Henning

2013-07-01

The Proteomics Standard Initiative Common QUery InterfaCe (PSICQUIC) specification was created by the Human Proteome Organization Proteomics Standards Initiative (HUPO-PSI) to enable computational access to molecular-interaction data resources by means of a standard Web Service and query language. Currently providing >150 million binary interaction evidences from 28 servers globally, the PSICQUIC interface allows the concurrent search of multiple molecular-interaction information resources using a single query. Here, we present an extension of the PSICQUIC specification (version 1.3), which has been released to be compliant with the enhanced standards in molecular interactions. The new release also includes a new reference implementation of the PSICQUIC server available to the data providers. It offers augmented web service capabilities and improves the user experience. PSICQUIC has been running for almost 5 years, with a user base growing from only 4 data providers to 28 (April 2013) allowing access to 151 310 109 binary interactions. The power of this web service is shown in PSICQUIC View web application, an example of how to simultaneously query, browse and download results from the different PSICQUIC servers. This application is free and open to all users with no login requirement (http://www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml).
mirEX: a platform for comparative exploration of plant pri-miRNA expression data.

PubMed

Bielewicz, Dawid; Dolata, Jakub; Zielezinski, Andrzej; Alaba, Sylwia; Szarzynska, Bogna; Szczesniak, Michal W; Jarmolowski, Artur; Szweykowska-Kulinska, Zofia; Karlowski, Wojciech M

2012-01-01

mirEX is a comprehensive platform for comparative analysis of primary microRNA expression data. RT-qPCR-based gene expression profiles are stored in a universal and expandable database scheme and wrapped by an intuitive user-friendly interface. A new way of accessing gene expression data in mirEX includes a simple mouse operated querying system and dynamic graphs for data mining analyses. In contrast to other publicly available databases, the mirEX interface allows a simultaneous comparison of expression levels between various microRNA genes in diverse organs and developmental stages. Currently, mirEX integrates information about the expression profile of 190 Arabidopsis thaliana pri-miRNAs in seven different developmental stages: seeds, seedlings and various organs of mature plants. Additionally, by providing RNA structural models, publicly available deep sequencing results, experimental procedure details and careful selection of auxiliary data in the form of web links, mirEX can function as a one-stop solution for Arabidopsis microRNA information. A web-based mirEX interface can be accessed at http://bioinfo.amu.edu.pl/mirex.
Semantic integration of information about orthologs and diseases: the OGO system.

PubMed

Miñarro-Gimenez, Jose Antonio; Egaña Aranguren, Mikel; Martínez Béjar, Rodrigo; Fernández-Breis, Jesualdo Tomás; Madrid, Marisa

2011-12-01

Semantic Web technologies like RDF and OWL are currently applied in life sciences to improve knowledge management by integrating disparate information. Many of the systems that perform such task, however, only offer a SPARQL query interface, which is difficult to use for life scientists. We present the OGO system, which consists of a knowledge base that integrates information of orthologous sequences and genetic diseases, providing an easy to use ontology-constrain driven query interface. Such interface allows the users to define SPARQL queries through a graphical process, therefore not requiring SPARQL expertise. Copyright © 2011 Elsevier Inc. All rights reserved.
ProBiS-CHARMMing: Web Interface for Prediction and Optimization of Ligands in Protein Binding Sites.

PubMed

Konc, Janez; Miller, Benjamin T; Štular, Tanja; Lešnik, Samo; Woodcock, H Lee; Brooks, Bernard R; Janežič, Dušanka

2015-11-23

Proteins often exist only as apo structures (unligated) in the Protein Data Bank, with their corresponding holo structures (with ligands) unavailable. However, apoproteins may not represent the amino-acid residue arrangement upon ligand binding well, which is especially problematic for molecular docking. We developed the ProBiS-CHARMMing web interface by connecting the ProBiS ( http://probis.cmm.ki.si ) and CHARMMing ( http://www.charmming.org ) web servers into one functional unit that enables prediction of protein-ligand complexes and allows for their geometry optimization and interaction energy calculation. The ProBiS web server predicts ligands (small compounds, proteins, nucleic acids, and single-atom ligands) that may bind to a query protein. This is achieved by comparing its surface structure against a nonredundant database of protein structures and finding those that have binding sites similar to that of the query protein. Existing ligands found in the similar binding sites are then transposed to the query according to predictions from ProBiS. The CHARMMing web server enables, among other things, minimization and potential energy calculation for a wide variety of biomolecular systems, and it is used here to optimize the geometry of the predicted protein-ligand complex structures using the CHARMM force field and to calculate their interaction energies with the corresponding query proteins. We show how ProBiS-CHARMMing can be used to predict ligands and their poses for a particular binding site, and minimize the predicted protein-ligand complexes to obtain representations of holoproteins. The ProBiS-CHARMMing web interface is freely available for academic users at http://probis.nih.gov.
SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.

PubMed

Chiba, Hirokazu; Uchiyama, Ikuo

2017-02-08

Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .
A Simple and Customizable Web Interface to the Virtual Solar Observatory

NASA Astrophysics Data System (ADS)

Hughitt, V. Keith; Hourcle, J.; Suarez-Sola, I.; Davey, A.

2010-05-01

As the variety and number of solar data sources continue to increase at a rapid rate, the importance of providing methods to search through these sources becomes increasingly important. By taking advantage of the power of modern JavaScript libraries, a new version of the Virtual Solar Observatory's web interface aims to provide a significantly faster and simpler way to explore the multitude of data repositories available. Querying asynchroniously serves not only to eliminates bottlenecks resulting from slow or unresponsive data providers, but also allows for displaying of results as soon as they are returned. Implicit pagination and post-query filtering enables users to work with large result-sets, while a more modular and customizable UI provides a mechanism for customizing both the look-and-feel and behavior of the VSO web interface. Finally, the new web interface features a custom widget system capable of displaying additional tools and information along-side of the standard VSO search form. Interested users can also write their own widgets and submit them for future incorporation into VSO.
GeoNetwork powered GI-cat: a geoportal hybrid solution

NASA Astrophysics Data System (ADS)

Baldini, Alessio; Boldrini, Enrico; Santoro, Mattia; Mazzetti, Paolo

2010-05-01

To the aim of setting up a Spatial Data Infrastructures (SDI) the creation of a system for the metadata management and discovery plays a fundamental role. An effective solution is the use of a geoportal (e.g. FAO/ESA geoportal), that has the important benefit of being accessible from a web browser. With this work we present a solution based integrating two of the available frameworks: GeoNetwork and GI-cat. GeoNetwork is an opensource software designed to improve accessibility of a wide variety of data together with the associated ancillary information (metadata), at different scale and from multidisciplinary sources; data are organized and documented in a standard and consistent way. GeoNetwork implements both the Portal and Catalog components of a Spatial Data Infrastructure (SDI) defined in the OGC Reference Architecture. It provides tools for managing and publishing metadata on spatial data and related services. GeoNetwork allows harvesting of various types of web data sources e.g. OGC Web Services (e.g. CSW, WCS, WMS). GI-cat is a distributed catalog based on a service-oriented framework of modular components and can be customized and tailored to support different deployment scenarios. It can federate a multiplicity of catalogs services, as well as inventory and access services in order to discover and access heterogeneous ESS resources. The federated resources are exposed by GI-cat through several standard catalog interfaces (e.g. OGC CSW AP ISO, OpenSearch, etc.) and by the GI-cat extended interface. Specific components implement mediation services for interfacing heterogeneous service providers, each of which exposes a specific standard specification; such components are called Accessors. These mediating components solve providers data modelmultiplicity by mapping them onto the GI-cat internal data model which implements the ISO 19115 Core profile. Accessors also implement the query protocol mapping; first they translate the query requests expressed according to the interface protocols exposed by GI-cat into the multiple query dialects spoken by the resource service providers. Currently, a number of well-accepted catalog and inventory services are supported, including several OGC Web Services, THREDDS Data Server, SeaDataNet Common Data Index, GBIF and OpenSearch engines. A GeoNetwork powered GI-cat has been developed in order to exploit the best of the two frameworks. The new system uses a modified version of GeoNetwork web interface in order to add the capability of querying also the specified GI-cat catalog and not only the GeoNetwork internal database. The resulting system consists in a geoportal in which GI-cat plays the role of the search engine. This new system allows to distribute the query on the different types of data sources linked to a GI-cat. The metadata results of the query are then visualized by the Geonetwork web interface. This configuration was experimented in the framework of GIIDA, a project of the Italian National Research Council (CNR) focused on data accessibility and interoperability. A second advantage of this solution is achieved setting up a GeoNetwork catalog amongst the accessors of the GI-cat instance. Such a configuration will allow in turn GI-cat to run the query against the internal GeoNetwork database. This allows to have both the harvesting and the metadata editor functionalities provided by GeoNetwork and the distributed search functionality of GI-cat available in a consistent way through the same web interface.
An Application Programming Interface for Synthetic Snowflake Particle Structure and Scattering Data

NASA Technical Reports Server (NTRS)

Lammers, Matthew; Kuo, Kwo-Sen

2017-01-01

The work by Kuo and colleagues on growing synthetic snowflakes and calculating their single-scattering properties has demonstrated great potential to improve the retrievals of snowfall. To grant colleagues flexible and targeted access to their large collection of sizes and shapes at fifteen (15) microwave frequencies, we have developed a web-based Application Programming Interface (API) integrated with NASA Goddard's Precipitation Processing System (PPS) Group. It is our hope that the API will enable convenient programmatic utilization of the database. To help users better understand the API's capabilities, we have developed an interactive web interface called the OpenSSP API Query Builder, which implements an intuitive system of mechanisms for selecting shapes, sizes, and frequencies to generate queries, with which the API can then extract and return data from the database. The Query Builder also allows for the specification of normalized particle size distributions by setting pertinent parameters, with which the API can also return mean geometric and scattering properties for each size bin. Additionally, the Query Builder interface enables downloading of raw scattering and particle structure data packages. This presentation will describe some of the challenges and successes associated with developing such an API. Examples of its usage will be shown both through downloading output and pulling it into a spreadsheet, as well as querying the API programmatically and working with the output in code.
Analysis of Technique to Extract Data from the Web for Improved Performance

NASA Astrophysics Data System (ADS)

Gupta, Neena; Singh, Manish

2010-11-01

The World Wide Web rapidly guides the world into a newly amazing electronic world, where everyone can publish anything in electronic form and extract almost all the information. Extraction of information from semi structured or unstructured documents, such as web pages, is a useful yet complex task. Data extraction, which is important for many applications, extracts the records from the HTML files automatically. Ontologies can achieve a high degree of accuracy in data extraction. We analyze method for data extraction OBDE (Ontology-Based Data Extraction), which automatically extracts the query result records from the web with the help of agents. OBDE first constructs an ontology for a domain according to information matching between the query interfaces and query result pages from different web sites within the same domain. Then, the constructed domain ontology is used during data extraction to identify the query result section in a query result page and to align and label the data values in the extracted records. The ontology-assisted data extraction method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods.
Analysis and Development of a Web-Enabled Planning and Scheduling Database Application

DTIC Science & Technology

2013-09-01

establishes an entity—relationship diagram for the desired process, constructs an operable database using MySQL , and provides a web- enabled interface for...development, develop, design, process, re- engineering, reengineering, MySQL , structured query language, SQL, myPHPadmin. 15. NUMBER OF PAGES 107 16...relationship diagram for the desired process, constructs an operable database using MySQL , and provides a web-enabled interface for the population of
OpenSearch technology for geospatial resources discovery

NASA Astrophysics Data System (ADS)

Papeschi, Fabrizio; Enrico, Boldrini; Mazzetti, Paolo

2010-05-01

In 2005, the term Web 2.0 has been coined by Tim O'Reilly to describe a quickly growing set of Web-based applications that share a common philosophy of "mutually maximizing collective intelligence and added value for each participant by formalized and dynamic information sharing". Around this same period, OpenSearch a new Web 2.0 technology, was developed. More properly, OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format. Due to its strong impact on the way the Web is perceived by users and also due its relevance for businesses, Web 2.0 has attracted the attention of both mass media and the scientific community. This explosive growth in popularity of Web 2.0 technologies like OpenSearch, and practical applications of Service Oriented Architecture (SOA) resulted in an increased interest in similarities, convergence, and a potential synergy of these two concepts. SOA is considered as the philosophy of encapsulating application logic in services with a uniformly defined interface and making these publicly available via discovery mechanisms. Service consumers may then retrieve these services, compose and use them according to their current needs. A great degree of similarity between SOA and Web 2.0 may be leading to a convergence between the two paradigms. They also expose divergent elements, such as the Web 2.0 support to the human interaction in opposition to the typical SOA machine-to-machine interaction. According to these considerations, the Geospatial Information (GI) domain, is also moving first steps towards a new approach of data publishing and discovering, in particular taking advantage of the OpenSearch technology. A specific GI niche is represented by the OGC Catalog Service for Web (CSW) that is part of the OGC Web Services (OWS) specifications suite, which provides a set of services for discovery, access, and processing of geospatial resources in a SOA framework. GI-cat is a distributed CSW framework implementation developed by the ESSI Lab of the Italian National Research Council (CNR-IMAA) and the University of Florence. It provides brokering and mediation functionalities towards heterogeneous resources and inventories, exposing several standard interfaces for query distribution. This work focuses on a new GI-cat interface which allows the catalog to be queried according to the OpenSearch syntax specification, thus filling the gap between the SOA architectural design of the CSW and the Web 2.0. At the moment, there is no OGC standard specification about this topic, but an official change request has been proposed in order to enable the OGC catalogues to support OpenSearch queries. In this change request, an OpenSearch extension is proposed providing a standard mechanism to query a resource based on temporal and geographic extents. Two new catalog operations are also proposed, in order to publish a suitable OpenSearch interface. This extended interface is implemented by the modular GI-cat architecture adding a new profiling module called "OpenSearch profiler". Since GI-cat also acts as a clearinghouse catalog, another component called "OpenSearch accessor" is added in order to access OpenSearch compliant services. An important role in the GI-cat extension, is played by the adopted mapping strategy. Two different kind of mappings are required: query, and response elements mapping. Query mapping is provided in order to fit the simple OpenSearch query syntax to the complex CSW query expressed by the OGC Filter syntax. GI-cat internal data model is based on the ISO-19115 profile, that is more complex than the simple XML syndication formats, such as RSS 2.0 and Atom 1.0, suggested by OpenSearch. Once response elements are available, in order to be presented, they need to be translated from the GI-cat internal data model, to the above mentioned syndication formats; the mapping processing, is bidirectional. When GI-cat is used to access OpenSearch compliant services, the CSW query must be mapped to the OpenSearch query, and the response elements, must be translated according to the GI-cat internal data model. As results of such extensions, GI-cat provides a user friendly facade to the complex CSW interface, thus enabling it to be queried, for example, using a browser toolbar.
JetWeb: A WWW interface and database for Monte Carlo tuning and validation

NASA Astrophysics Data System (ADS)

Butterworth, J. M.; Butterworth, S.

2003-06-01

A World Wide Web interface to a Monte Carlo tuning facility is described. The aim of the package is to allow rapid and reproducible comparisons to be made between detailed measurements at high-energy physics colliders and general physics simulation packages. The package includes a relational database, a Java servlet query and display facility, and clean interfaces to simulation packages and their parameters.
SPARQL Assist language-neutral query composer

PubMed Central

2012-01-01

Background SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327
SPARQL assist language-neutral query composer.

PubMed

McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark

2012-01-25

SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.
XGI: a graphical interface for XQuery creation.

PubMed

Li, Xiang; Gennari, John H; Brinkley, James F

2007-10-11

XML has become the default standard for data exchange among heterogeneous data sources, and in January 2007 XQuery (XML Query language) was recommended by the World Wide Web Consortium as the query language for XML. However, XQuery is a complex language that is difficult for non-programmers to learn. We have therefore developed XGI (XQuery Graphical Interface), a visual interface for graphically generating XQuery. In this paper we demonstrate the functionality of XGI through its application to a biomedical XML dataset. We describe the system architecture and the features of XGI in relation to several existing querying systems, we demonstrate the system's usability through a sample query construction, and we discuss a preliminary evaluation of XGI. Finally, we describe some limitations of the system, and our plans for future improvements.

Graph Mining Meets the Semantic Web

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, Sangkeun; Sukumar, Sreenivas R; Lim, Seung-Hwan

The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through implementation of three popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluatemore » the performance of our implementation on 6 real world data sets and show graph mining algorithms (that have a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.« less
Development of XML Schema for Broadband Digital Seismograms and Data Center Portal

NASA Astrophysics Data System (ADS)

Takeuchi, N.; Tsuboi, S.; Ishihara, Y.; Nagao, H.; Yamagishi, Y.; Watanabe, T.; Yanaka, H.; Yamaji, H.

2008-12-01

There are a number of data centers around the globe, where the digital broadband seismograms are opened to researchers. Those centers use their own user interfaces and there are no standard to access and retrieve seismograms from different data centers using unified interface. One of the emergent technologies to realize unified user interface for different data centers is the concept of WebService and WebService portal. Here we have developed a prototype of data center portal for digital broadband seismograms. This WebService portal uses WSDL (Web Services Description Language) to accommodate differences among the different data centers. By using the WSDL, alteration and addition of data center user interfaces can be easily managed. This portal, called NINJA Portal, assumes three WebServices: (1) database Query service, (2) Seismic event data request service, and (3) Seismic continuous data request service. Current system supports both station search of database Query service and seismic continuous data request service. Data centers supported by this NINJA portal will be OHP data center in ERI and Pacific21 data center in IFREE/JAMSTEC in the beginning. We have developed metadata standard for seismological data based on QuakeML for parametric data, which has been developed by ETH Zurich, and XML-SEED for waveform data, which was developed by IFREE/JAMSTEC. The prototype of NINJA portal is now released through IFREE web page (http://www.jamstec.go.jp/pacific21/).
GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus

PubMed Central

Zhu, Yuelin; Davis, Sean; Stephens, Robert; Meltzer, Paul S.; Chen, Yidong

2008-01-01

The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data in GEO can be challenging. We have developed GEOmetadb in an attempt to make querying the GEO metadata both easier and more powerful. All GEO metadata records as well as the relationships between them are parsed and stored in a local MySQL database. A powerful, flexible web search interface with several convenient utilities provides query capabilities not available via NCBI tools. In addition, a Bioconductor package, GEOmetadb that utilizes a SQLite export of the entire GEOmetadb database is also available, rendering the entire GEO database accessible with full power of SQL-based queries from within R. Availability: The web interface and SQLite databases available at http://gbnci.abcc.ncifcrf.gov/geo/. The Bioconductor package is available via the Bioconductor project. The corresponding MATLAB implementation is also available at the same website. Contact: yidong@mail.nih.gov PMID:18842599
Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

NASA Astrophysics Data System (ADS)

Lynnes, C.; Beaumont, B.; Duerr, R. E.; Hua, H.

2009-12-01

The past decade has seen a burgeoning of remote sensing and Earth science data providers, as evidenced in the growth of the Earth Science Information Partner (ESIP) federation. At the same time, the need to combine diverse data sets to enable understanding of the Earth as a system has also grown. While the expansion of data providers is in general a boon to such studies, the diversity presents a challenge to finding useful data for a given study. Locating all the data files with aerosol information for a particular volcanic eruption, for example, may involve learning and using several different search tools to execute the requisite space-time queries. To address this issue, the ESIP federation is developing a federated space-time query framework, based on the OpenSearch convention (www.opensearch.org), with Geo and Time extensions. In this framework, data providers publish OpenSearch Description Documents that describe in a machine-readable form how to execute queries against the provider. The novelty of OpenSearch is that the space-time query interface becomes both machine callable and easy enough to integrate into the web browser's search box. This flexibility, together with a simple REST (HTTP-get) interface, should allow a variety of data providers to participate in the federated search framework, from large institutional data centers to individual scientists. The simple interface enables trivial querying of multiple data sources and participation in recursive-like federated searches--all using the same common OpenSearch interface. This simplicity also makes the construction of clients easy, as does existing OpenSearch client libraries in a variety of languages. Moreover, a number of clients and aggregation services already exist and OpenSearch is already supported by a number of web browsers such as Firefox and Internet Explorer.
A Tutorial in Creating Web-Enabled Databases with Inmagic DB/TextWorks through ODBC.

ERIC Educational Resources Information Center

Breeding, Marshall

2000-01-01

Explains how to create Web-enabled databases. Highlights include Inmagic's DB/Text WebPublisher product called DB/TextWorks; ODBC (Open Database Connectivity) drivers; Perl programming language; HTML coding; Structured Query Language (SQL); Common Gateway Interface (CGI) programming; and examples of HTML pages and Perl scripts. (LRW)
A new relational database structure and online interface for the HITRAN database

NASA Astrophysics Data System (ADS)

Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan

2013-11-01

A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database to ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration

PubMed Central

Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

2017-01-01

Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. PMID:27733503
BioCarian: search engine for exploratory searches in heterogeneous biological databases.

PubMed

Zaki, Nazar; Tennakoon, Chandana

2017-10-02

There are a large number of biological databases publicly available for scientists in the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in the recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in semantic web. Heterogeneous databases can be converted into Resource Description Format (RDF) and queried using SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and have additional features like ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For the advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
PiCO QL: A software library for runtime interactive queries on program data

NASA Astrophysics Data System (ADS)

Fragkoulis, Marios; Spinellis, Diomidis; Louridas, Panos

PiCO QL is an open source C/C++ software whose scientific scope is real-time interactive analysis of in-memory data through SQL queries. It exposes a relational view of a system's or application's data structures, which is queryable through SQL. While the application or system is executing, users can input queries through a web-based interface or issue web service requests. Queries execute on the live data structures through the respective relational views. PiCO QL makes a good candidate for ad-hoc data analysis in applications and for diagnostics in systems settings. Applications of PiCO QL include the Linux kernel, the Valgrind instrumentation framework, a GIS application, a virtual real-time observatory of stellar objects, and a source code analyser.
Metadata tables to enable dynamic data modeling and web interface design: the SEER example.

PubMed

Weiner, Mark; Sherr, Micah; Cohen, Abigail

2002-04-01

A wealth of information addressing health status, outcomes and resource utilization is compiled and made available by various government agencies. While exploration of the data is possible using existing tools, in general, would-be users of the resources must acquire CD-ROMs or download data from the web, and upload the data into their own database. Where web interfaces exist, they are highly structured, limiting the kinds of queries that can be executed. This work develops a web-based database interface engine whose content and structure is generated through interaction with a metadata table. The result is a dynamically generated web interface that can easily accommodate changes in the underlying data model by altering the metadata table, rather than requiring changes to the interface code. This paper discusses the background and implementation of the metadata table and web-based front end and provides examples of its use with the NCI's Surveillance, Epidemiology and End-Results (SEER) database.
YEASTRACT: providing a programmatic access to curated transcriptional regulatory associations in Saccharomyces cerevisiae through a web services interface

PubMed Central

Abdulrehman, Dário; Monteiro, Pedro Tiago; Teixeira, Miguel Cacho; Mira, Nuno Pereira; Lourenço, Artur Bastos; dos Santos, Sandra Costa; Cabrito, Tânia Rodrigues; Francisco, Alexandre Paulo; Madeira, Sara Cordeiro; Aires, Ricardo Santos; Oliveira, Arlindo Limede; Sá-Correia, Isabel; Freitas, Ana Teresa

2011-01-01

The YEAst Search for Transcriptional Regulators And Consensus Tracking (YEASTRACT) information system (http://www.yeastract.com) was developed to support the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2010, this database contains over 48 200 regulatory associations between transcription factors (TFs) and target genes, including 298 specific DNA-binding sites for 110 characterized TFs. All regulatory associations stored in the database were revisited and detailed information on the experimental evidences that sustain those associations was added and classified as direct or indirect evidences. The inclusion of this new data, gathered in response to the requests of YEASTRACT users, allows the user to restrict its queries to subsets of the data based on the existence or not of experimental evidences for the direct action of the TFs in the promoter region of their target genes. Another new feature of this release is the availability of all data through a machine readable web-service interface. Users are no longer restricted to the set of available queries made available through the existing web interface, and can use the web service interface to query, retrieve and exploit the YEASTRACT data using their own implementation of additional functionalities. The YEASTRACT information system is further complemented with several computational tools that facilitate the use of the curated data when answering a number of important biological questions. Since its first release in 2006, YEASTRACT has been extensively used by hundreds of researchers from all over the world. We expect that by making the new data and services available, the system will continue to be instrumental for yeast biologists and systems biology researchers. PMID:20972212
Construction of a Linux based chemical and biological information system.

PubMed

Molnár, László; Vágó, István; Fehér, András

2003-01-01

A chemical and biological information system with a Web-based easy-to-use interface and corresponding databases has been developed. The constructed system incorporates all chemical, numerical and textual data related to the chemical compounds, including numerical biological screen results. Users can search the database by traditional textual/numerical and/or substructure or similarity queries through the web interface. To build our chemical database management system, we utilized existing IT components such as ORACLE or Tripos SYBYL for database management and Zope application server for the web interface. We chose Linux as the main platform, however, almost every component can be used under various operating systems.
Moby and Moby 2: creatures of the deep (web).

PubMed

Vandervalk, Ben P; McCarthy, E Luke; Wilkinson, Mark D

2009-03-01

Facile and meaningful integration of data from disparate resources is the 'holy grail' of bioinformatics. Some resources have begun to address this problem by providing their data using Semantic Web standards, specifically the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Unfortunately, adoption of Semantic Web standards has been slow overall, and even in cases where the standards are being utilized, interconnectivity between resources is rare. In response, we have seen the emergence of centralized 'semantic warehouses' that collect public data from third parties, integrate it, translate it into OWL/RDF and provide it to the community as a unified and queryable resource. One limitation of the warehouse approach is that queries are confined to the resources that have been selected for inclusion. A related problem, perhaps of greater concern, is that the majority of bioinformatics data exists in the 'Deep Web'-that is, the data does not exist until an application or analytical tool is invoked, and therefore does not have a predictable Web address. The inability to utilize Uniform Resource Identifiers (URIs) to address this data is a barrier to its accessibility via URI-centric Semantic Web technologies. Here we examine 'The State of the Union' for the adoption of Semantic Web standards in the health care and life sciences domain by key bioinformatics resources, explore the nature and connectivity of several community-driven semantic warehousing projects, and report on our own progress with the CardioSHARE/Moby-2 project, which aims to make the resources of the Deep Web transparently accessible through SPARQL queries.
Magnetic Fields for All: The GPIPS Community Web-Access Portal

NASA Astrophysics Data System (ADS)

Carveth, Carol; Clemens, D. P.; Pinnick, A.; Pavel, M.; Jameson, K.; Taylor, B.

2007-12-01

The new GPIPS website portal provides community users with an intuitive and powerful interface to query the data products of the Galactic Plane Infrared Polarization Survey. The website, which was built using PHP for the front end and MySQL for the database back end, allows users to issue queries based on galactic or equatorial coordinates, GPIPS-specific identifiers, polarization information, magnitude information, and several other attributes. The returns are presented in HTML tables, with the added option of either downloading or being emailed an ASCII file including the same or more information from the database. Other functionalities of the website include providing details of the status of the Survey (which fields have been observed or are planned to be observed), techniques involved in data collection and analysis, and descriptions of the database contents and names. For this initial launch of the website, users may access the GPIPS polarization point source catalog and the deep coadd photometric point source catalog. Future planned developments include a graphics-based method for querying the database, as well as tools to combine neighboring GPIPS images into larger image files for both polarimetry and photometry. This work is partially supported by NSF grant AST-0607500.
World Wide Web Metaphors for Search Mission Data

NASA Technical Reports Server (NTRS)

Norris, Jeffrey S.; Wallick, Michael N.; Joswig, Joseph C.; Powell, Mark W.; Torres, Recaredo J.; Mittman, David S.; Abramyan, Lucy; Crockett, Thomas M.; Shams, Khawaja S.; Fox, Jason M.;

2010-01-01

A software program that searches and browses mission data emulates a Web browser, containing standard meta - phors for Web browsing. By taking advantage of back-end URLs, users may save and share search states. Also, since a Web interface is familiar to users, training time is reduced. Familiar back and forward buttons move through a local search history. A refresh/reload button regenerates a query, and loads in any new data. URLs can be constructed to save search results. Adding context to the current search is also handled through a familiar Web metaphor. The query is constructed by clicking on hyperlinks that represent new components to the search query. The selection of a link appears to the user as a page change; the choice of links changes to represent the updated search and the results are filtered by the new criteria. Selecting a navigation link changes the current query and also the URL that is associated with it. The back button can be used to return to the previous search state. This software is part of the MSLICE release, which was written in Java. It will run on any current Windows, Macintosh, or Linux system.

Do-It-Yourself: A Special Library's Approach to Creating Dynamic Web Pages Using Commercial Off-The-Shelf Applications

NASA Technical Reports Server (NTRS)

Steeman, Gerald; Connell, Christopher

2000-01-01

Many librarians may feel that dynamic Web pages are out of their reach, financially and technically. Yet we are reminded in library and Web design literature that static home pages are a thing of the past. This paper describes how librarians at the Institute for Defense Analyses (IDA) library developed a database-driven, dynamic intranet site using commercial off-the-shelf applications. Administrative issues include surveying a library users group for interest and needs evaluation; outlining metadata elements; and, committing resources from managing time to populate the database and training in Microsoft FrontPage and Web-to-database design. Technical issues covered include Microsoft Access database fundamentals, lessons learned in the Web-to-database process (including setting up Database Source Names (DSNs), redesigning queries to accommodate the Web interface, and understanding Access 97 query language vs. Standard Query Language (SQL)). This paper also offers tips on editing Active Server Pages (ASP) scripting to create desired results. A how-to annotated resource list closes out the paper.
Retrieving high-resolution images over the Internet from an anatomical image database

NASA Astrophysics Data System (ADS)

Strupp-Adams, Annette; Henderson, Earl

1999-12-01

The Visible Human Data set is an important contribution to the national collection of anatomical images. To enhance the availability of these images, the National Library of Medicine has supported the design and development of a prototype object-oriented image database which imports, stores, and distributes high resolution anatomical images in both pixel and voxel formats. One of the key database modules is its client-server Internet interface. This Web interface provides a query engine with retrieval access to high-resolution anatomical images that range in size from 100KB for browser viewable rendered images, to 1GB for anatomical structures in voxel file formats. The Web query and retrieval client-server system is composed of applet GUIs, servlets, and RMI application modules which communicate with each other to allow users to query for specific anatomical structures, and retrieve image data as well as associated anatomical images from the database. Selected images can be downloaded individually as single files via HTTP or downloaded in batch-mode over the Internet to the user's machine through an applet that uses Netscape's Object Signing mechanism. The image database uses ObjectDesign's object-oriented DBMS, ObjectStore that has a Java interface. The query and retrieval systems has been tested with a Java-CDE window system, and on the x86 architecture using Windows NT 4.0. This paper describes the Java applet client search engine that queries the database; the Java client module that enables users to view anatomical images online; the Java application server interface to the database which organizes data returned to the user, and its distribution engine that allow users to download image files individually and/or in batch-mode.
EarthServer: Use of Rasdaman as a data store for use in visualisation of complex EO data

NASA Astrophysics Data System (ADS)

Clements, Oliver; Walker, Peter; Grant, Mike

2013-04-01

The European Commission FP7 project EarthServer is establishing open access and ad-hoc analytics on extreme-size Earth Science data, based on and extending cutting-edge Array Database technology. EarthServer is built around the Rasdaman Raster Data Manager which extends standard relational database systems with the ability to store and retrieve multi-dimensional raster data of unlimited size through an SQL style query language. Rasdaman facilitates visualisation of data by providing several Open Geospatial Consortium (OGC) standard interfaces through its web services wrapper, Petascope. These include the well established standards, Web Coverage Service (WCS) and Web Map Service (WMS) as well as the emerging standard, Web Coverage Processing Service (WCPS). The WCPS standard allows the running of ad-hoc queries on the data stored within Rasdaman, creating an infrastructure where users are not restricted by bandwidth when manipulating or querying huge datasets. Here we will show that the use of EarthServer technologies and infrastructure allows access and visualisation of massive scale data through a web client with only marginal bandwidth use as opposed to the current mechanism of copying huge amounts of data to create visualisations locally. For example if a user wanted to generate a plot of global average chlorophyll for a complete decade time series they would only have to download the result instead of Terabytes of data. Firstly we will present a brief overview of the capabilities of Rasdaman and the WCPS query language to introduce the ways in which it is used in a visualisation tool chain. We will show that there are several ways in which WCPS can be utilised to create both standard and novel web based visualisations. An example of a standard visualisation is the production of traditional 2d plots, allowing users the ability to plot data products easily. However, the query language allows the creation of novel/custom products, which can then immediately be plotted with the same system. For more complex multi-spectral data, WCPS allows the user to explore novel combinations of bands in standard band-ratio algorithms through a web browser with dynamic updating of the resultant image. To visualise very large datasets Rasdaman has the capability to dynamically scale a dataset or query result so that it can be appraised quickly for use in later unscaled queries. All of these techniques are accessible through a web based GIS interface increasing the number of potential users of the system. Lastly we will show the advances in dynamic web based 3D visualisations being explored within the EarthServer project. By utilising the emerging declarative 3D web standard X3DOM as a tool to visualise the results of WCPS queries we introduce several possible benefits, including quick appraisal of data for outliers or anomalous data points and visualisation of the uncertainty of data alongside the actual data values.
Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration.

PubMed

Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

2017-01-04

Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory

NASA Astrophysics Data System (ADS)

Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.

2002-12-01

We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchichal Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with a HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm preforms a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a grant from the NASA AISR program.

SAFOD Brittle Microstructure and Mechanics Knowledge Base (BM2KB)

NASA Astrophysics Data System (ADS)

Babaie, Hassan A.; Broda Cindi, M.; Hadizadeh, Jafar; Kumar, Anuj

2013-07-01

Scientific drilling near Parkfield, California has established the San Andreas Fault Observatory at Depth (SAFOD), which provides the solid earth community with short range geophysical and fault zone material data. The BM2KB ontology was developed in order to formalize the knowledge about brittle microstructures in the fault rocks sampled from the SAFOD cores. A knowledge base, instantiated from this domain ontology, stores and presents the observed microstructural and analytical data with respect to implications for brittle deformation and mechanics of faulting. These data can be searched on the knowledge base‧s Web interface by selecting a set of terms (classes, properties) from different drop-down lists that are dynamically populated from the ontology. In addition to this general search, a query can also be conducted to view data contributed by a specific investigator. A search by sample is done using the EarthScope SAFOD Core Viewer that allows a user to locate samples on high resolution images of core sections belonging to different runs and holes. The class hierarchy of the BM2KB ontology was initially designed using the Unified Modeling Language (UML), which was used as a visual guide to develop the ontology in OWL applying the Protégé ontology editor. Various Semantic Web technologies such as the RDF, RDFS, and OWL ontology languages, SPARQL query language, and Pellet reasoning engine, were used to develop the ontology. An interactive Web application interface was developed through Jena, a java based framework, with AJAX technology, jsp pages, and java servlets, and deployed via an Apache tomcat server. The interface allows the registered user to submit data related to their research on a sample of the SAFOD core. The submitted data, after initial review by the knowledge base administrator, are added to the extensible knowledge base and become available in subsequent queries to all types of users. The interface facilitates inference capabilities in the ontology, supports SPARQL queries, allows for modifications based on successive discoveries, and provides an accessible knowledge base on the Web.
Multimedia data repository for the World Wide Web

NASA Astrophysics Data System (ADS)

Chen, Ken; Lu, Dajin; Xu, Duanyi

1998-08-01

This paper introduces the design and implementation of a Multimedia Data Repository served as a multimedia information system, which provides users a Web accessible, platform independent interface to query, browse, and retrieve multimedia data such as images, graphics, audio, video from a large multimedia data repository. By integrating the multimedia DBMS, in which the textual information and samples of the multimedia data is organized and stored, and Web server together into the Microsoft ActiveX Server Framework, users can access the DBMS and query the information by simply using a Web browser at the client-side. The original multimedia data can then be located and transmitted through the Internet from the tertiary storage device, a 400 CDROM optical jukebox at the server-side, to the client-side for further use.
HomPPI: a class of sequence homology based protein-protein interface prediction methods

PubMed Central

2011-01-01

Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
Relational databases: a transparent framework for encouraging biology students to think informatically.

PubMed

Rice, Michael; Gladstone, William; Weir, Michael

2004-01-01

We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills.
Relational Databases: A Transparent Framework for Encouraging Biology Students To Think Informatically

PubMed Central

2004-01-01

We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills. PMID:15592597
The Protein Disease Database of human body fluids: II. Computer methods and data issues.

PubMed

Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R

1995-01-01

The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. Assays may also include those observed in high-resolution electrophoretic gels that offer the potential to quantitate many proteins in a single test as well as data gathered by enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that can support easy-to-use data retrieval using query specification forms or images. The details of these interactions are totally transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.
A journey to Semantic Web query federation in the life sciences.

PubMed

Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

2009-10-01

As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community.
A journey to Semantic Web query federation in the life sciences

PubMed Central

Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

2009-01-01

Background As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. Methods and results We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. Conclusion We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community. PMID:19796394
A web-based platform for virtual screening.

PubMed

Watson, Paul; Verdonk, Marcel; Hartshorn, Michael J

2003-09-01

A fully integrated, web-based, virtual screening platform has been developed to allow rapid virtual screening of large numbers of compounds. ORACLE is used to store information at all stages of the process. The system includes a large database of historical compounds from high throughput screenings (HTS) chemical suppliers, ATLAS, containing over 3.1 million unique compounds with their associated physiochemical properties (ClogP, MW, etc.). The database can be screened using a web-based interface to produce compound subsets for virtual screening or virtual library (VL) enumeration. In order to carry out the latter task within ORACLE a reaction data cartridge has been developed. Virtual libraries can be enumerated rapidly using the web-based interface to the cartridge. The compound subsets can be seamlessly submitted for virtual screening experiments, and the results can be viewed via another web-based interface allowing ad hoc querying of the virtual screening data stored in ORACLE.
ASIST 2003: Part III: Posters.

ERIC Educational Resources Information Center

Proceedings of the ASIST Annual Meeting, 2003

2003-01-01

Twenty-three posters address topics including access to information; metadata; personal information management; scholarly information communication; online resources; content analysis; interfaces; Web queries; information evaluation; informatics; information needs; search effectiveness; digital libraries; diversity; automated indexing; e-commerce;…
The IRIS Federator: Accessing Seismological Data Across Data Centers

NASA Astrophysics Data System (ADS)

Trabant, C. M.; Van Fossen, M.; Ahern, T. K.; Weekly, R. T.

2015-12-01

In 2013 the International Federation of Digital Seismograph Networks (FDSN) approved a specification for web service interfaces for accessing seismological station metadata, time series and event parameters. Since then, a number of seismological data centers have implemented FDSN service interfaces, with more implementations in development. We have developed a new system called the IRIS Federator which leverages this standardization and provides the scientific community with a service for easy discovery and access of seismological data across FDSN data centers. These centers are located throughout the world and this work represents one model of a system for data collection across geographic and political boundaries.The main components of the IRIS Federator are a catalog of time series metadata holdings at each data center and a web service interface for searching the catalog. The service interface is designed to support client-side federated data access, a model in which the client (software run by the user) queries the catalog and then collects the data from each identified center. By default the results are returned in a format suitable for direct submission to those web services, but could also be formatted in a simple text format for general data discovery purposes. The interface will remove any duplication of time series channels between data centers according to a set of business rules by default, however a user may request results with all duplicate time series entries included. We will demonstrate how client-side federation is being incorporated into some of the DMC's data access tools. We anticipate further enhancement of the IRIS Federator to improve data discovery in various scenarios and to improve usefulness to communities beyond seismology.Data centers with FDSN web services: http://www.fdsn.org/webservices/The IRIS Federator query interface: http://service.iris.edu/irisws/fedcatalog/1/
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gwyn, Stephen D. J., E-mail: Stephen.Gwyn@nrc-cnrc.gc.ca

This paper describes the image stacks and catalogs of the Canada-France-Hawaii Telescope Legacy Survey produced using the MegaPipe data pipeline at the Canadian Astronomy Data Centre. The Legacy Survey is divided into two parts. The Deep Survey consists of four fields each of 1 deg{sup 2}, with magnitude limits (50% completeness for point sources) of u = 27.5, g = 27.9, r = 27.7, i = 27.4, and z = 26.2. It contains 1.6 Multiplication-Sign 10{sup 6} sources. The Wide Survey consists of 150 deg{sup 2} split over four fields, with magnitude limits of u = 26.0, g = 26.5,more » r = 25.9, i = 25.7, and z = 24.6. It contains 3 Multiplication-Sign 10{sup 7} sources. This paper describes the calibration, image stacking, and catalog generation process. The images and catalogs are available on the web through several interfaces: normal image and text file catalog downloads, a 'Google Sky' interface, an image cutout service, and a catalog database query service.« less
Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.

PubMed

Kim, Sun; Yeganova, Lana; Wilbur, W John

2016-10-01

Medical Subject Headings (MeSH(®)) is a controlled vocabulary for indexing and searching biomedical literature. MeSH terms and subheadings are organized in a hierarchical structure and are used to indicate the topics of an article. Biologists can use either MeSH terms as queries or the MeSH interface provided in PubMed(®) for searching PubMed abstracts. However, these are rarely used, and there is no convenient way to link standardized MeSH terms to user queries. Here, we introduce a web interface which allows users to enter queries to find MeSH terms closely related to the queries. Our method relies on co-occurrence of text words and MeSH terms to find keywords that are related to each MeSH term. A query is then matched with the keywords for MeSH terms, and candidate MeSH terms are ranked based on their relatedness to the query. The experimental results show that our method achieves the best performance among several term extraction approaches in terms of topic coherence. Moreover, the interface can be effectively used to find full names of abbreviations and to disambiguate user queries. https://www.ncbi.nlm.nih.gov/IRET/MESHABLE/ CONTACT: sun.kim@nih.gov Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.

PubMed

Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

2013-10-08

Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.
Research and development of web oriented remote sensing image publication system based on Servlet technique

NASA Astrophysics Data System (ADS)

Juanle, Wang; Shuang, Li; Yunqiang, Zhu

2005-10-01

According to the requirements of China National Scientific Data Sharing Program (NSDSP), the research and development of web oriented RS Image Publication System (RSIPS) is based on Java Servlet technique. The designing of RSIPS framework is composed of 3 tiers, which is Presentation Tier, Application Service Tier and Data Resource Tier. Presentation Tier provides user interface for data query, review and download. For the convenience of users, visual spatial query interface is included. Served as a middle tier, Application Service Tier controls all actions between users and databases. Data Resources Tier stores RS images in file and relationship databases. RSIPS is developed with cross platform programming based on Java Servlet tools, which is one of advanced techniques in J2EE architecture. RSIPS's prototype has been developed and applied in the geosciences clearinghouse practice which is among the experiment units of NSDSP in China.
Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal

PubMed Central

Gao, Jianjiong; Aksoy, Bülent Arman; Dogrusoz, Ugur; Dresdner, Gideon; Gross, Benjamin; Sumer, S. Onur; Sun, Yichao; Jacobsen, Anders; Sinha, Rileen; Larsson, Erik; Cerami, Ethan; Sander, Chris; Schultz, Nikolaus

2014-01-01

The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and software programmatic access. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics. PMID:23550210
BioSWR – Semantic Web Services Registry for Bioinformatics

PubMed Central

Repchevsky, Dmitry; Gelpi, Josep Ll.

2014-01-01

Despite of the variety of available Web services registries specially aimed at Life Sciences, their scope is usually restricted to a limited set of well-defined types of services. While dedicated registries are generally tied to a particular format, general-purpose ones are more adherent to standards and usually rely on Web Service Definition Language (WSDL). Although WSDL is quite flexible to support common Web services types, its lack of semantic expressiveness led to various initiatives to describe Web services via ontology languages. Nevertheless, WSDL 2.0 descriptions gained a standard representation based on Web Ontology Language (OWL). BioSWR is a novel Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional WSDL based ones. The registry provides Web-based interface for Web services registration, querying and annotation, and is also accessible programmatically via Representational State Transfer (REST) API or using a SPARQL Protocol and RDF Query Language. BioSWR server is located at http://inb.bsc.es/BioSWR/and its code is available at https://sourceforge.net/projects/bioswr/under the LGPL license. PMID:25233118
BioSWR--semantic web services registry for bioinformatics.

PubMed

Repchevsky, Dmitry; Gelpi, Josep Ll

2014-01-01

Despite of the variety of available Web services registries specially aimed at Life Sciences, their scope is usually restricted to a limited set of well-defined types of services. While dedicated registries are generally tied to a particular format, general-purpose ones are more adherent to standards and usually rely on Web Service Definition Language (WSDL). Although WSDL is quite flexible to support common Web services types, its lack of semantic expressiveness led to various initiatives to describe Web services via ontology languages. Nevertheless, WSDL 2.0 descriptions gained a standard representation based on Web Ontology Language (OWL). BioSWR is a novel Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional WSDL based ones. The registry provides Web-based interface for Web services registration, querying and annotation, and is also accessible programmatically via Representational State Transfer (REST) API or using a SPARQL Protocol and RDF Query Language. BioSWR server is located at http://inb.bsc.es/BioSWR/and its code is available at https://sourceforge.net/projects/bioswr/under the LGPL license.
cPath: open source software for collecting, storing, and querying biological pathways.

PubMed

Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

2006-11-13

Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.
mtDNAmanager: a Web-based tool for the management and quality analysis of mitochondrial DNA control-region sequences

PubMed Central

Lee, Hwan Young; Song, Injee; Ha, Eunho; Cho, Sung-Bae; Yang, Woo Ick; Shin, Kyoung-Jin

2008-01-01

Background For the past few years, scientific controversy has surrounded the large number of errors in forensic and literature mitochondrial DNA (mtDNA) data. However, recent research has shown that using mtDNA phylogeny and referring to known mtDNA haplotypes can be useful for checking the quality of sequence data. Results We developed a Web-based bioinformatics resource "mtDNAmanager" that offers a convenient interface supporting the management and quality analysis of mtDNA sequence data. The mtDNAmanager performs computations on mtDNA control-region sequences to estimate the most-probable mtDNA haplogroups and retrieves similar sequences from a selected database. By the phased designation of the most-probable haplogroups (both expected and estimated haplogroups), mtDNAmanager enables users to systematically detect errors whilst allowing for confirmation of the presence of clear key diagnostic mutations and accompanying mutations. The query tools of mtDNAmanager also facilitate database screening with two options of "match" and "include the queried nucleotide polymorphism". In addition, mtDNAmanager provides Web interfaces for users to manage and analyse their own data in batch mode. Conclusion The mtDNAmanager will provide systematic routines for mtDNA sequence data management and analysis via easily accessible Web interfaces, and thus should be very useful for population, medical and forensic studies that employ mtDNA analysis. mtDNAmanager can be accessed at . PMID:19014619

The Geodetic Seamless Archive Centers Service Layer: A System Architecture for Federating Geodesy Data Repositories

NASA Astrophysics Data System (ADS)

McWhirter, J.; Boler, F. M.; Bock, Y.; Jamason, P.; Squibb, M. B.; Noll, C. E.; Blewitt, G.; Kreemer, C. W.

2010-12-01

Three geodesy Archive Centers, Scripps Orbit and Permanent Array Center (SOPAC), NASA's Crustal Dynamics Data Information System (CDDIS) and UNAVCO are engaged in a joint effort to define and develop a common Web Service Application Programming Interface (API) for accessing geodetic data holdings. This effort is funded by the NASA ROSES ACCESS Program to modernize the original GPS Seamless Archive Centers (GSAC) technology which was developed in the 1990s. A new web service interface, the GSAC-WS, is being developed to provide uniform and expanded mechanisms through which users can access our data repositories. In total, our respective archives hold tens of millions of files and contain a rich collection of site/station metadata. Though we serve similar user communities, we currently provide a range of different access methods, query services and metadata formats. This leads to a lack of consistency in the userís experience and a duplication of engineering efforts. The GSAC-WS API and its reference implementation in an underlying Java-based GSAC Service Layer (GSL) supports metadata and data queries into site/station oriented data archives. The general nature of this API makes it applicable to a broad range of data systems. The overall goals of this project include providing consistent and rich query interfaces for end users and client programs, the development of enabling technology to facilitate third party repositories in developing these web service capabilities and to enable the ability to perform data queries across a collection of federated GSAC-WS enabled repositories. A fundamental challenge faced in this project is to provide a common suite of query services across a heterogeneous collection of data yet enabling each repository to expose their specific metadata holdings. To address this challenge we are developing a "capabilities" based service where a repository can describe its specific query and metadata capabilities. Furthermore, the architecture of the GSL is based on a model-view paradigm that decouples the underlying data model semantics from particular representations of the data model. This will allow for the GSAC-WS enabled repositories to evolve their service offerings to incorporate new metadata definition formats (e.g., ISO-19115, FGDC, JSON, etc.) and new techniques for accessing their holdings. Building on the core GSAC-WS implementations the project is also developing a federated/distributed query service. This service will seamlessly integrate with the GSAC Service Layer and will support data and metadata queries across a collection of federated GSAC repositories.
Reactome Pengine: A web-logic API to the homo sapiens reactome.

PubMed

Neaves, Samuel R; Tsoka, Sophia; Millard, Louise A C

2018-03-30

Existing ways of accessing data from the Reactome database are limited. Either a researcher is restricted to particular queries defined by a web application programming interface (API), or they have to download the whole database. Reactome Pengine is a web service providing a logic programming based API to the human reactome. This gives researchers greater flexibility in data access than existing APIs, as users can send their own small programs (alongside queries) to Reactome Pengine. The server and an example notebook can be found at https://apps.nms.kcl.ac.uk/reactome-pengine. Source code is available at https://github.com/samwalrus/reactome-pengine and a Docker image is available at https://hub.docker.com/r/samneaves/rp4/ . samuel.neaves@kcl.ac.uk. Supplementary data are available at Bioinformatics online.
LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.

PubMed

Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias

2018-03-01

In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness is paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries by their query profile like research field, geographic location, co-authorship, affiliation etc. require user's registration and its public accessibility that contradict privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstruct possible linguistic contexts of a given keyword query. The context is referred from the text records that are stored in the databases that are going to be queried or extracted for a general purpose query suggestion from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluated of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for human expert made query suggestions is 0.90. The software is either available as tool set to build and train dedicated query suggestion services or as already trained general purpose RESTful web service. The service uses open interfaces to be seamless embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
Accessing the public MIMIC-II intensive care relational database for clinical research.

PubMed

Scott, Daniel J; Lee, Joon; Silva, Ikaro; Park, Shinhyuk; Moody, George B; Celi, Leo A; Mark, Roger G

2013-01-10

The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge "Predicting mortality of ICU Patients". QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database.
Transparent mediation-based access to multiple yeast data sources using an ontology driven interface.

PubMed

Briache, Abdelaali; Marrakchi, Kamar; Kerzazi, Amine; Navas-Delgado, Ismael; Rossi Hassani, Badr D; Lairini, Khalid; Aldana-Montes, José F

2012-01-25

Saccharomyces cerevisiae is recognized as a model system representing a simple eukaryote whose genome can be easily manipulated. Information solicited by scientists on its biological entities (Proteins, Genes, RNAs...) is scattered within several data sources like SGD, Yeastract, CYGD-MIPS, BioGrid, PhosphoGrid, etc. Because of the heterogeneity of these sources, querying them separately and then manually combining the returned results is a complex and time-consuming task for biologists most of whom are not bioinformatics expert. It also reduces and limits the use that can be made on the available data. To provide transparent and simultaneous access to yeast sources, we have developed YeastMed: an XML and mediator-based system. In this paper, we present our approach in developing this system which takes advantage of SB-KOM to perform the query transformation needed and a set of Data Services to reach the integrated data sources. The system is composed of a set of modules that depend heavily on XML and Semantic Web technologies. User queries are expressed in terms of a domain ontology through a simple form-based web interface. YeastMed is the first mediation-based system specific for integrating yeast data sources. It was conceived mainly to help biologists to find simultaneously relevant data from multiple data sources. It has a biologist-friendly interface easy to use. The system is available at http://www.khaos.uma.es/yeastmed/.
Embedding the Form Generator in a Content Management System

NASA Astrophysics Data System (ADS)

Delgado, A.; Wicenec, A.; Delmotte, N.; Tejero, A.

2008-08-01

Given the tremendous amount of data generated by ESO's telescopes and the rapid evolution of the World Wide Web, the ESO archive web interface needs to offer more flexible services and advanced functionalities to a growing community of users all over the world. To achieve this endeavour, a query form generator is being developed inside a Content Management System. We present here a progress report.
cPath: open source software for collecting, storing, and querying biological pathways

PubMed Central

Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

2006-01-01

Background Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. Results We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. Conclusion cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling. PMID:17101041
Open Technology Approaches to Geospatial Interface Design

NASA Astrophysics Data System (ADS)

Crevensten, B.; Simmons, D.; Alaska Satellite Facility

2011-12-01

What problems do you not want your software developers to be solving? Choosing open technologies across the entire stack of software development-from low-level shared libraries to high-level user interaction implementations-is a way to help ensure that customized software yields innovative and valuable tools for Earth Scientists. This demonstration will review developments in web application technologies and the recurring patterns of interaction design regarding exploration and discovery of geospatial data through the Vertex: ASF's Dataportal interface, a project utilizing current open web application standards and technologies including HTML5, jQueryUI, Backbone.js and the Jasmine unit testing framework.
A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model

PubMed Central

Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

2013-01-01

Background Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage when applying a mutually exclusive setting,” and “treatments for liver cancer”) was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. Conclusions The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. PMID:25600078
WeBIAS: a web server for publishing bioinformatics applications.

PubMed

Daniluk, Paweł; Wilczyński, Bartek; Lesyng, Bogdan

2015-11-02

One of the requirements for a successful scientific tool is its availability. Developing a functional web service, however, is usually considered a mundane and ungratifying task, and quite often neglected. When publishing bioinformatic applications, such attitude puts additional burden on the reviewers who have to cope with poorly designed interfaces in order to assess quality of presented methods, as well as impairs actual usefulness to the scientific community at large. In this note we present WeBIAS-a simple, self-contained solution to make command-line programs accessible through web forms. It comprises a web portal capable of serving several applications and backend schedulers which carry out computations. The server handles user registration and authentication, stores queries and results, and provides a convenient administrator interface. WeBIAS is implemented in Python and available under GNU Affero General Public License. It has been developed and tested on GNU/Linux compatible platforms covering a vast majority of operational WWW servers. Since it is written in pure Python, it should be easy to deploy also on all other platforms supporting Python (e.g. Windows, Mac OS X). Documentation and source code, as well as a demonstration site are available at http://bioinfo.imdik.pan.pl/webias . WeBIAS has been designed specifically with ease of installation and deployment of services in mind. Setting up a simple application requires minimal effort, yet it is possible to create visually appealing, feature-rich interfaces for query submission and presentation of results.
A study of medical and health queries to web search engines.

PubMed

Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

2004-03-01

This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.
Building a semi-automatic ontology learning and construction system for geosciences

NASA Astrophysics Data System (ADS)

Babaie, H. A.; Sunderraman, R.; Zhu, Y.

2013-12-01

We are developing an ontology learning and construction framework that allows continuous, semi-automatic knowledge extraction, verification, validation, and maintenance by potentially a very large group of collaborating domain experts in any geosciences field. The system brings geoscientists from the side-lines to the center stage of ontology building, allowing them to collaboratively construct and enrich new ontologies, and merge, align, and integrate existing ontologies and tools. These constantly evolving ontologies can more effectively address community's interests, purposes, tools, and change. The goal is to minimize the cost and time of building ontologies, and maximize the quality, usability, and adoption of ontologies by the community. Our system will be a domain-independent ontology learning framework that applies natural language processing, allowing users to enter their ontology in a semi-structured form, and a combined Semantic Web and Social Web approach that lets direct participation of geoscientists who have no skill in the design and development of their domain ontologies. A controlled natural language (CNL) interface and an integrated authoring and editing tool automatically convert syntactically correct CNL text into formal OWL constructs. The WebProtege-based system will allow a potentially large group of geoscientists, from multiple domains, to crowd source and participate in the structuring of their knowledge model by sharing their knowledge through critiquing, testing, verifying, adopting, and updating of the concept models (ontologies). We will use cloud storage for all data and knowledge base components of the system, such as users, domain ontologies, discussion forums, and semantic wikis that can be accessed and queried by geoscientists in each domain. We will use NoSQL databases such as MongoDB as a service in the cloud environment. MongoDB uses the lightweight JSON format, which makes it convenient and easy to build Web applications using just HTML5 and Javascript, thereby avoiding cumbersome server side coding present in the traditional approaches. The JSON format used in MongoDB is also suitable for storing and querying RDF data. We will store the domain ontologies and associated linked data in JSON/RDF formats. Our Web interface will be built upon the open source and configurable WebProtege ontology editor. We will develop a simplified mobile version of our user interface which will automatically detect the hosting device and adjust the user interface layout to accommodate different screen sizes. We will also use the Semantic Media Wiki that allows the user to store and query the data within the wiki pages. By using HTML 5, JavaScript, and WebGL, we aim to create an interactive, dynamic, and multi-dimensional user interface that presents various geosciences data sets in a natural and intuitive way.
Version VI of the ESTree db: an improved tool for peach transcriptome analysis

PubMed Central

Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Merelli, Ivan; Barale, Francesca; Milanesi, Luciano; Stella, Alessandra; Pozzi, Carlo

2008-01-01

Background The ESTree database (db) is a collection of Prunus persica and Prunus dulcis EST sequences that in its current version encompasses 75,404 sequences from 3 almond and 19 peach libraries. Nine peach genotypes and four peach tissues are represented, from four fruit developmental stages. The aim of this work was to implement the already existing ESTree db by adding new sequences and analysis programs. Particular care was given to the implementation of the web interface, that allows querying each of the database features. Results A Perl modular pipeline is the backbone of sequence analysis in the ESTree db project. Outputs obtained during the pipeline steps are automatically arrayed into the fields of a MySQL database. Apart from standard clustering and annotation analyses, version VI of the ESTree db encompasses new tools for tandem repeat identification, annotation against genomic Rosaceae sequences, and positioning on the database of oligomer sequences that were used in a peach microarray study. Furthermore, known protein patterns and motifs were identified by comparison to PROSITE. Based on data retrieved from sequence annotation against the UniProtKB database, a script was prepared to track positions of homologous hits on the GO tree and build statistics on the ontologies distribution in GO functional categories. EST mapping data were also integrated in the database. The PHP-based web interface was upgraded and extended. The aim of the authors was to enable querying the database according to all the biological aspects that can be investigated from the analysis of data available in the ESTree db. This is achieved by allowing multiple searches on logical subsets of sequences that represent different biological situations or features. Conclusions The version VI of ESTree db offers a broad overview on peach gene expression. Sequence analyses results contained in the database, extensively linked to external related resources, represent a large amount of information that can be queried via the tools offered in the web interface. Flexibility and modularity of the ESTree analysis pipeline and of the web interface allowed the authors to set up similar structures for different datasets, with limited manual intervention. PMID:18387211
Information System through ANIS at CeSAM

NASA Astrophysics Data System (ADS)

Moreau, C.; Agneray, F.; Gimenez, S.

2015-09-01

ANIS (AstroNomical Information System) is a web generic tool developed at CeSAM to facilitate and standardize the implementation of astronomical data of various kinds through private and/or public dedicated Information Systems. The architecture of ANIS is composed of a database server which contains the project data, a web user interface template which provides high level services (search, extract and display imaging and spectroscopic data using a combination of criteria, an object list, a sql query module or a cone search interfaces), a framework composed of several packages, and a metadata database managed by a web administration entity. The process to implement a new ANIS instance at CeSAM is easy and fast : the scientific project has to submit data or a data secure access, the CeSAM team installs the new instance (web interface template and the metadata database), and the project administrator can configure the instance with the web ANIS-administration entity. Currently, the CeSAM offers through ANIS a web access to VO compliant Information Systems for different projects (HeDaM, HST-COSMOS, CFHTLS-ZPhots, ExoDAT,...).
A comprehensive physiologically based pharmacokinetic knowledgebase and web-based interface for rapid model ranking and querying

EPA Science Inventory

Published physiologically based pharmacokinetic (PBPK) models from peer-reviewed articles are often well-parameterized, thoroughly-vetted, and can be utilized as excellent resources for the construction of models pertaining to related chemicals. Specifically, chemical-specific pa...
PROTICdb: a web-based application to store, track, query, and compare plant proteome data.

PubMed

Ferry-Dumazet, Hélène; Houel, Gwenn; Montalent, Pierre; Moreau, Luc; Langella, Olivier; Negroni, Luc; Vincent, Delphine; Lalanne, Céline; de Daruvar, Antoine; Plomion, Christophe; Zivy, Michel; Joets, Johann

2005-05-01

PROTICdb is a web-based application, mainly designed to store and analyze plant proteome data obtained by two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry (MS). The purposes of PROTICdb are (i) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements, and (ii) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of post-translational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs of image analysis and MS identification software, or by filling web forms. 2-D PAGE annotated maps can be displayed, queried, and compared through a graphical interface. Links to external databases are also available. Quantitative data can be easily exported in a tabulated format for statistical analyses. PROTICdb is based on the Oracle or the PostgreSQL Database Management System and is freely available upon request at the following URL: http://moulon.inra.fr/ bioinfo/PROTICdb.
A Modular Framework for Transforming Structured Data into HTML with Machine-Readable Annotations

NASA Astrophysics Data System (ADS)

Patton, E. W.; West, P.; Rozell, E.; Zheng, J.

2010-12-01

There is a plethora of web-based Content Management Systems (CMS) available for maintaining projects and data, i.a. However, each system varies in its capabilities and often content is stored separately and accessed via non-uniform web interfaces. Moving from one CMS to another (e.g., MediaWiki to Drupal) can be cumbersome, especially if a large quantity of data must be adapted to the new system. To standardize the creation, display, management, and sharing of project information, we have assembled a framework that uses existing web technologies to transform data provided by any service that supports the SPARQL Protocol and RDF Query Language (SPARQL) queries into HTML fragments, allowing it to be embedded in any existing website. The framework utilizes a two-tier XML Stylesheet Transformation (XSLT) that uses existing ontologies (e.g., Friend-of-a-Friend, Dublin Core) to interpret query results and render them as HTML documents. These ontologies can be used in conjunction with custom ontologies suited to individual needs (e.g., domain-specific ontologies for describing data records). Furthermore, this transformation process encodes machine-readable annotations, namely, the Resource Description Framework in attributes (RDFa), into the resulting HTML, so that capable parsers and search engines can extract the relationships between entities (e.g, people, organizations, datasets). To facilitate editing of content, the framework provides a web-based form system, mapping each query to a dynamically generated form that can be used to modify and create entities, while keeping the native data store up-to-date. This open framework makes it easy to duplicate data across many different sites, allowing researchers to distribute their data in many different online forums. In this presentation we will outline the structure of queries and the stylesheets used to transform them, followed by a brief walkthrough that follows the data from storage to human- and machine-accessible web page. We conclude with a discussion on content caching and steps toward performing queries across multiple domains.
Accessing the public MIMIC-II intensive care relational database for clinical research

PubMed Central

2013-01-01

Background The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. Results QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge “Predicting mortality of ICU Patients”. Conclusions QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database. PMID:23302652
Omicseq: a web-based search engine for exploring omics datasets

PubMed Central

Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

2017-01-01

Abstract The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462
Web tools for effective retrieval, visualization, and evaluation of cardiology medical images and records

NASA Astrophysics Data System (ADS)

Masseroli, Marco; Pinciroli, Francesco

2000-12-01

To provide easy retrieval, integration and evaluation of multimodal cardiology images and data in a web browser environment, distributed application technologies and java programming were used to implement a client-server architecture based on software agents. The server side manages secure connections and queries to heterogeneous remote databases and file systems containing patient personal and clinical data. The client side is a Java applet running in a web browser and providing a friendly medical user interface to perform queries on patient and medical test dat and integrate and visualize properly the various query results. A set of tools based on Java Advanced Imaging API enables to process and analyze the retrieved cardiology images, and quantify their features in different regions of interest. The platform-independence Java technology makes the developed prototype easy to be managed in a centralized form and provided in each site where an intranet or internet connection can be located. Giving the healthcare providers effective tools for querying, visualizing and evaluating comprehensively cardiology medical images and records in all locations where they can need them- i.e. emergency, operating theaters, ward, or even outpatient clinics- the developed prototype represents an important aid in providing more efficient diagnoses and medical treatments.

Focused Crawling of the Deep Web Using Service Class Descriptions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rocco, D; Liu, L; Critchlow, T

2004-06-21

Dynamic Web data sources--sometimes known collectively as the Deep Web--increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size and growth rate of the dynamic Web greatly exceed that of the static Web, yet dynamic content is often ignored by existing search engine indexers owing to the technical challenges that arise when attempting to search the Deep Web. To address thesemore » challenges, we present DynaBot, a service-centric crawler for discovering and clustering Deep Web sources offering dynamic content. DynaBot has three unique characteristics. First, DynaBot utilizes a service class model of the Web implemented through the construction of service class descriptions (SCDs). Second, DynaBot employs a modular, self-tuning system architecture for focused crawling of the DeepWeb using service class descriptions. Third, DynaBot incorporates methods and algorithms for efficient probing of the Deep Web and for discovering and clustering Deep Web sources and services through SCD-based service matching analysis. Our experimental results demonstrate the effectiveness of the service class discovery, probing, and matching algorithms and suggest techniques for efficiently managing service discovery in the face of the immense scale of the Deep Web.« less
CFGP: a web-based, comparative fungal genomics platform.

PubMed

Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

2008-01-01

Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.
A Query Integrator and Manager for the Query Web

PubMed Central

Brinkley, James F.; Detwiler, Landon T.

2012-01-01

We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831
The CMS DBS query language

NASA Astrophysics Data System (ADS)

Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

2010-04-01

The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.
Dynamic XML-based exchange of relational data: application to the Human Brain Project.

PubMed

Tang, Zhengming; Kadiyska, Yana; Li, Hao; Suciu, Dan; Brinkley, James F

2003-01-01

This paper discusses an approach to exporting relational data in XML format for data exchange over the web. We describe the first real-world application of SilkRoute, a middleware program that dynamically converts existing relational data to a user-defined XML DTD. The application, called XBrain, wraps SilkRoute in a Java Server Pages framework, thus permitting a web-based XQuery interface to a legacy relational database. The application is demonstrated as a query interface to the University of Washington Brain Project's Language Map Experiment Management System, which is used to manage data about language organization in the brain.
Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize

PubMed Central

2010-01-01

Background Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. Results In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. Conclusions CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publically available at http://agbase.msstate.edu. PMID:20946609
Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize.

PubMed

Kelley, Rowena Y; Gresham, Cathy; Harper, Jonathan; Bridges, Susan M; Warburton, Marilyn L; Hawkins, Leigh K; Pechanova, Olga; Peethambaran, Bela; Pechan, Tibor; Luthe, Dawn S; Mylroie, J E; Ankala, Arunkanth; Ozkan, Seval; Henry, W B; Williams, W P

2010-10-07

Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publically available at http://agbase.msstate.edu.
Providing Multi-Page Data Extraction Services with XWRAPComposer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Ling; Zhang, Jianjun; Han, Wei

2008-04-30

Dynamic Web data sources – sometimes known collectively as the Deep Web – increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size and growth rate of the dynamic Web greatly exceed that of the static Web, yet dynamic content is often ignored by existing search engine indexers owing to the technical challenges that arise when attempting to search the Deepmore » Web. To address these challenges, we present DYNABOT, a service-centric crawler for discovering and clustering Deep Web sources offering dynamic content. DYNABOT has three unique characteristics. First, DYNABOT utilizes a service class model of the Web implemented through the construction of service class descriptions (SCDs). Second, DYNABOT employs a modular, self-tuning system architecture for focused crawling of the Deep Web using service class descriptions. Third, DYNABOT incorporates methods and algorithms for efficient probing of the Deep Web and for discovering and clustering Deep Web sources and services through SCD-based service matching analysis. Our experimental results demonstrate the effectiveness of the service class discovery, probing, and matching algorithms and suggest techniques for efficiently managing service discovery in the face of the immense scale of the Deep Web.« less
Data Access System for Hydrology

NASA Astrophysics Data System (ADS)

Whitenack, T.; Zaslavsky, I.; Valentine, D.; Djokic, D.

2007-12-01

As part of the CUAHSI HIS (Consortium of Universities for the Advancement of Hydrologic Science, Inc., Hydrologic Information System), the CUAHSI HIS team has developed Data Access System for Hydrology or DASH. DASH is based on commercial off the shelf technology, which has been developed in conjunction with a commercial partner, ESRI. DASH is a web-based user interface, developed in ASP.NET developed using ESRI ArcGIS Server 9.2 that represents a mapping, querying and data retrieval interface over observation and GIS databases, and web services. This is the front end application for the CUAHSI Hydrologic Information System Server. The HIS Server is a software stack that organizes observation databases, geographic data layers, data importing and management tools, and online user interfaces such as the DASH application, into a flexible multi- tier application for serving both national-level and locally-maintained observation data. The user interface of the DASH web application allows online users to query observation networks by location and attributes, selecting stations in a user-specified area where a particular variable was measured during a given time interval. Once one or more stations and variables are selected, the user can retrieve and download the observation data for further off-line analysis. The DASH application is highly configurable. The mapping interface can be configured to display map services from multiple sources in multiple formats, including ArcGIS Server, ArcIMS, and WMS. The observation network data is configured in an XML file where you specify the network's web service location and its corresponding map layer. Upon initial deployment, two national level observation networks (USGS NWIS daily values and USGS NWIS Instantaneous values) are already pre-configured. There is also an optional login page which can be used to restrict access as well as providing a alternative to immediate downloads. For large request, users would be notified via email with a link to their data when it is ready.
Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search.

PubMed

Jay, Caroline; Harper, Simon; Dunlop, Ian; Smith, Sam; Sufi, Shoaib; Goble, Carole; Buchan, Iain

2016-01-14

Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis, and in-turn, the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these "experts." Such interfaces hark back to a time when searches needed to be accurate first time as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research. The cross-disciplinary nature of data science can make no assumptions regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with search needs of the "Google generation" than with their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive. Two user interfaces are evaluated for the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is "Google-like," enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has standard multioption user interface. Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect indicating that answers found through the Web search interface were more likely to be correct (F1,19=37.3, P<.001), with a main effect of task (F3,57=6.3, P<.001). Further, participants completed the task significantly faster using the Web search interface (F1,19=18.0, P<.001). There was also a main effect of task (F2,38=4.1, P=.025, Greenhouse-Geisser correction applied). Overall, participants were asked to rate learnability, ease of use, and satisfaction. Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI [0.6-2.4]), ease of use (P<.001, 95% CI [1.2-3.2]), and satisfaction (P<.001, 95% CI [1.8-3.5]). The results show superior cross-domain usability of Web search, which is consistent with its general familiarity and with enabling queries to be refined as the search proceeds, which treats serendipity as part of the refinement. The results provide clear evidence that data science should adopt single-field natural language search interfaces for variable search supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance feedback; summarization, analytics, and visual presentation.
Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search

PubMed Central

Smith, Sam; Sufi, Shoaib; Goble, Carole; Buchan, Iain

2016-01-01

Background Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis, and in-turn, the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these “experts.” Such interfaces hark back to a time when searches needed to be accurate first time as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research. Objective The cross-disciplinary nature of data science can make no assumptions regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with search needs of the “Google generation” than with their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive. Methods Two user interfaces are evaluated for the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is “Google-like,” enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has standard multioption user interface. Results Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect indicating that answers found through the Web search interface were more likely to be correct (F 1,19=37.3, P<.001), with a main effect of task (F 3,57=6.3, P<.001). Further, participants completed the task significantly faster using the Web search interface (F 1,19=18.0, P<.001). There was also a main effect of task (F 2,38=4.1, P=.025, Greenhouse-Geisser correction applied). Overall, participants were asked to rate learnability, ease of use, and satisfaction. Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI [0.6-2.4]), ease of use (P<.001, 95% CI [1.2-3.2]), and satisfaction (P<.001, 95% CI [1.8-3.5]). The results show superior cross-domain usability of Web search, which is consistent with its general familiarity and with enabling queries to be refined as the search proceeds, which treats serendipity as part of the refinement. Conclusions The results provide clear evidence that data science should adopt single-field natural language search interfaces for variable search supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance feedback; summarization, analytics, and visual presentation. PMID:26769334
Query optimization for graph analytics on linked data using SPARQL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hong, Seokyong; Lee, Sangkeun; Lim, Seung -Hwan

2015-07-01

Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performancemore » of the SPARQL-based implementation of such popular graph algorithms by reducing the space-overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.« less
TMFoldWeb: a web server for predicting transmembrane protein fold class.

PubMed

Kozma, Dániel; Tusnády, Gábor E

2015-09-17

Here we present TMFoldWeb, the web server implementation of TMFoldRec, a transmembrane protein fold recognition algorithm. TMFoldRec uses statistical potentials and utilizes topology filtering and a gapless threading algorithm. It ranks template structures and selects the most likely candidates and estimates the reliability of the obtained lowest energy model. The statistical potential was developed in a maximum likelihood framework on a representative set of the PDBTM database. According to the benchmark test the performance of TMFoldRec is about 77 % in correctly predicting fold class for a given transmembrane protein sequence. An intuitive web interface has been developed for the recently published TMFoldRec algorithm. The query sequence goes through a pipeline of topology prediction and a systematic sequence to structure alignment (threading). Resulting templates are ordered by energy and reliability values and are colored according to their significance level. Besides the graphical interface, a programmatic access is available as well, via a direct interface for developers or for submitting genome-wide data sets. The TMFoldWeb web server is unique and currently the only web server that is able to predict the fold class of transmembrane proteins while assigning reliability scores for the prediction. This method is prepared for genome-wide analysis with its easy-to-use interface, informative result page and programmatic access. Considering the info-communication evolution in the last few years, the developed web server, as well as the molecule viewer, is responsive and fully compatible with the prevalent tablets and mobile devices.
[Study on Information Extraction of Clinic Expert Information from Hospital Portals].

PubMed

Zhang, Yuanpeng; Dong, Jiancheng; Qian, Danmin; Geng, Xingyun; Wu, Huiqun; Wang, Li

2015-12-01

Clinic expert information provides important references for residents in need of hospital care. Usually, such information is hidden in the deep web and cannot be directly indexed by search engines. To extract clinic expert information from the deep web, the first challenge is to make a judgment on forms. This paper proposes a novel method based on a domain model, which is a tree structure constructed by the attributes of search interfaces. With this model, search interfaces can be classified to a domain and filled in with domain keywords. Another challenge is to extract information from the returned web pages indexed by search interfaces. To filter the noise information on a web page, a block importance model is proposed. The experiment results indicated that the domain model yielded a precision 10.83% higher than that of the rule-based method, whereas the block importance model yielded an F₁ measure 10.5% higher than that of the XPath method.
SWS: accessing SRS sites contents through Web Services.

PubMed

Romano, Paolo; Marra, Domenico

2008-03-26

Web Services and Workflow Management Systems can support creation and deployment of network systems, able to automate data analysis and retrieval processes in biomedical research. Web Services have been implemented at bioinformatics centres and workflow systems have been proposed for biological data analysis. New databanks are often developed by taking into account these technologies, but many existing databases do not allow a programmatic access. Only a fraction of available databanks can thus be queried through programmatic interfaces. SRS is a well know indexing and search engine for biomedical databanks offering public access to many databanks and analysis tools. Unfortunately, these data are not easily and efficiently accessible through Web Services. We have developed 'SRS by WS' (SWS), a tool that makes information available in SRS sites accessible through Web Services. Information on known sites is maintained in a database, srsdb. SWS consists in a suite of WS that can query both srsdb, for information on sites and databases, and SRS sites. SWS returns results in a text-only format and can be accessed through a WSDL compliant client. SWS enables interoperability between workflow systems and SRS implementations, by also managing access to alternative sites, in order to cope with network and maintenance problems, and selecting the most up-to-date among available systems. Development and implementation of Web Services, allowing to make a programmatic access to an exhaustive set of biomedical databases can significantly improve automation of in-silico analysis. SWS supports this activity by making biological databanks that are managed in public SRS sites available through a programmatic interface.
Omicseq: a web-based search engine for exploring omics datasets.

PubMed

Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

2017-07-03

The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
TOPSAN: a dynamic web database for structural genomics.

PubMed

Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John

2011-01-01

The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.
Olelo: a web application for intuitive exploration of biomedical literature

PubMed Central

Niedermeier, Julian; Jankrift, Marcel; Tietböhl, Sören; Stachewicz, Toni; Folkerts, Hendrik; Uflacker, Matthias; Neves, Mariana

2017-01-01

Abstract Researchers usually query the large biomedical literature in PubMed via keywords, logical operators and filters, none of which is very intuitive. Question answering systems are an alternative to keyword searches. They allow questions in natural language as input and results reflect the given type of question, such as short answers and summaries. Few of those systems are available online but they experience drawbacks in terms of long response times and they support a limited amount of question and result types. Additionally, user interfaces are usually restricted to only displaying the retrieved information. For our Olelo web application, we combined biomedical literature and terminologies in a fast in-memory database to enable real-time responses to researchers’ queries. Further, we extended the built-in natural language processing features of the database with question answering and summarization procedures. Combined with a new explorative approach of document filtering and a clean user interface, Olelo enables a fast and intelligent search through the ever-growing biomedical literature. Olelo is available at http://www.hpi.de/plattner/olelo. PMID:28472397
Web-based Visualization and Query of semantically segmented multiresolution 3D Models in the Field of Cultural Heritage

NASA Astrophysics Data System (ADS)

Auer, M.; Agugiaro, G.; Billen, N.; Loos, L.; Zipf, A.

2014-05-01

Many important Cultural Heritage sites have been studied over long periods of time by different means of technical equipment, methods and intentions by different researchers. This has led to huge amounts of heterogeneous "traditional" datasets and formats. The rising popularity of 3D models in the field of Cultural Heritage in recent years has brought additional data formats and makes it even more necessary to find solutions to manage, publish and study these data in an integrated way. The MayaArch3D project aims to realize such an integrative approach by establishing a web-based research platform bringing spatial and non-spatial databases together and providing visualization and analysis tools. Especially the 3D components of the platform use hierarchical segmentation concepts to structure the data and to perform queries on semantic entities. This paper presents a database schema to organize not only segmented models but also different Levels-of-Details and other representations of the same entity. It is further implemented in a spatial database which allows the storing of georeferenced 3D data. This enables organization and queries by semantic, geometric and spatial properties. As service for the delivery of the segmented models a standardization candidate of the OpenGeospatialConsortium (OGC), the Web3DService (W3DS) has been extended to cope with the new database schema and deliver a web friendly format for WebGL rendering. Finally a generic user interface is presented which uses the segments as navigation metaphor to browse and query the semantic segmentation levels and retrieve information from an external database of the German Archaeological Institute (DAI).
SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

PubMed

Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

2014-08-15

Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.

msBiodat analysis tool, big data analysis for high-throughput experiments.

PubMed

Muñoz-Torres, Pau M; Rokć, Filip; Belužic, Robert; Grbeša, Ivana; Vugrek, Oliver

2016-01-01

Mass spectrometry (MS) are a group of a high-throughput techniques used to increase knowledge about biomolecules. They produce a large amount of data which is presented as a list of hundreds or thousands of proteins. Filtering those data efficiently is the first step for extracting biologically relevant information. The filtering may increase interest by merging previous data with the data obtained from public databases, resulting in an accurate list of proteins which meet the predetermined conditions. In this article we present msBiodat Analysis Tool, a web-based application thought to approach proteomics to the big data analysis. With this tool, researchers can easily select the most relevant information from their MS experiments using an easy-to-use web interface. An interesting feature of msBiodat analysis tool is the possibility of selecting proteins by its annotation on Gene Ontology using its Gene Id, ensembl or UniProt codes. The msBiodat analysis tool is a web-based application that allows researchers with any programming experience to deal with efficient database querying advantages. Its versatility and user-friendly interface makes easy to perform fast and accurate data screening by using complex queries. Once the analysis is finished, the result is delivered by e-mail. msBiodat analysis tool is freely available at http://msbiodata.irb.hr.
Web Services and Other Enhancements at the Northern California Earthquake Data Center

NASA Astrophysics Data System (ADS)

Neuhauser, D. S.; Zuzlewski, S.; Allen, R. M.

2012-12-01

The Northern California Earthquake Data Center (NCEDC) provides data archive and distribution services for seismological and geophysical data sets that encompass northern California. The NCEDC is enhancing its ability to deliver rapid information through Web Services. NCEDC Web Services use well-established web server and client protocols and REST software architecture to allow users to easily make queries using web browsers or simple program interfaces and to receive the requested data in real-time rather than through batch or email-based requests. Data are returned to the user in the appropriate format such as XML, RESP, or MiniSEED depending on the service, and are compatible with the equivalent IRIS DMC web services. The NCEDC is currently providing the following Web Services: (1) Station inventory and channel response information delivered in StationXML format, (2) Channel response information delivered in RESP format, (3) Time series availability delivered in text and XML formats, (4) Single channel and bulk data request delivered in MiniSEED format. The NCEDC is also developing a rich Earthquake Catalog Web Service to allow users to query earthquake catalogs based on selection parameters such as time, location or geographic region, magnitude, depth, azimuthal gap, and rms. It will return (in QuakeML format) user-specified results that can include simple earthquake parameters, as well as observations such as phase arrivals, codas, amplitudes, and computed parameters such as first motion mechanisms, moment tensors, and rupture length. The NCEDC will work with both IRIS and the International Federation of Digital Seismograph Networks (FDSN) to define a uniform set of web service specifications that can be implemented by multiple data centers to provide users with a common data interface across data centers. The NCEDC now hosts earthquake catalogs and waveforms from the US Department of Energy (DOE) Enhanced Geothermal Systems (EGS) monitoring networks. These data can be accessed through the above web services and through special NCEDC web pages.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

PubMed

Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

2013-11-01

The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
Distributed health data networks: a practical and preferred approach to multi-institutional evaluations of comparative effectiveness, safety, and quality of care.

PubMed

Brown, Jeffrey S; Holmes, John H; Shah, Kiran; Hall, Ken; Lazarus, Ross; Platt, Richard

2010-06-01

Comparative effectiveness research, medical product safety evaluation, and quality measurement will require the ability to use electronic health data held by multiple organizations. There is no consensus about whether to create regional or national combined (eg, "all payer") databases for these purposes, or distributed data networks that leave most Protected Health Information and proprietary data in the possession of the original data holders. Demonstrate functions of a distributed research network that supports research needs and also address data holders concerns about participation. Key design functions included strong local control of data uses and a centralized web-based querying interface. We implemented a pilot distributed research network and evaluated the design considerations, utility for research, and the acceptability to data holders of methods for menu-driven querying. We developed and tested a central, web-based interface with supporting network software. Specific functions assessed include query formation and distribution, query execution and review, and aggregation of results. This pilot successfully evaluated temporal trends in medication use and diagnoses at 5 separate sites, demonstrating some of the possibilities of using a distributed research network. The pilot demonstrated the potential utility of the design, which addressed the major concerns of both users and data holders. No serious obstacles were identified that would prevent development of a fully functional, scalable network. Distributed networks are capable of addressing nearly all anticipated uses of routinely collected electronic healthcare data. Distributed networks would obviate the need for centralized databases, thus avoiding numerous obstacles.
A SQL-Database Based Meta-CASE System and its Query Subsystem

NASA Astrophysics Data System (ADS)

Eessaar, Erki; Sgirka, Rünno

Meta-CASE systems simplify the creation of CASE (Computer Aided System Engineering) systems. In this paper, we present a meta-CASE system that provides a web-based user interface and uses an object-relational database system (ORDBMS) as its basis. The use of ORDBMSs allows us to integrate different parts of the system and simplify the creation of meta-CASE and CASE systems. ORDBMSs provide powerful query mechanism. The proposed system allows developers to use queries to evaluate and gradually improve artifacts and calculate values of software measures. We illustrate the use of the systems by using SimpleM modeling language and discuss the use of SQL in the context of queries about artifacts. We have created a prototype of the meta-CASE system by using PostgreSQL™ ORDBMS and PHP scripting language.
PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan

PubMed Central

Kinjo, Akira R.; Yamashita, Reiko; Nakamura, Haruki

2010-01-01

This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/ PMID:20798081
PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan.

PubMed

Kinjo, Akira R; Yamashita, Reiko; Nakamura, Haruki

2010-08-25

This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/
CFGP: a web-based, comparative fungal genomics platform

PubMed Central

Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F.; Blair, Jaime E.; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

2008-01-01

Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the ‘fill-in-the-form-and-press-SUBMIT’ user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI. PMID:17947331
UCSC genome browser: deep support for molecular biomedical research.

PubMed

Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C

2008-01-01

The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).
Discrepancy Reporting Management System

NASA Technical Reports Server (NTRS)

Cooper, Tonja M.; Lin, James C.; Chatillon, Mark L.

2004-01-01

Discrepancy Reporting Management System (DRMS) is a computer program designed for use in the stations of NASA's Deep Space Network (DSN) to help establish the operational history of equipment items; acquire data on the quality of service provided to DSN customers; enable measurement of service performance; provide early insight into the need to improve processes, procedures, and interfaces; and enable the tracing of a data outage to a change in software or hardware. DRMS is a Web-based software system designed to include a distributed database and replication feature to achieve location-specific autonomy while maintaining a consistent high quality of data. DRMS incorporates commercial Web and database software. DRMS collects, processes, replicates, communicates, and manages information on spacecraft data discrepancies, equipment resets, and physical equipment status, and maintains an internal station log. All discrepancy reports (DRs), Master discrepancy reports (MDRs), and Reset data are replicated to a master server at NASA's Jet Propulsion Laboratory; Master DR data are replicated to all the DSN sites; and Station Logs are internal to each of the DSN sites and are not replicated. Data are validated according to several logical mathematical criteria. Queries can be performed on any combination of data.
EURISWEB – Web-based epidemiological surveillance of antibiotic-resistant pneumococci in Day Care Centers

PubMed Central

Silva, Sara; Gouveia-Oliveira, Rodrigo; Maretzek, António; Carriço, João; Gudnason, Thorolfur; Kristinsson, Karl G; Ekdahl, Karl; Brito-Avô, António; Tomasz, Alexander; Sanches, Ilda Santos; Lencastre, Hermínia de; Almeida, Jonas

2003-01-01

Background EURIS (European Resistance Intervention Study) was launched as a multinational study in September of 2000 to identify the multitude of complex risk factors that contribute to the high carriage rate of drug resistant Streptococcus pneumoniae strains in children attending Day Care Centers in several European countries. Access to the very large number of data required the development of a web-based infrastructure – EURISWEB – that includes a relational online database, coupled with a query system for data retrieval, and allows integrative storage of demographic, clinical and molecular biology data generated in EURIS. Methods All components of the system were developed using open source programming tools: data storage management was supported by PostgreSQL, and the hypertext preprocessor to generate the web pages was implemented using PHP. The query system is based on a software agent running in the background specifically developed for EURIS. Results The website currently contains data related to 13,500 nasopharyngeal samples and over one million measures taken from 5,250 individual children, as well as over one thousand pre-made and user-made queries aggregated into several reports, approximately. It is presently in use by participating researchers from three countries (Iceland, Portugal and Sweden). Conclusion An operational model centered on a PHP engine builds the interface between the user and the database automatically, allowing an easy maintenance of the system. The query system is also sufficiently adaptable to allow the integration of several advanced data analysis procedures far more demanding than simple queries, eventually including artificial intelligence predictive models. PMID:12846930
Optimizing the NASA Technical Report Server

NASA Technical Reports Server (NTRS)

Nelson, Michael L.; Maa, Ming-Hokng

1996-01-01

The NASA Technical Report Server (NTRS), a World Wide Web report distribution NASA technical publications service, is modified for performance enhancement, greater protocol support, and human interface optimization. Results include: Parallel database queries, significantly decreasing user access times by an average factor of 2.3; access from clients behind firewalls and/ or proxies which truncate excessively long Uniform Resource Locators (URLs); access to non-Wide Area Information Server (WAIS) databases and compatibility with the 239-50.3 protocol; and a streamlined user interface.
CellLineNavigator: a workbench for cancer cell line analysis

PubMed Central

Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas

2013-01-01

The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and search ability, a simple data and an intuitive querying interface were implemented. It allows the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487
The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical-genomic driver associations.

PubMed

Lee, HoJoon; Palm, Jennifer; Grimes, Susan M; Ji, Hanlee P

2015-10-27

The Cancer Genome Atlas (TCGA) project has generated genomic data sets covering over 20 malignancies. These data provide valuable insights into the underlying genetic and genomic basis of cancer. However, exploring the relationship among TCGA genomic results and clinical phenotype remains a challenge, particularly for individuals lacking formal bioinformatics training. Overcoming this hurdle is an important step toward the wider clinical translation of cancer genomic/proteomic data and implementation of precision cancer medicine. Several websites such as the cBio portal or University of California Santa Cruz genome browser make TCGA data accessible but lack interactive features for querying clinically relevant phenotypic associations with cancer drivers. To enable exploration of the clinical-genomic driver associations from TCGA data, we developed the Cancer Genome Atlas Clinical Explorer. The Cancer Genome Atlas Clinical Explorer interface provides a straightforward platform to query TCGA data using one of the following methods: (1) searching for clinically relevant genes, micro RNAs, and proteins by name, cancer types, or clinical parameters; (2) searching for genomic/proteomic profile changes by clinical parameters in a cancer type; or (3) testing two-hit hypotheses. SQL queries run in the background and results are displayed on our portal in an easy-to-navigate interface according to user's input. To derive these associations, we relied on elastic-net estimates of optimal multiple linear regularized regression and clinical parameters in the space of multiple genomic/proteomic features provided by TCGA data. Moreover, we identified and ranked gene/micro RNA/protein predictors of each clinical parameter for each cancer. The robustness of the results was estimated by bootstrapping. Overall, we identify associations of potential clinical relevance among genes/micro RNAs/proteins using our statistical analysis from 25 cancer types and 18 clinical parameters that include clinical stage or smoking history. The Cancer Genome Atlas Clinical Explorer enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible web and mobile interface, users can examine queries and test hypothesis regarding genomic/proteomic alterations across a broad spectrum of malignancies.
SeqWare Query Engine: storing and searching sequence data in the cloud.

PubMed

O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

2010-12-21

Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.
SeqWare Query Engine: storing and searching sequence data in the cloud

PubMed Central

2010-01-01

Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981
The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface.

PubMed

Chen, Josephine; Zhao, Po; Massaro, Donald; Clerch, Linda B; Almon, Richard R; DuBois, Debra C; Jusko, William J; Hoffman, Eric P

2004-01-01

Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/ splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes, as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provides access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp).
The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface

PubMed Central

Chen, Josephine; Zhao, Po; Massaro, Donald; Clerch, Linda B.; Almon, Richard R.; DuBois, Debra C.; Jusko, William J.; Hoffman, Eric P.

2004-01-01

Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/ splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes, as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provides access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp). PMID:14681485
PubMedReco: A Real-Time Recommender System for PubMed Citations.

PubMed

Samuel, Hamman W; Zaïane, Osmar R

2017-01-01

We present a recommender system, PubMedReco, for real-time suggestions of medical articles from PubMed, a database of over 23 million medical citations. PubMedReco can recommend medical article citations while users are conversing in a synchronous communication environment such as a chat room. Normally, users would have to leave their chat interface to open a new web browser window, and formulate an appropriate search query to retrieve relevant results. PubMedReco automatically generates the search query and shows relevant citations within the same integrated user interface. PubMedReco analyzes relevant keywords associated with the conversation and uses them to search for relevant citations using the PubMed E-utilities programming interface. Our contributions include improvements to the user experience for searching PubMed from within health forums and chat rooms, and a machine learning model for identifying relevant keywords. We demonstrate the feasibility of PubMedReco using BMJ's Doc2Doc forum discussions.
FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web.

PubMed

Shapiro, Jessica; Brutlag, Douglas

2004-07-01

The FoldMiner web server (http://foldminer.stanford.edu/) provides remote access to methods for protein structure alignment and unsupervised motif discovery. FoldMiner is unique among such algorithms in that it improves both the motif definition and the sensitivity of a structural similarity search by combining the search and motif discovery methods and using information from each process to enhance the other. In a typical run, a query structure is aligned to all structures in one of several databases of single domain targets in order to identify its structural neighbors and to discover a motif that is the basis for the similarity among the query and statistically significant targets. This process is fully automated, but options for manual refinement of the results are available as well. The server uses the Chime plugin and customized controls to allow for visualization of the motif and of structural superpositions. In addition, we provide an interface to the LOCK 2 algorithm for rapid alignments of a query structure to smaller numbers of user-specified targets.

East-China Geochemistry Database (ECGD):A New Networking Database for North China Craton

NASA Astrophysics Data System (ADS)

Wang, X.; Ma, W.

2010-12-01

North China Craton is one of the best natural laboratories that research some Earth Dynamic questions[1]. Scientists made much progress in research on this area, and got vast geochemistry data, which are essential for answering many fundamental questions about the age, composition, structure, and evolution of the East China area. But the geochemical data have long been accessible only through the scientific literature and theses where they have been widely dispersed, making it difficult for the broad Geosciences community to find, access and efficiently use the full range of available data[2]. How to effectively store, manage, share and reuse the existing geochemical data in the North China Craton area? East-China Geochemistry Database(ECGD) is a networking geochemical scientific database system that has been designed based on WebGIS and relational database for the structured storage and retrieval of geochemical data and geological map information. It is integrated the functions of data retrieval, spatial visualization and online analysis. ECGD focus on three areas: 1.Storage and retrieval of geochemical data and geological map information. Research on the characters of geochemical data, including its composing and connecting of each other, we designed a relational database, which based on geochemical relational data model, to store a variety of geological sample information such as sampling locality, age, sample characteristics, reference, major elements, rare earth elements, trace elements and isotope system et al. And a web-based user-friendly interface is provided for constructing queries. 2.Data view. ECGD is committed to online data visualization by different ways, especially to view data in digital map with dynamic way. Because ECGD was integrated WebGIS technology, the query results can be mapped on digital map, which can be zoomed, translation and dot selection. Besides of view and output query results data by html, txt or xls formats, researchers also can generate classification thematic maps using query results, according different parameters. 3.Data analysis on-line. Here we designed lots of geochemical online analysis tools, including geochemical diagrams, CIPW computing, and so on, which allows researchers to analyze query data without download query results. Operation of all these analysis tools is very easy; users just do it by click mouse one or two time. In summary, ECGD provide a geochemical platform for researchers, whom to know where various data are, to view various data in a synthetic and dynamic way, and analyze interested data online. REFERENCES [1] S. Gao, R.L. Rudnick, and W.L. Xu, “Recycling deep cratonic lithosphere and generation of intraplate magmatism in the North China Craton,” Earth and Planetary Science Letters,270,41-53,2008. [2] K.A. Lehnert, U. Harms, and E. Ito, “Promises, Achievements, and Challenges of Networking Global Geoinformatics Resources - Experiences of GeosciNET and EarthChem,” Geophysical Research Abstracts, Vol.10, EGU2008-A-05242,2008.
Web-based access to near real-time and archived high-density time-series data: cyber infrastructure challenges & developments in the open-source Waveform Server

NASA Astrophysics Data System (ADS)

Reyes, J. C.; Vernon, F. L.; Newman, R. L.; Steidl, J. H.

2010-12-01

The Waveform Server is an interactive web-based interface to multi-station, multi-sensor and multi-channel high-density time-series data stored in Center for Seismic Studies (CSS) 3.0 schema relational databases (Newman et al., 2009). In the last twelve months, based on expanded specifications and current user feedback, both the server-side infrastructure and client-side interface have been extensively rewritten. The Python Twisted server-side code-base has been fundamentally modified to now present waveform data stored in cluster-based databases using a multi-threaded architecture, in addition to supporting the pre-existing single database model. This allows interactive web-based access to high-density (broadband @ 40Hz to strong motion @ 200Hz) waveform data that can span multiple years; the common lifetime of broadband seismic networks. The client-side interface expands on it's use of simple JSON-based AJAX queries to now incorporate a variety of User Interface (UI) improvements including standardized calendars for defining time ranges, applying on-the-fly data calibration to display SI-unit data, and increased rendering speed. This presentation will outline the various cyber infrastructure challenges we have faced while developing this application, the use-cases currently in existence, and the limitations of web-based application development.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce

PubMed Central

Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

2016-01-01

The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325
Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics

PubMed Central

Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.

2012-01-01

With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
Design and Development of a Linked Open Data-Based Health Information Representation and Visualization System: Potentials and Preliminary Evaluation

PubMed Central

Kauppinen, Tomi; Keßler, Carsten; Fritz, Fleur

2014-01-01

Background Healthcare organizations around the world are challenged by pressures to reduce cost, improve coordination and outcome, and provide more with less. This requires effective planning and evidence-based practice by generating important information from available data. Thus, flexible and user-friendly ways to represent, query, and visualize health data becomes increasingly important. International organizations such as the World Health Organization (WHO) regularly publish vital data on priority health topics that can be utilized for public health policy and health service development. However, the data in most portals is displayed in either Excel or PDF formats, which makes information discovery and reuse difficult. Linked Open Data (LOD)—a new Semantic Web set of best practice of standards to publish and link heterogeneous data—can be applied to the representation and management of public level health data to alleviate such challenges. However, the technologies behind building LOD systems and their effectiveness for health data are yet to be assessed. Objective The objective of this study is to evaluate whether Linked Data technologies are potential options for health information representation, visualization, and retrieval systems development and to identify the available tools and methodologies to build Linked Data-based health information systems. Methods We used the Resource Description Framework (RDF) for data representation, Fuseki triple store for data storage, and Sgvizler for information visualization. Additionally, we integrated SPARQL query interface for interacting with the data. We primarily use the WHO health observatory dataset to test the system. All the data were represented using RDF and interlinked with other related datasets on the Web of Data using Silk—a link discovery framework for Web of Data. A preliminary usability assessment was conducted following the System Usability Scale (SUS) method. Results We developed an LOD-based health information representation, querying, and visualization system by using Linked Data tools. We imported more than 20,000 HIV-related data elements on mortality, prevalence, incidence, and related variables, which are freely available from the WHO global health observatory database. Additionally, we automatically linked 5312 data elements from DBpedia, Bio2RDF, and LinkedCT using the Silk framework. The system users can retrieve and visualize health information according to their interests. For users who are not familiar with SPARQL queries, we integrated a Linked Data search engine interface to search and browse the data. We used the system to represent and store the data, facilitating flexible queries and different kinds of visualizations. The preliminary user evaluation score by public health data managers and users was 82 on the SUS usability measurement scale. The need to write queries in the interface was the main reported difficulty of LOD-based systems to the end user. Conclusions The system introduced in this article shows that current LOD technologies are a promising alternative to represent heterogeneous health data in a flexible and reusable manner so that they can serve intelligent queries, and ultimately support decision-making. However, the development of advanced text-based search engines is necessary to increase its usability especially for nontechnical users. Further research with large datasets is recommended in the future to unfold the potential of Linked Data and Semantic Web for future health information systems development. PMID:25601195
Design and development of a linked open data-based health information representation and visualization system: potentials and preliminary evaluation.

PubMed

Tilahun, Binyam; Kauppinen, Tomi; Keßler, Carsten; Fritz, Fleur

2014-10-25

Healthcare organizations around the world are challenged by pressures to reduce cost, improve coordination and outcome, and provide more with less. This requires effective planning and evidence-based practice by generating important information from available data. Thus, flexible and user-friendly ways to represent, query, and visualize health data becomes increasingly important. International organizations such as the World Health Organization (WHO) regularly publish vital data on priority health topics that can be utilized for public health policy and health service development. However, the data in most portals is displayed in either Excel or PDF formats, which makes information discovery and reuse difficult. Linked Open Data (LOD)-a new Semantic Web set of best practice of standards to publish and link heterogeneous data-can be applied to the representation and management of public level health data to alleviate such challenges. However, the technologies behind building LOD systems and their effectiveness for health data are yet to be assessed. The objective of this study is to evaluate whether Linked Data technologies are potential options for health information representation, visualization, and retrieval systems development and to identify the available tools and methodologies to build Linked Data-based health information systems. We used the Resource Description Framework (RDF) for data representation, Fuseki triple store for data storage, and Sgvizler for information visualization. Additionally, we integrated SPARQL query interface for interacting with the data. We primarily use the WHO health observatory dataset to test the system. All the data were represented using RDF and interlinked with other related datasets on the Web of Data using Silk-a link discovery framework for Web of Data. A preliminary usability assessment was conducted following the System Usability Scale (SUS) method. We developed an LOD-based health information representation, querying, and visualization system by using Linked Data tools. We imported more than 20,000 HIV-related data elements on mortality, prevalence, incidence, and related variables, which are freely available from the WHO global health observatory database. Additionally, we automatically linked 5312 data elements from DBpedia, Bio2RDF, and LinkedCT using the Silk framework. The system users can retrieve and visualize health information according to their interests. For users who are not familiar with SPARQL queries, we integrated a Linked Data search engine interface to search and browse the data. We used the system to represent and store the data, facilitating flexible queries and different kinds of visualizations. The preliminary user evaluation score by public health data managers and users was 82 on the SUS usability measurement scale. The need to write queries in the interface was the main reported difficulty of LOD-based systems to the end user. The system introduced in this article shows that current LOD technologies are a promising alternative to represent heterogeneous health data in a flexible and reusable manner so that they can serve intelligent queries, and ultimately support decision-making. However, the development of advanced text-based search engines is necessary to increase its usability especially for nontechnical users. Further research with large datasets is recommended in the future to unfold the potential of Linked Data and Semantic Web for future health information systems development.
Saada: A Generator of Astronomical Database

NASA Astrophysics Data System (ADS)

Michel, L.

2011-11-01

Saada transforms a set of heterogeneous FITS files or VOtables of various categories (images, tables, spectra, etc.) in a powerful database deployed on the Web. Databases are located on your host and stay independent of any external server. This job doesn’t require writing code. Saada can mix data of various categories in multiple collections. Data collections can be linked each to others making relevant browsing paths and allowing data-mining oriented queries. Saada supports 4 VO services (Spectra, images, sources and TAP) . Data collections can be published immediately after the deployment of the Web interface.
Mobile medical visual information retrieval.

PubMed

Depeursinge, Adrien; Duc, Samuel; Eggel, Ivan; Müller, Henning

2012-01-01

In this paper, we propose mobile access to peer-reviewed medical information based on textual search and content-based visual image retrieval. Web-based interfaces designed for limited screen space were developed to query via web services a medical information retrieval engine optimizing the amount of data to be transferred in wireless form. Visual and textual retrieval engines with state-of-the-art performance were integrated. Results obtained show a good usability of the software. Future use in clinical environments has the potential of increasing quality of patient care through bedside access to the medical literature in context.
Research on Agriculture Domain Meta-Search Engine System

NASA Astrophysics Data System (ADS)

Xie, Nengfu; Wang, Wensheng

The rapid growth of agriculture web information brings a fact that search engine can not return a satisfied result for users’ queries. In this paper, we propose an agriculture domain search engine system, called ADSE, that can obtains results by an advance interface to several searches and aggregates them. We also discuss two key technologies: agriculture information determination and engine.
EarthServer - an FP7 project to enable the web delivery and analysis of 3D/4D models

NASA Astrophysics Data System (ADS)

Laxton, John; Sen, Marcus; Passmore, James

2013-04-01

EarthServer aims at open access and ad-hoc analytics on big Earth Science data, based on the OGC geoservice standards Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS). The WCS model defines "coverages" as a unifying paradigm for multi-dimensional raster data, point clouds, meshes, etc., thereby addressing a wide range of Earth Science data including 3D/4D models. WCPS allows declarative SQL-style queries on coverages. The project is developing a pilot implementing these standards, and will also investigate the use of GeoSciML to describe coverages. Integration of WCPS with XQuery will in turn allow coverages to be queried in combination with their metadata and GeoSciML description. The unified service will support navigation, extraction, aggregation, and ad-hoc analysis on coverage data from SQL. Clients will range from mobile devices to high-end immersive virtual reality, and will enable 3D model visualisation using web browser technology coupled with developing web standards. EarthServer is establishing open-source client and server technology intended to be scalable to Petabyte/Exabyte volumes, based on distributed processing, supercomputing, and cloud virtualization. Implementation will be based on the existing rasdaman server technology developed. Services using rasdaman technology are being installed serving the atmospheric, oceanographic, geological, cryospheric, planetary and general earth observation communities. The geology service (http://earthserver.bgs.ac.uk/) is being provided by BGS and at present includes satellite imagery, superficial thickness data, onshore DTMs and 3D models for the Glasgow area. It is intended to extend the data sets available to include 3D voxel models. Use of the WCPS standard allows queries to be constructed against single or multiple coverages. For example on a single coverage data for a particular area can be selected or data with a particular range of pixel values. Queries on multiple surfaces can be constructed to calculate, for example, the thickness between two surfaces in a 3D model or the depth from ground surface to the top of a particular geologic unit. In the first version of the service a simple interface showing some example queries has been implemented in order to show the potential of the technologies. The project aims to develop the services available in light of user feedback, both in terms of the data available, the functionality and the interface. User feedback on the services guides the software and standards development aspects of the project, leading to enhanced versions of the software which will be implemented in upgraded versions of the services during the lifetime of the project.
CrossQuery: a web tool for easy associative querying of transcriptome data.

PubMed

Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

2011-01-01

Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straight forward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.
BioModels.net Web Services, a free and integrated toolkit for computational modelling software.

PubMed

Li, Chen; Courtot, Mélanie; Le Novère, Nicolas; Laibe, Camille

2010-05-01

Exchanging and sharing scientific results are essential for researchers in the field of computational modelling. BioModels.net defines agreed-upon standards for model curation. A fundamental one, MIRIAM (Minimum Information Requested in the Annotation of Models), standardises the annotation and curation process of quantitative models in biology. To support this standard, MIRIAM Resources maintains a set of standard data types for annotating models, and provides services for manipulating these annotations. Furthermore, BioModels.net creates controlled vocabularies, such as SBO (Systems Biology Ontology) which strictly indexes, defines and links terms used in Systems Biology. Finally, BioModels Database provides a free, centralised, publicly accessible database for storing, searching and retrieving curated and annotated computational models. Each resource provides a web interface to submit, search, retrieve and display its data. In addition, the BioModels.net team provides a set of Web Services which allows the community to programmatically access the resources. A user is then able to perform remote queries, such as retrieving a model and resolving all its MIRIAM Annotations, as well as getting the details about the associated SBO terms. These web services use established standards. Communications rely on SOAP (Simple Object Access Protocol) messages and the available queries are described in a WSDL (Web Services Description Language) file. Several libraries are provided in order to simplify the development of client software. BioModels.net Web Services make one step further for the researchers to simulate and understand the entirety of a biological system, by allowing them to retrieve biological models in their own tool, combine queries in workflows and efficiently analyse models.
41. DISCOVERY, SEARCH, AND COMMUNICATION OF TEXTUAL KNOWLEDGE RESOURCES IN DISTRIBUTED SYSTEMS a. Discovering and Utilizing Knowledge Sources for Metasearch Knowledge Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zamora, Antonio

Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguisticmore » and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance the web information retrieval by providing algorithmically-derived objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are just organized based on the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri. The search program uses linguistic morphological routines to find relevant entries regardless of whether terms are stored in singular or plural form. Implementation of additional inflectional morphology processes for verbs can enhance retrieval further, but this has to be balanced by the possibility of broadening the results too much. In addition to the DOE energy thesaurus, other sources of specialized organized knowledge such as the Medical Subject Headings (MeSH), the Unified Medical Language System (UMLS), and Wikipedia were investigated. The supporting role of the NLP thesaurus search program was enhanced by incorporating spelling aid and a part-of-speech tagger to cope with misspellings in the queries and to determine the grammatical roles of the query words and identify nouns for special processing. To improve precision, multiple modes of searching were implemented including Boolean operators, and field-specific searches. Programs to convert a thesaurus or reference file into searchable support files can be deployed easily, and the resulting files are immediately searchable to produce relevance-ranked results with builtin spelling aid, morphological processing, and advanced search logic. Demonstration systems were built for several databases, including the DOE energy thesaurus.« less
The Binding Database: data management and interface design.

PubMed

Chen, Xi; Lin, Yuhmei; Liu, Ming; Gilson, Michael K

2002-01-01

The large and growing body of experimental data on biomolecular binding is of enormous value in developing a deeper understanding of molecular biology, in developing new therapeutics, and in various molecular design applications. However, most of these data are found only in the published literature and are therefore difficult to access and use. No existing public database has focused on measured binding affinities and has provided query capabilities that include chemical structure and sequence homology searches. We have created Binding DataBase (BindingDB), a public, web-accessible database of measured binding affinities. BindingDB is based upon a relational data specification for describing binding measurements via Isothermal Titration Calorimetry (ITC) and enzyme inhibition. A corresponding XML Document Type Definition (DTD) is used to create and parse intermediate files during the on-line deposition process and will also be used for data interchange, including collection of data from other sources. The on-line query interface, which is constructed with Java Servlet technology, supports standard SQL queries as well as searches for molecules by chemical structure and sequence homology. The on-line deposition interface uses Java Server Pages and JavaBean objects to generate dynamic HTML and to store intermediate results. The resulting data resource provides a range of functionality with brisk response-times, and lends itself well to continued development and enhancement.
WebCIS: large scale deployment of a Web-based clinical information system.

PubMed

Hripcsak, G; Cimino, J J; Sengupta, S

1999-01-01

WebCIS is a Web-based clinical information system. It sits atop the existing Columbia University clinical information system architecture, which includes a clinical repository, the Medical Entities Dictionary, an HL7 interface engine, and an Arden Syntax based clinical event monitor. WebCIS security features include authentication with secure tokens, authorization maintained in an LDAP server, SSL encryption, permanent audit logs, and application time outs. WebCIS is currently used by 810 physicians at the Columbia-Presbyterian center of New York Presbyterian Healthcare to review and enter data into the electronic medical record. Current deployment challenges include maintaining adequate database performance despite complex queries, replacing large numbers of computers that cannot run modern Web browsers, and training users that have never logged onto the Web. Although the raised expectations and higher goals have increased deployment costs, the end result is a far more functional, far more available system.
Sexual information seeking on web search engines.

PubMed

Spink, Amanda; Koricich, Andrew; Jansen, B J; Cole, Charles

2004-02-01

Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat rooms discussions, accessing Websites or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually-related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually-related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed.
AlzPharm: integration of neurodegeneration data using RDF.

PubMed

Lam, Hugo Y K; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi

2007-05-09

Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Accessing and integrating biomedical data which cuts across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields.
AlzPharm: integration of neurodegeneration data using RDF

PubMed Central

Lam, Hugo YK; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi

2007-01-01

Background Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. Results We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Conclusion Accessing and integrating biomedical data which cuts across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields. PMID:17493287
A novel adaptive Cuckoo search for optimal query plan generation.

PubMed

Gomathi, Ramalingam; Sharmila, Dhandapani

2014-01-01

The emergence of multiple web pages day by day leads to the development of the semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To enhance the efficiency in the execution time for querying large RDF graphs, the evolving metaheuristic algorithms become an alternate to the traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying number of predicates. The experimental results have exposed that the proposed approach has provided significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.
Accessing near real-time Antarctic meteorological data through an OGC Sensor Observation Service (SOS)

NASA Astrophysics Data System (ADS)

Kirsch, Peter; Breen, Paul

2013-04-01

We wish to highlight outputs of a project conceived from a science requirement to improve discovery and access to Antarctic meteorological data in near real-time. Given that the data was distributed in both spatial and temporal domains and is to be accessed across several science disciplines, the creation of an interoperable, OGC compliant web service was deemed the most appropriate approach. We will demonstrate an implementation of the OGC SOS Interface Standard to discover, browse, and access Antarctic meteorological data-sets. A selection of programmatic (R, Perl) and web client interfaces utilizing open technologies ( e.g. jQuery, Flot, openLayers ) will be demonstrated. In addition we will show how high level abstractions can be constructed to allow the users flexible and straightforward access to SOS retrieved data.

WE-E-BRB-11: Riview a Web-Based Viewer for Radiotherapy.

PubMed

Apte, A; Wang, Y; Deasy, J

2012-06-01

Collaborations involving radiotherapy data collection, such as the recently proposed international radiogenomics consortium, require robust, web-based tools to facilitate reviewing treatment planning information. We present the architecture and prototype characteristics for a web-based radiotherapy viewer. The web-based environment developed in this work consists of the following components: 1) Import of DICOM/RTOG data: CERR was leveraged to import DICOM/RTOG data and to convert to database friendly RT objects. 2) Extraction and Storage of RT objects: The scan and dose distributions were stored as .png files per slice and view plane. The file locations were written to the MySQL database. Structure contours and DVH curves were written to the database as numeric data. 3) Web interfaces to query, retrieve and visualize the RT objects: The Web application was developed using HTML 5 and Ruby on Rails (RoR) technology following the MVC philosophy. The open source ImageMagick library was utilized to overlay scan, dose and structures. The application allows users to (i) QA the treatment plans associated with a study, (ii) Query and Retrieve patients matching anonymized ID and study, (iii) Review up to 4 plans simultaneously in 4 window panes (iv) Plot DVH curves for the selected structures and dose distributions. A subset of data for lung cancer patients was used to prototype the system. Five user accounts were created to have access to this study. The scans, doses, structures and DVHs for 10 patients were made available via the web application. A web-based system to facilitate QA, and support Query, Retrieve and the Visualization of RT data was prototyped. The RIVIEW system was developed using open source and free technology like MySQL and RoR. We plan to extend the RIVIEW system further to be useful in clinical trial data collection, outcomes research, cohort plan review and evaluation. © 2012 American Association of Physicists in Medicine.
MetaSEEk: a content-based metasearch engine for images

NASA Astrophysics Data System (ADS)

Beigi, Mandis; Benitez, Ana B.; Chang, Shih-Fu

1997-12-01

Search engines are the most powerful resources for finding information on the rapidly expanding World Wide Web (WWW). Finding the desired search engines and learning how to use them, however, can be very time consuming. The integration of such search tools enables the users to access information across the world in a transparent and efficient manner. These systems are called meta-search engines. The recent emergence of visual information retrieval (VIR) search engines on the web is leading to the same efficiency problem. This paper describes and evaluates MetaSEEk, a content-based meta-search engine used for finding images on the Web based on their visual information. MetaSEEk is designed to intelligently select and interface with multiple on-line image search engines by ranking their performance for different classes of user queries. User feedback is also integrated in the ranking refinement. We compare MetaSEEk with a base line version of meta-search engine, which does not use the past performance of the different search engines in recommending target search engines for future queries.
PIQMIe: a web server for semi-quantitative proteomics data management and analysis

PubMed Central

Kuzniar, Arnold; Kanaar, Roland

2014-01-01

We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. PMID:24861615
PIQMIe: a web server for semi-quantitative proteomics data management and analysis.

PubMed

Kuzniar, Arnold; Kanaar, Roland

2014-07-01

We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Exposing the cancer genome atlas as a SPARQL endpoint

PubMed Central

Deus, Helena F.; Veiga, Diogo F.; Freire, Pablo R.; Weinstein, John N.; Mills, Gordon B.; Almeida, Jonas S.

2011-01-01

The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. PMID:20851208
Mining Longitudinal Web Queries: Trends and Patterns.

ERIC Educational Resources Information Center

Wang, Peiling; Berry, Michael W.; Yang, Yiheng

2003-01-01

Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…
The EarthServer Federation: State, Role, and Contribution to GEOSS

NASA Astrophysics Data System (ADS)

Merticariu, Vlad; Baumann, Peter

2016-04-01

The intercontinental EarthServer initiative has established a European datacube platform with proven scalability: known databases exceed 100 TB, and single queries have been split across more than 1,000 cloud nodes. Its service interface being rigorously based on the OGC "Big Geo Data" standards, Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS), a series of clients can dock into the services, ranging from open-source OpenLayers and QGIS over open-source NASA WorldWind to proprietary ESRI ArcGIS. Datacube fusion in a "mix and match" style is supported by the platform technolgy, the rasdaman Array Database System, which transparently federates queries so that users simply approach any node of the federation to access any data item, internally optimized for minimal data transfer. Notably, rasdaman is part of GEOSS GCI. NASA is contributing its Web WorldWind virtual globe for user-friendly data extraction, navigation, and analysis. Integrated datacube / metadata queries are contributed by CITE. Current federation members include ESA (managed by MEEO sr.l.), Plymouth Marine Laboratory (PML), the European Centre for Medium-Range Weather Forecast (ECMWF), Australia's National Computational Infrastructure, and Jacobs University (adding in Planetary Science). Further data centers have expressed interest in joining. We present the EarthServer approach, discuss its underlying technology, and illustrate the contribution this datacube platform can make to GEOSS.
Application based on ArcObject inquiry and Google maps demonstration to real estate database

NASA Astrophysics Data System (ADS)

Hwang, JinTsong

2007-06-01

Real estate industry in Taiwan has been flourishing in recent years. To acquire various and abundant information of real estate for sale is the same goal for the consumers and the brokerages. Therefore, before looking at the property, it is important to get all pertinent information possible. Not only this beneficial for the real estate agent as they can provide the sellers with the most information, thereby solidifying the interest of the buyer, but may also save time and the cost of manpower were something out of place. Most of the brokerage sites are aware of utilizes Internet as form of media for publicity however; the contents are limited to specific property itself and the functions of query are mostly just provided searching by condition. This paper proposes a query interface on website which gives function of zone query by spatial analysis for non-GIS users, developing a user-friendly interface with ArcObject in VB6, and query by condition. The inquiry results can show on the web page which is embedded functions of Google Maps and the UrMap API on it. In addition, the demonstration of inquiry results will give the multimedia present way which includes hyperlink to Google Earth with surrounding of the property, the Virtual Reality scene of house, panorama of interior of building and so on. Therefore, the website provides extra spatial solution for query and demonstration abundant information of real estate in two-dimensional and three-dimensional types of view.
PhyreStorm: A Web Server for Fast Structural Searches Against the PDB.

PubMed

Mezulis, Stefans; Sternberg, Michael J E; Kelley, Lawrence A

2016-02-22

The identification of structurally similar proteins can provide a range of biological insights, and accordingly, the alignment of a query protein to a database of experimentally determined protein structures is a technique commonly used in the fields of structural and evolutionary biology. The PhyreStorm Web server has been designed to provide comprehensive, up-to-date and rapid structural comparisons against the Protein Data Bank (PDB) combined with a rich and intuitive user interface. It is intended that this facility will enable biologists inexpert in bioinformatics access to a powerful tool for exploring protein structure relationships beyond what can be achieved by sequence analysis alone. By partitioning the PDB into similar structures, PhyreStorm is able to quickly discard the majority of structures that cannot possibly align well to a query protein, reducing the number of alignments required by an order of magnitude. PhyreStorm is capable of finding 93±2% of all highly similar (TM-score>0.7) structures in the PDB for each query structure, usually in less than 60s. PhyreStorm is available at http://www.sbg.bio.ic.ac.uk/phyrestorm/. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
A WebGL Tool for Visualizing the Topology of the Sun's Coronal Magnetic Field

NASA Astrophysics Data System (ADS)

Duffy, A.; Cheung, C.; DeRosa, M. L.

2012-12-01

We present a web-based, topology-viewing tool that allows users to visualize the geometry and topology of the Sun's 3D coronal magnetic field in an interactive manner. The tool is implemented using, open-source, mature, modern web technologies including WebGL, jQuery, HTML 5, and CSS 3, which are compatible with nearly all modern web browsers. As opposed to the traditional method of visualization, which involves the downloading and setup of various software packages-proprietary and otherwise-the tool presents a clean interface that allows the user to easily load and manipulate the model, while also offering great power to choose which topological features are displayed. The tool accepts data encoded in the JSON open format that has libraries available for nearly every major programming language, making it simple to generate the data.
Global Agricultural Monitoring (GLAM) using MODAPS and LANCE Data Products

NASA Astrophysics Data System (ADS)

Anyamba, A.; Pak, E. E.; Majedi, A. H.; Small, J. L.; Tucker, C. J.; Reynolds, C. A.; Pinzon, J. E.; Smith, M. M.

2012-12-01

The Global Inventory Modeling and Mapping Studies / Global Agricultural Monitoring (GIMMS GLAM) system is a web-based geographic application that offers Moderate Resolution Imaging Spectroradiometer (MODIS) imagery and user interface tools to data query and plot MODIS NDVI time series. The system processes near real-time and science quality Terra and Aqua MODIS 8-day composited datasets. These datasets are derived from the MOD09 and MYD09 surface reflectance products which are generated and provided by NASA/GSFC Land and Atmosphere Near Real-time Capability for EOS (LANCE) and NASA/GSFC MODIS Adaptive Processing System (MODAPS). The GIMMS GLAM system is developed and provided by the NASA/GSFC GIMMS group for the U.S. Department of Agriculture / Foreign Agricultural Service / International Production Assessment Division (USDA/FAS/IPAD) Global Agricultural Monitoring project (GLAM). The USDA/FAS/IPAD mission is to provide objective, timely, and regular assessment of the global agricultural production outlook and conditions affecting global food security. This system was developed to improve USDA/FAS/IPAD capabilities for making operational quantitative estimates for crop production and yield estimates based on satellite-derived data. The GIMMS GLAM system offers 1) web map imagery including Terra & Aqua MODIS 8-day composited NDVI, NDVI percent anomaly, and SWIR-NIR-Red band combinations, 2) web map overlays including administrative and 0.25 degree Land Information System (LIS) shape boundaries, and crop land cover masks, and 3) user interface tools to select features, data query, plot, and download MODIS NDVI time series.
Relax with CouchDB--into the non-relational DBMS era of bioinformatics.

PubMed

Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R

2012-07-01

With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.
A semantically-aided architecture for a web-based monitoring system for carotid atherosclerosis.

PubMed

Kolias, Vassileios D; Stamou, Giorgos; Golemati, Spyretta; Stoitsis, Giannis; Gkekas, Christos D; Liapis, Christos D; Nikita, Konstantina S

2015-08-01

Carotid atherosclerosis is a multifactorial disease and its clinical diagnosis depends on the evaluation of heterogeneous clinical data, such as imaging exams, biochemical tests and the patient's clinical history. The lack of interoperability between Health Information Systems (HIS) does not allow the physicians to acquire all the necessary data for the diagnostic process. In this paper, a semantically-aided architecture is proposed for a web-based monitoring system for carotid atherosclerosis that is able to gather and unify heterogeneous data with the use of an ontology and to create a common interface for data access enhancing the interoperability of HIS. The architecture is based on an application ontology of carotid atherosclerosis that is used to (a) integrate heterogeneous data sources on the basis of semantic representation and ontological reasoning and (b) access the critical information using SPARQL query rewriting and ontology-based data access services. The architecture was tested over a carotid atherosclerosis dataset consisting of the imaging exams and the clinical profile of 233 patients, using a set of complex queries, constructed by the physicians. The proposed architecture was evaluated with respect to the complexity of the queries that the physicians could make and the retrieval speed. The proposed architecture gave promising results in terms of interoperability, data integration of heterogeneous sources with an ontological way and expanded capabilities of query and retrieval in HIS.
Search Interface Design Using Faceted Indexing for Web Resources.

ERIC Educational Resources Information Center

Devadason, Francis; Intaraksa, Neelawat; Patamawongjariya, Pornprapa; Desai, Kavita

2001-01-01

Describes an experimental system designed to organize and provide access to Web documents using a faceted pre-coordinate indexing system based on the Deep Structure Indexing System (DSIS) derived from POPSI (Postulate based Permuted Subject Indexing) of Bhattacharyya, and the facet analysis and chain indexing system of Ranganathan. (AEF)
Automating Information Discovery Within the Invisible Web

NASA Astrophysics Data System (ADS)

Sweeney, Edwina; Curran, Kevin; Xie, Ermai

A Web crawler or spider crawls through the Web looking for pages to index, and when it locates a new page it passes the page on to an indexer. The indexer identifies links, keywords, and other content and stores these within its database. This database is searched by entering keywords through an interface and suitable Web pages are returned in a results page in the form of hyperlinks accompanied by short descriptions. The Web, however, is increasingly moving away from being a collection of documents to a multidimensional repository for sounds, images, audio, and other formats. This is leading to a situation where certain parts of the Web are invisible or hidden. The term known as the "Deep Web" has emerged to refer to the mass of information that can be accessed via the Web but cannot be indexed by conventional search engines. The concept of the Deep Web makes searches quite complex for search engines. Google states that the claim that conventional search engines cannot find such documents as PDFs, Word, PowerPoint, Excel, or any non-HTML page is not fully accurate and steps have been taken to address this problem by implementing procedures to search items such as academic publications, news, blogs, videos, books, and real-time information. However, Google still only provides access to a fraction of the Deep Web. This chapter explores the Deep Web and the current tools available in accessing it.
Cafe Variome: general-purpose software for making genotype-phenotype data discoverable in restricted or open access contexts.

PubMed

Lancaster, Owen; Beck, Tim; Atlan, David; Swertz, Morris; Thangavelu, Dhiwagaran; Veal, Colin; Dalgleish, Raymond; Brookes, Anthony J

2015-10-01

Biomedical data sharing is desirable, but problematic. Data "discovery" approaches-which establish the existence rather than the substance of data-precisely connect data owners with data seekers, and thereby promote data sharing. Cafe Variome (http://www.cafevariome.org) was therefore designed to provide a general-purpose, Web-based, data discovery tool that can be quickly installed by any genotype-phenotype data owner, or network of data owners, to make safe or sensitive content appropriately discoverable. Data fields or content of any type can be accommodated, from simple ID and label fields through to extensive genotype and phenotype details based on ontologies. The system provides a "shop window" in front of data, with main interfaces being a simple search box and a powerful "query-builder" that enable very elaborate queries to be formulated. After a successful search, counts of records are reported grouped by "openAccess" (data may be directly accessed), "linkedAccess" (a source link is provided), and "restrictedAccess" (facilitated data requests and subsequent provision of approved records). An administrator interface provides a wide range of options for system configuration, enabling highly customized single-site or federated networks to be established. Current uses include rare disease data discovery, patient matchmaking, and a Beacon Web service. © 2015 WILEY PERIODICALS, INC.
Open Clients for Distributed Databases

NASA Astrophysics Data System (ADS)

Chayes, D. N.; Arko, R. A.

2001-12-01

We are actively developing a collection of open source example clients that demonstrate use of our "back end" data management infrastructure. The data management system is reported elsewhere at this meeting (Arko and Chayes: A Scaleable Database Infrastructure). In addition to their primary goal of being examples for others to build upon, some of these clients may have limited utility in them selves. More information about the clients and the data infrastructure is available on line at http://data.ldeo.columbia.edu. The available examples to be demonstrated include several web-based clients including those developed for the Community Review System of the Digital Library for Earth System Education, a real-time watch standers log book, an offline interface to use log book entries, a simple client to search on multibeam metadata and others are Internet enabled and generally web-based front ends that support searches against one or more relational databases using industry standard SQL queries. In addition to the web based clients, simple SQL searches from within Excel and similar applications will be demonstrated. By defining, documenting and publishing a clear interface to the fully searchable databases, it becomes relatively easy to construct client interfaces that are optimized for specific applications in comparison to building a monolithic data and user interface system.
Semantic-JSON: a lightweight web service interface for Semantic Web contents integrating multiple life science databases.

PubMed

Kobayashi, Norio; Ishii, Manabu; Takahashi, Satoshi; Mochizuki, Yoshiki; Matsushima, Akihiro; Toyoda, Tetsuro

2011-07-01

Global cloud frameworks for bioinformatics research databases become huge and heterogeneous; solutions face various diametric challenges comprising cross-integration, retrieval, security and openness. To address this, as of March 2011 organizations including RIKEN published 192 mammalian, plant and protein life sciences databases having 8.2 million data records, integrated as Linked Open or Private Data (LOD/LPD) using SciNetS.org, the Scientists' Networking System. The huge quantity of linked data this database integration framework covers is based on the Semantic Web, where researchers collaborate by managing metadata across public and private databases in a secured data space. This outstripped the data query capacity of existing interface tools like SPARQL. Actual research also requires specialized tools for data analysis using raw original data. To solve these challenges, in December 2009 we developed the lightweight Semantic-JSON interface to access each fragment of linked and raw life sciences data securely under the control of programming languages popularly used by bioinformaticians such as Perl and Ruby. Researchers successfully used the interface across 28 million semantic relationships for biological applications including genome design, sequence processing, inference over phenotype databases, full-text search indexing and human-readable contents like ontology and LOD tree viewers. Semantic-JSON services of SciNetS.org are provided at http://semanticjson.org.
A SOA broker solution for standard discovery and access services: the GI-cat framework

NASA Astrophysics Data System (ADS)

Boldrini, Enrico

2010-05-01

GI-cat ideal users are data providers or service providers within the geoscience community. The former have their data already available through an access service (e.g. an OGC Web Service) and would have it published through a standard catalog service, in a seamless way. The latter would develop a catalog broker and let users query and access different geospatial resources through one or more standard interfaces and Application Profiles (AP) (e.g. OGC CSW ISO AP, CSW ebRIM/EO AP, etc.). GI-cat actually implements a broker components (i.e. a middleware service) which carries out distribution and mediation functionalities among "well-adopted" catalog interfaces and data access protocols. GI-cat also publishes different discovery interfaces: the OGC CSW ISO and ebRIM Application Profiles (the latter coming with support for the EO and CIM extension packages) and two different OpenSearch interfaces developed in order to explore Web 2.0 possibilities. An extended interface is also available to exploit all available GI-cat features, such as interruptible incremental queries and queries feedback. Interoperability tests performed in the context of different projects have also pointed out the importance to enforce compatibility with existing and wide-spread tools of the open source community (e.g. GeoNetwork and Deegree catalogs), which was then achieved. Based on a service-oriented framework of modular components, GI-cat can effectively be customized and tailored to support different deployment scenarios. In addition to the distribution functionality an harvesting approach has been lately experimented, allowing the user to switch between a distributed and a local search giving thus more possibilities to support different deployment scenarios. A configurator tool is available in order to enable an effective high level configuration of the broker service. A specific geobrowser was also naturally developed, for demonstrating the advanced GI-cat functionalities. This client, called GI-go, is an example of the possible applications which may be built on top of the GI-cat broker component. GI-go allows discovering and browsing of the available datasets, retrieving and evaluating their description and performing distributed queries according to any combination of the following criteria: geographic area, temporal interval, topic of interest (free-text and/or keyword selection are allowed) and data source (i.e. where, when, what, who). The results set of a query (e.g. datasets metadata) are then displayed in an incremental way leveraging the asynchronous interactions approach implemented by GI-cat. This feature allows the user to access the intermediate query results. Query interruption and feedback features are also provided to the user. Alternatively, user may perform a browsing task by selecting a catalog resource from the current configuration and navigate through its aggregated and/or leaf datasets. In both cases datasets metadata, expressed according to ISO 19139 (and also Dublin Core and ebRIM if available), are displayed for download, along with a resource portrayal and actual data access (when this is meaningful and possible). The GI-cat distributed catalog service has been successfully deployed and experimented in the framework of different projects and initiative, including the SeaDataNet FP6 project, GEOSS IP3 (Interoperability Process Pilot Project), GEOSS AIP-2 (Architectural Implementation Project - Phase 2), FP7 GENESI-DR, CNR GIIDA, FP7 EUROGEOSS and ESA HMA project.
Open Data, Jupyter Notebooks and Geospatial Data Standards Combined - Opening up large volumes of marine and climate data to other communities

NASA Astrophysics Data System (ADS)

Clements, O.; Siemen, S.; Wagemann, J.

2017-12-01

The EU-funded Earthserver-2 project aims to offer on-demand access to large volumes of environmental data (Earth Observation, Marine, Climate data and Planetary data) via the interface standard Web Coverage Service defined by the Open Geospatial Consortium. Providing access to data via OGC web services (e.g. WCS and WMS) has the potential to open up services to a wider audience, especially to users outside the respective communities. Especially WCS 2.0 with its processing extension Web Coverage Processing Service (WCPS) is highly beneficial to make large volumes accessible to non-expert communities. Users do not have to deal with custom community data formats, such as GRIB for the meteorological community, but can directly access the data in a format they are more familiar with, such as NetCDF, JSON or CSV. Data requests can further directly be integrated into custom processing routines and users are not required to download Gigabytes of data anymore. WCS supports trim (reduction of data extent) and slice (reduction of data dimension) operations on multi-dimensional data, providing users a very flexible on-demand access to the data. WCPS allows the user to craft queries to run on the data using a text-based query language, similar to SQL. These queries can be very powerful, e.g. condensing a three-dimensional data cube into its two-dimensional mean. However, the more processing-intensive the more complex the query. As part of the EarthServer-2 project, we developed a python library that helps users to generate complex WCPS queries with Python, a programming language they are more familiar with. The interactive presentation aims to give practical examples how users can benefit from two specific WCS services from the Marine and Climate community. Use-cases from the two communities will show different approaches to take advantage of a Web Coverage (Processing) Service. The entire content is available with Jupyter Notebooks, as they prove to be a highly beneficial tool to generate reproducible workflows for environmental data analysis.

Integrating Radar Image Data with Google Maps

NASA Technical Reports Server (NTRS)

Chapman, Bruce D.; Gibas, Sarah

2010-01-01

A public Web site has been developed as a method for displaying the multitude of radar imagery collected by NASA s Airborne Synthetic Aperture Radar (AIRSAR) instrument during its 16-year mission. Utilizing NASA s internal AIRSAR site, the new Web site features more sophisticated visualization tools that enable the general public to have access to these images. The site was originally maintained at NASA on six computers: one that held the Oracle database, two that took care of the software for the interactive map, and three that were for the Web site itself. Several tasks were involved in moving this complicated setup to just one computer. First, the AIRSAR database was migrated from Oracle to MySQL. Then the back-end of the AIRSAR Web site was updated in order to access the MySQL database. To do this, a few of the scripts needed to be modified; specifically three Perl scripts that query that database. The database connections were then updated from Oracle to MySQL, numerous syntax errors were corrected, and a query was implemented that replaced one of the stored Oracle procedures. Lastly, the interactive map was designed, implemented, and tested so that users could easily browse and access the radar imagery through the Google Maps interface.
The BioExtract Server: a web-based bioinformatic workflow platform

PubMed Central

Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

2011-01-01

The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552
Insight: An ontology-based integrated database and analysis platform for epilepsy self-management research.

PubMed

Sahoo, Satya S; Ramesh, Priya; Welter, Elisabeth; Bukach, Ashley; Valdez, Joshua; Tatsuoka, Curtis; Bamps, Yvan; Stoll, Shelley; Jobst, Barbara C; Sajatovic, Martha

2016-10-01

We present Insight as an integrated database and analysis platform for epilepsy self-management research as part of the national Managing Epilepsy Well Network. Insight is the only available informatics platform for accessing and analyzing integrated data from multiple epilepsy self-management research studies with several new data management features and user-friendly functionalities. The features of Insight include, (1) use of Common Data Elements defined by members of the research community and an epilepsy domain ontology for data integration and querying, (2) visualization tools to support real time exploration of data distribution across research studies, and (3) an interactive visual query interface for provenance-enabled research cohort identification. The Insight platform contains data from five completed epilepsy self-management research studies covering various categories of data, including depression, quality of life, seizure frequency, and socioeconomic information. The data represents over 400 participants with 7552 data points. The Insight data exploration and cohort identification query interface has been developed using Ruby on Rails Web technology and open source Web Ontology Language Application Programming Interface to support ontology-based reasoning. We have developed an efficient ontology management module that automatically updates the ontology mappings each time a new version of the Epilepsy and Seizure Ontology is released. The Insight platform features a Role-based Access Control module to authenticate and effectively manage user access to different research studies. User access to Insight is managed by the Managing Epilepsy Well Network database steering committee consisting of representatives of all current collaborating centers of the Managing Epilepsy Well Network. New research studies are being continuously added to the Insight database and the size as well as the unique coverage of the dataset allows investigators to conduct aggregate data analysis that will inform the next generation of epilepsy self-management studies. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Exposing the cancer genome atlas as a SPARQL endpoint.

PubMed

Deus, Helena F; Veiga, Diogo F; Freire, Pablo R; Weinstein, John N; Mills, Gordon B; Almeida, Jonas S

2010-12-01

The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. Copyright © 2010 Elsevier Inc. All rights reserved.
START: a system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries.

PubMed

Zhu, Xinjie; Zhang, Qiang; Ho, Eric Dun; Yu, Ken Hung-On; Liu, Chris; Huang, Tim H; Cheng, Alfred Sze-Lok; Kao, Ben; Lo, Eric; Yip, Kevin Y

2017-09-22

A genomic signal track is a set of genomic intervals associated with values of various types, such as measurements from high-throughput experiments. Analysis of signal tracks requires complex computational methods, which often make the analysts focus too much on the detailed computational steps rather than on their biological questions. Here we propose Signal Track Query Language (STQL) for simple analysis of signal tracks. It is a Structured Query Language (SQL)-like declarative language, which means one only specifies what computations need to be done but not how these computations are to be carried out. STQL provides a rich set of constructs for manipulating genomic intervals and their values. To run STQL queries, we have developed the Signal Track Analytical Research Tool (START, http://yiplab.cse.cuhk.edu.hk/start/ ), a system that includes a Web-based user interface and a back-end execution system. The user interface helps users select data from our database of around 10,000 commonly-used public signal tracks, manage their own tracks, and construct, store and share STQL queries. The back-end system automatically translates STQL queries into optimized low-level programs and runs them on a computer cluster in parallel. We use STQL to perform 14 representative analytical tasks. By repeating these analyses using bedtools, Galaxy and custom Python scripts, we show that the STQL solution is usually the simplest, and the parallel execution achieves significant speed-up with large data files. Finally, we describe how a biologist with minimal formal training in computer programming self-learned STQL to analyze DNA methylation data we produced from 60 pairs of hepatocellular carcinoma (HCC) samples. Overall, STQL and START provide a generic way for analyzing a large number of genomic signal tracks in parallel easily.
First Prototype of a Web Map Interface for ESA's Planetary Science Archive (PSA)

NASA Astrophysics Data System (ADS)

Manaud, N.; Gonzalez, J.

2014-04-01

We present a first prototype of a Web Map Interface that will serve as a proof of concept and design for ESA's future fully web-based Planetary Science Archive (PSA) User Interface. The PSA is ESA's planetary science archiving authority and central repository for all scientific and engineering data returned by ESA's Solar System missions [1]. All data are compliant with NASA's Planetary Data System (PDS) Standards and are accessible through several interfaces [2]: in addition to serving all public data via FTP and the Planetary Data Access Protocol (PDAP), a Java-based User Interface provides advanced search, preview, download, notification and delivery-basket functionality. It allows the user to query and visualise instrument observations footprints using a map-based interface (currently only available for Mars Express HRSC and OMEGA instruments). During the last decade, the planetary mapping science community has increasingly been adopting Geographic Information System (GIS) tools and standards, originally developed for and used in Earth science. There is an ongoing effort to produce and share cartographic products through Open Geospatial Consortium (OGC) Web Services, or as standalone data sets, so that they can be readily used in existing GIS applications [3,4,5]. Previous studies conducted at ESAC [6,7] have helped identify the needs of Planetary GIS users, and define key areas of improvement for the future Web PSA User Interface. Its web map interface shall will provide access to the full geospatial content of the PSA, including (1) observation geometry footprints of all remote sensing instruments, and (2) all georeferenced cartographic products, such as HRSC map-projected data or OMEGA global maps from Mars Express. It shall aim to provide a rich user experience for search and visualisation of this content using modern and interactive web mapping technology. A comprehensive set of built-in context maps from external sources, such as MOLA topography, TES infrared maps or planetary surface nomenclature, provided in both simple cylindrical and polar stereographic projections, shall enhance this user experience. In addition, users should be able to import and export data in commonly used open- GIS formats. It is also intended to serve all PSA geospatial data through OGC-compliant Web Services so that they can be captured, visualised and analysed directly from GIS software, along with data from other sources. The following figure illustrates how the PSA web map interface and services shall fit in a typical Planetary GIS user working environment.
A metadata-aware application for remote scoring and exchange of tissue microarray images

PubMed Central

2013-01-01

Background The use of tissue microarrays (TMA) and advances in digital scanning microscopy has enabled the collection of thousands of tissue images. There is a need for software tools to annotate, query and share this data amongst researchers in different physical locations. Results We have developed an open source web-based application for remote scoring of TMA images, which exploits the value of Microsoft Silverlight Deep Zoom to provide a intuitive interface for zooming and panning around digital images. We use and extend existing XML-based standards to ensure that the data collected can be archived and that our system is interoperable with other standards-compliant systems. Conclusion The application has been used for multi-centre scoring of TMA slides composed of tissues from several Phase III breast cancer trials and ten different studies participating in the International Breast Cancer Association Consortium (BCAC). The system has enabled researchers to simultaneously score large collections of TMA and export the standardised data to integrate with pathological and clinical outcome data, thereby facilitating biomarker discovery. PMID:23635078
Searching the Web: The Public and Their Queries.

ERIC Educational Resources Information Center

Spink, Amanda; Wolfram, Dietmar; Jansen, Major B. J.; Saracevic, Tefko

2001-01-01

Reports findings from a study of searching behavior by over 200,000 users of the Excite search engine. Analysis of over one million queries revealed most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. Concludes that Web searching by the public differs significantly from searching of…
An Analysis of Web Image Queries for Search.

ERIC Educational Resources Information Center

Pu, Hsiao-Tieh

2003-01-01

Examines the differences between Web image and textual queries, and attempts to develop an analytic model to investigate their implications for Web image retrieval systems. Provides results that give insight into Web image searching behavior and suggests implications for improvement of current Web image search engines. (AEF)
PDB-Metrics: a web tool for exploring the PDB contents.

PubMed

Fileto, Renato; Kuser, Paula R; Yamagishi, Michel E B; Ribeiro, André A; Quinalia, Thiago G; Franco, Eduardo H; Mancini, Adauto L; Higa, Roberto H; Oliveira, Stanley R M; Santos, Edgard H; Vieira, Fabio D; Mazoni, Ivan; Cruz, Sergio A B; Neshich, Goran

2006-06-30

PDB-Metrics (http://sms.cbi.cnptia.embrapa.br/SMS/pdb_metrics/index.html) is a component of the Diamond STING suite of programs for the analysis of protein sequence, structure and function. It summarizes the characteristics of the collection of protein structure descriptions deposited in the Protein Data Bank (PDB) and provides a Web interface to search and browse the PDB, using a variety of alternative criteria. PDB-Metrics is a powerful tool for bioinformaticians to examine the data span in the PDB from several perspectives. Although other Web sites offer some similar resources to explore the PDB contents, PDB-Metrics is among those with the most complete set of such facilities, integrated into a single Web site. This program has been developed using SQLite, a C library that provides all the query facilities of a database management system.
UCbase 2.0: ultraconserved sequences database (2014 update)

PubMed Central

Lomonaco, Vincenzo; Martoglia, Riccardo; Mandreoli, Federica; Anderlucci, Laura; Emmett, Warren; Bicciato, Silvio; Taccioli, Cristian

2014-01-01

UCbase 2.0 (http://ucbase.unimore.it) is an update, extension and evolution of UCbase, a Web tool dedicated to the analysis of ultraconserved sequences (UCRs). UCRs are 481 sequences >200 bases sharing 100% identity among human, mouse and rat genomes. They are frequently located in genomic regions known to be involved in cancer or differentially expressed in human leukemias and carcinomas. UCbase 2.0 is a platform-independent Web resource that includes the updated version of the human genome annotation (hg19), information linking disorders to chromosomal coordinates based on the Systematized Nomenclature of Medicine classification, a query tool to search for Single Nucleotide Polymorphisms (SNPs) and a new text box to directly interrogate the database using a MySQL interface. To facilitate the interactive visual interpretation of UCR chromosomal positioning, UCbase 2.0 now includes a graph visualization interface directly linked to UCSC genome browser. Database URL: http://ucbase.unimore.it PMID:24951797
SkyDOT: a publicly accessible variability database, containing multiple sky surveys and real-time data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Starr, D. L.; Wozniak, P. R.; Vestrand, W. T.

2002-01-01

SkyDOT (Sky Database for Objects in Time-Domain) is a Virtual Observatory currently comprised of data from the RAPTOR, ROTSE I, and OGLE I1 survey projects. This makes it a very large time domain database. In addition, the RAPTOR project provides SkyDOT with real-time variability data as well as stereoscopic information. With its web interface, we believe SkyDOT will be a very useful tool for both astronomers, and the public. Our main task has been to construct an efficient relational database containing all existing data, while handling a real-time inflow of data. We also provide a useful web interface allowing easymore » access to both astronomers and the public. Initially, this server will allow common searches, specific queries, and access to light curves. In the future we will include machine learning classification tools and access to spectral information.« less
Dynamic User Interfaces for Service Oriented Architectures in Healthcare.

PubMed

Schweitzer, Marco; Hoerbst, Alexander

2016-01-01

Electronic Health Records (EHRs) play a crucial role in healthcare today. Considering a data-centric view, EHRs are very advanced as they provide and share healthcare data in a cross-institutional and patient-centered way adhering to high syntactic and semantic interoperability. However, the EHR functionalities available for the end users are rare and hence often limited to basic document query functions. Future EHR use necessitates the ability to let the users define their needed data according to a certain situation and how this data should be processed. Workflow and semantic modelling approaches as well as Web services provide means to fulfil such a goal. This thesis develops concepts for dynamic interfaces between EHR end users and a service oriented eHealth infrastructure, which allow the users to design their flexible EHR needs, modeled in a dynamic and formal way. These are used to discover, compose and execute the right Semantic Web services.
Querying Semi-Structured Data

NASA Technical Reports Server (NTRS)

Abiteboul, Serge

1997-01-01

The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in the systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specic interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will seen later when the notion of semi-structured data is more precisely de ned, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data- formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi- structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent works at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.
Motivated Proteins: A web application for studying small three-dimensional protein motifs

PubMed Central

Leader, David P; Milner-White, E James

2009-01-01

Background Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema. PMID:19210785
RIMS: An Integrated Mapping and Analysis System with Applications to Earth Sciences and Hydrology

NASA Astrophysics Data System (ADS)

Proussevitch, A. A.; Glidden, S.; Shiklomanov, A. I.; Lammers, R. B.

2011-12-01

A web-based information and computational system for analysis of spatially distributed Earth system, climate, and hydrologic data have been developed. The System allows visualization, data exploration, querying, manipulation and arbitrary calculations with any loaded gridded or vector polygon dataset. The system's acronym, RIMS, stands for its core functionality as a Rapid Integrated Mapping System. The system can be deployed for a Global scale projects as well as for regional hydrology and climatology studies. In particular, the Water Systems Analysis Group of the University of New Hampshire developed the global and regional (Northern Eurasia, pan-Arctic) versions of the system with different map projections and specific data. The system has demonstrated its potential for applications in other fields of Earth sciences and education. The key Web server/client components of the framework include (a) a visualization engine built on Open Source libraries (GDAL, PROJ.4, etc.) that are utilized in a MapServer; (b) multi-level data querying tools built on XML server-client communication protocols that allow downloading map data on-the-fly to a client web browser; and (c) data manipulation and grid cell level calculation tools that mimic desktop GIS software functionality via a web interface. Server side data management of the system is designed around a simple database of dataset metadata facilitating mounting of new data to the system and maintaining existing data in an easy manner. RIMS contains "built-in" river network data that allows for query of upstream areas on-demand which can be used for spatial data aggregation and analysis of sub-basin areas. RIMS is an ongoing effort and currently being used to serve a number of websites hosting a suite of hydrologic, environmental and other GIS data.
LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures.

PubMed

Duan, Qiaonan; Flynn, Corey; Niepel, Mario; Hafner, Marc; Muhlich, Jeremy L; Fernandez, Nicolas F; Rouillard, Andrew D; Tan, Christopher M; Chen, Edward Y; Golub, Todd R; Sorger, Peter K; Subramanian, Aravind; Ma'ayan, Avi

2014-07-01

For the Library of Integrated Network-based Cellular Signatures (LINCS) project many gene expression signatures using the L1000 technology have been produced. The L1000 technology is a cost-effective method to profile gene expression in large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating many of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Indexing the medical open access literature for textual and content-based visual retrieval.

PubMed

Eggel, Ivan; Müller, Henning

2010-01-01

Over the past few years an increasing amount of scientific journals have been created in an open access format. Particularly in the medical field the number of openly accessible journals is enormous making a wide body of knowledge available for analysis and retrieval. Part of the trend towards open access publications can be linked to funding bodies such as the NIH¹ (National Institutes of Health) and the Swiss National Science Foundation (SNF²) requiring funded projects to make all articles of funded research available publicly. This article describes an approach to make part of the knowledge of open access journals available for retrieval including the textual information but also the images contained in the articles. For this goal all articles of 24 journals related to medical informatics and medical imaging were crawled from the web pages of BioMed Central. Text and images of the PDF (Portable Document Format) files were indexed separately and a web-based retrieval interface allows for searching via keyword queries or by visual similarity queries. Starting point for a visual similarity query can be an image on the local hard disk that is uploaded or any image found via the textual search. Search for similar documents is also possible.
TDR Targets: a chemogenomics resource for neglected diseases.

PubMed

Magariños, María P; Carmona, Santiago J; Crowther, Gregory J; Ralph, Stuart A; Roos, David S; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C; Agüero, Fernán

2012-01-01

The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context.
TDR Targets: a chemogenomics resource for neglected diseases

PubMed Central

Magariños, María P.; Carmona, Santiago J.; Crowther, Gregory J.; Ralph, Stuart A.; Roos, David S.; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C.; Agüero, Fernán

2012-01-01

The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context. PMID:22116064

Web queries as a source for syndromic surveillance.

PubMed

Hulth, Anette; Rydevik, Gustaf; Linde, Annika

2009-01-01

In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for the estimation of the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied an approach designed for highly correlated data, partial least squares regression. In our work, we found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for the estimation of the influenza burden in society. Web queries give a unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, cheap and labour extensive source for syndromic surveillance.
GEM-TREND: a web tool for gene expression data mining toward relevant network discovery

PubMed Central

Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

2009-01-01

Background DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. Results GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression pattern between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. Conclusion GEM-TREND was developed to retrieve gene expression data by comparing query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at . PMID:19728865
GEM-TREND: a web tool for gene expression data mining toward relevant network discovery.

PubMed

Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

2009-09-03

DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression pattern between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. GEM-TREND was developed to retrieve gene expression data by comparing query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network.
Semantic-JSON: a lightweight web service interface for Semantic Web contents integrating multiple life science databases

PubMed Central

Kobayashi, Norio; Ishii, Manabu; Takahashi, Satoshi; Mochizuki, Yoshiki; Matsushima, Akihiro; Toyoda, Tetsuro

2011-01-01

Global cloud frameworks for bioinformatics research databases become huge and heterogeneous; solutions face various diametric challenges comprising cross-integration, retrieval, security and openness. To address this, as of March 2011 organizations including RIKEN published 192 mammalian, plant and protein life sciences databases having 8.2 million data records, integrated as Linked Open or Private Data (LOD/LPD) using SciNetS.org, the Scientists' Networking System. The huge quantity of linked data this database integration framework covers is based on the Semantic Web, where researchers collaborate by managing metadata across public and private databases in a secured data space. This outstripped the data query capacity of existing interface tools like SPARQL. Actual research also requires specialized tools for data analysis using raw original data. To solve these challenges, in December 2009 we developed the lightweight Semantic-JSON interface to access each fragment of linked and raw life sciences data securely under the control of programming languages popularly used by bioinformaticians such as Perl and Ruby. Researchers successfully used the interface across 28 million semantic relationships for biological applications including genome design, sequence processing, inference over phenotype databases, full-text search indexing and human-readable contents like ontology and LOD tree viewers. Semantic-JSON services of SciNetS.org are provided at http://semanticjson.org. PMID:21632604
Multimedia Web Searching Trends.

ERIC Educational Resources Information Center

Ozmutlu, Seda; Spink, Amanda; Ozmutlu, H. Cenk

2002-01-01

Examines and compares multimedia Web searching by Excite and FAST search engine users in 2001. Highlights include audio and video queries; time spent on searches; terms per query; ranking of the most frequently used terms; and differences in Web search behaviors of U.S. and European Web users. (Author/LRW)
HippDB: a database of readily targeted helical protein-protein interactions.

PubMed

Bergey, Christina M; Watkins, Andrew M; Arora, Paramjit S

2013-11-01

HippDB catalogs every protein-protein interaction whose structure is available in the Protein Data Bank and which exhibits one or more helices at the interface. The Web site accepts queries on variables such as helix length and sequence, and it provides computational alanine scanning and change in solvent-accessible surface area values for every interfacial residue. HippDB is intended to serve as a starting point for structure-based small molecule and peptidomimetic drug development. HippDB is freely available on the web at http://www.nyu.edu/projects/arora/hippdb. The Web site is implemented in PHP, MySQL and Apache. Source code freely available for download at http://code.google.com/p/helidb, implemented in Perl and supported on Linux. arora@nyu.edu.
Integration of gel-based proteome data with pProRep.

PubMed

Laukens, Kris; Matthiesen, Rune; Lemière, Filip; Esmans, Eddy; Onckelen, Harry Van; Jensen, Ole Nørregaard; Witters, Erwin

2006-11-15

pProRep is a web application integrating electrophoretic and mass spectral data from proteome analyses into a relational database. The graphical web-interface allows users to upload, analyse and share experimental proteome data. It offers researchers the possibility to query all previously analysed datasets and can visualize selected features, such as the presence of a certain set of ions in a peptide mass spectrum, on the level of the two-dimensional gel. The pProRep package and instructions for its use can be downloaded from http://www.ptools.ua.ac.be/pProRep. The application requires a web server that runs PHP 5 (http://www.php.net) and MySQL. Some (non-essential) extensions need additional freely available libraries: details are described in the installation instructions.
ProBiS-2012: web server and web services for detection of structurally similar binding sites in proteins.

PubMed

Konc, Janez; Janezic, Dusanka

2012-07-01

The ProBiS web server is a web server for detection of structurally similar binding sites in the PDB and for local pairwise alignment of protein structures. In this article, we present a new version of the ProBiS web server that is 10 times faster than earlier versions, due to the efficient parallelization of the ProBiS algorithm, which now allows significantly faster comparison of a protein query against the PDB and reduces the calculation time for scanning the entire PDB from hours to minutes. It also features new web services, and an improved user interface. In addition, the new web server is united with the ProBiS-Database and thus provides instant access to pre-calculated protein similarity profiles for over 29 000 non-redundant protein structures. The ProBiS web server is particularly adept at detection of secondary binding sites in proteins. It is freely available at http://probis.cmm.ki.si/old-version, and the new ProBiS web server is at http://probis.cmm.ki.si.
ProBiS-2012: web server and web services for detection of structurally similar binding sites in proteins

PubMed Central

Konc, Janez; Janežič, Dušanka

2012-01-01

The ProBiS web server is a web server for detection of structurally similar binding sites in the PDB and for local pairwise alignment of protein structures. In this article, we present a new version of the ProBiS web server that is 10 times faster than earlier versions, due to the efficient parallelization of the ProBiS algorithm, which now allows significantly faster comparison of a protein query against the PDB and reduces the calculation time for scanning the entire PDB from hours to minutes. It also features new web services, and an improved user interface. In addition, the new web server is united with the ProBiS-Database and thus provides instant access to pre-calculated protein similarity profiles for over 29 000 non-redundant protein structures. The ProBiS web server is particularly adept at detection of secondary binding sites in proteins. It is freely available at http://probis.cmm.ki.si/old-version, and the new ProBiS web server is at http://probis.cmm.ki.si. PMID:22600737
Context-Aware Online Commercial Intention Detection

NASA Astrophysics Data System (ADS)

Hu, Derek Hao; Shen, Dou; Sun, Jian-Tao; Yang, Qiang; Chen, Zheng

With more and more commercial activities moving onto the Internet, people tend to purchase what they need through Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like the common Web search queries, the queries with commercial intention are usually very short. Recognizing the queries with commercial intention against the common queries will help search engines provide proper search results and advertisements, help Web users obtain the right information they desire and help the advertisers benefit from the potential transactions. However, the intentions behind a query vary a lot for users with different background and interest. The intentions can even be different for the same user, when the query is issued in different contexts. In this paper, we present a new algorithm framework based on skip-chain conditional random field (SCCRF) for automatically classifying Web queries according to context-based online commercial intention. We analyze our algorithm performance both theoretically and empirically. Extensive experiments on several real search engine log datasets show that our algorithm can improve more than 10% on F1 score than previous algorithms on commercial intention detection.
Web-based metabolic network visualization with a zooming user interface

PubMed Central

2011-01-01

Background Displaying complex metabolic-map diagrams, for Web browsers, and allowing users to interact with them for querying and overlaying expression data over them is challenging. Description We present a Web-based metabolic-map diagram, which can be interactively explored by the user, called the Cellular Overview. The main characteristic of this application is the zooming user interface enabling the user to focus on appropriate granularities of the network at will. Various searching commands are available to visually highlight sets of reactions, pathways, enzymes, metabolites, and so on. Expression data from single or multiple experiments can be overlaid on the diagram, which we call the Omics Viewer capability. The application provides Web services to highlight the diagram and to invoke the Omics Viewer. This application is entirely written in JavaScript for the client browsers and connect to a Pathway Tools Web server to retrieve data and diagrams. It uses the OpenLayers library to display tiled diagrams. Conclusions This new online tool is capable of displaying large and complex metabolic-map diagrams in a very interactive manner. This application is available as part of the Pathway Tools software that powers multiple metabolic databases including Biocyc.org: The Cellular Overview is accessible under the Tools menu. PMID:21595965
Web-based flood database for Colorado, water years 1867 through 2011

USGS Publications Warehouse

Kohn, Michael S.; Jarrett, Robert D.; Krammes, Gary S.; Mommandi, Amanullah

2013-01-01

In order to provide a centralized repository of flood information for the State of Colorado, the U.S. Geological Survey, in cooperation with the Colorado Department of Transportation, created a Web-based geodatabase for flood information from water years 1867 through 2011 and data for paleofloods occurring in the past 5,000 to 10,000 years. The geodatabase was created using the Environmental Systems Research Institute ArcGIS JavaScript Application Programing Interface 3.2. The database can be accessed at http://cwscpublic2.cr.usgs.gov/projects/coflood/COFloodMap.html. Data on 6,767 flood events at 1,597 individual sites throughout Colorado were compiled to generate the flood database. The data sources of flood information are indirect discharge measurements that were stored in U.S. Geological Survey offices (water years 1867–2011), flood data from indirect discharge measurements referenced in U.S. Geological Survey reports (water years 1884–2011), paleoflood studies from six peer-reviewed journal articles (data on events occurring in the past 5,000 to 10,000 years), and the U.S. Geological Survey National Water Information System peak-discharge database (water years 1883–2010). A number of tests were performed on the flood database to ensure the quality of the data. The Web interface was programmed using the Environmental Systems Research Institute ArcGIS JavaScript Application Programing Interface 3.2, which allows for display, query, georeference, and export of the data in the flood database. The data fields in the flood database used to search and filter the database include hydrologic unit code, U.S. Geological Survey station number, site name, county, drainage area, elevation, data source, date of flood, peak discharge, and field method used to determine discharge. Additional data fields can be viewed and exported, but the data fields described above are the only ones that can be used for queries.
PMD2HD--a web tool aligning a PubMed search results page with the local German Cancer Research Centre library collection.

PubMed

Bohne-Lang, Andreas; Lang, Elke; Taube, Anke

2005-06-27

Web-based searching is the accepted contemporary mode of retrieving relevant literature, and retrieving as many full text articles as possible is a typical prerequisite for research success. In most cases only a proportion of references will be directly accessible as digital reprints through displayed links. A large number of references, however, have to be verified in library catalogues and, depending on their availability, are accessible as print holdings or by interlibrary loan request. The problem of verifying local print holdings from an initial retrieval set of citations can be solved using Z39.50, an ANSI protocol for interactively querying library information systems. Numerous systems include Z39.50 interfaces and therefore can process Z39.50 interactive requests. However, the programmed query interaction command structure is non-intuitive and inaccessible to the average biomedical researcher. For the typical user, it is necessary to implement the protocol within a tool that hides and handles Z39.50 syntax, presenting a comfortable user interface. PMD2HD is a web tool implementing Z39.50 to provide an appropriately functional and usable interface to integrate into the typical workflow that follows an initial PubMed literature search, providing users with an immediate asset to assist in the most tedious step in literature retrieval, checking for subscription holdings against a local online catalogue. PMD2HD can facilitate literature access considerably with respect to the time and cost of manual comparisons of search results with local catalogue holdings. The example presented in this article is related to the library system and collections of the German Cancer Research Centre. However, the PMD2HD software architecture and use of common Z39.50 protocol commands allow for transfer to a broad range of scientific libraries using Z39.50-compatible library information systems.
Columba: an integrated database of proteins, structures, and annotations.

PubMed

Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

2005-03-31

Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows to combine queries on a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.
A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

PubMed

Jamil, Hasan M

2017-01-01

Despite the recent growth in the number of phylogenetic databases, access to these wealth of resources remain largely tool or form-based interface driven. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in PhyQL buffer allows secondary querying on the computed results making it a truly powerful querying architecture.
DGIdb 3.0: a redesign and expansion of the drug-gene interaction database.

PubMed

Cotto, Kelsy C; Wagner, Alex H; Feng, Yang-Yang; Kiwala, Susanna; Coffman, Adam C; Spies, Gregory; Wollam, Alex; Spies, Nicholas C; Griffith, Obi L; Griffith, Malachi

2018-01-04

The drug-gene interaction database (DGIdb, www.dgidb.org) consolidates, organizes and presents drug-gene interactions and gene druggability information from papers, databases and web resources. DGIdb normalizes content from 30 disparate sources and allows for user-friendly advanced browsing, searching and filtering for ease of access through an intuitive web user interface, application programming interface (API) and public cloud-based server image. DGIdb v3.0 represents a major update of the database. Nine of the previously included 24 sources were updated. Six new resources were added, bringing the total number of sources to 30. These updates and additions of sources have cumulatively resulted in 56 309 interaction claims. This has also substantially expanded the comprehensive catalogue of druggable genes and anti-neoplastic drug-gene interactions included in the DGIdb. Along with these content updates, v3.0 has received a major overhaul of its codebase, including an updated user interface, preset interaction search filters, consolidation of interaction information into interaction groups, greatly improved search response times and upgrading the underlying web application framework. In addition, the expanded API features new endpoints which allow users to extract more detailed information about queried drugs, genes and drug-gene interactions, including listings of PubMed IDs, interaction type and other interaction metadata.
Automatic home medical product recommendation.

PubMed

Luo, Gang; Thomas, Selena B; Tang, Chunqiang

2012-04-01

Web-based personal health records (PHRs) are being widely deployed. To improve PHR's capability and usability, we proposed the concept of intelligent PHR (iPHR). In this paper, we use automatic home medical product recommendation as a concrete application to demonstrate the benefits of introducing intelligence into PHRs. In this new application domain, we develop several techniques to address the emerging challenges. Our approach uses treatment knowledge and nursing knowledge, and extends the language modeling method to (1) construct a topic-selection input interface for recommending home medical products, (2) produce a global ranking of Web pages retrieved by multiple queries, and (3) provide diverse search results. We demonstrate the effectiveness of our techniques using USMLE medical exam cases.
A "Simple Query Interface" Adapter for the Discovery and Exchange of Learning Resources

ERIC Educational Resources Information Center

Massart, David

2006-01-01

Developed as part of CEN/ISSS Workshop on Learning Technology efforts to improve interoperability between learning resource repositories, the Simple Query Interface (SQI) is an Application Program Interface (API) for querying heterogeneous repositories of learning resource metadata. In the context of the ProLearn Network of Excellence, SQI is used…
Oasis 2: improved online analysis of small RNA-seq data.

PubMed

Rahman, Raza-Ur; Gautam, Abhivyakti; Bethune, Jörn; Sattar, Abdul; Fiosins, Maksims; Magruder, Daniel Sumner; Capece, Vincenzo; Shomroni, Orr; Bonn, Stefan

2018-02-14

Small RNA molecules play important roles in many biological processes and their dysregulation or dysfunction can cause disease. The current method of choice for genome-wide sRNA expression profiling is deep sequencing. Here we present Oasis 2, which is a new main release of the Oasis web application for the detection, differential expression, and classification of small RNAs in deep sequencing data. Compared to its predecessor Oasis, Oasis 2 features a novel and speed-optimized sRNA detection module that supports the identification of small RNAs in any organism with higher accuracy. Next to the improved detection of small RNAs in a target organism, the software now also recognizes potential cross-species miRNAs and viral and bacterial sRNAs in infected samples. In addition, novel miRNAs can now be queried and visualized interactively, providing essential information for over 700 high-quality miRNA predictions across 14 organisms. Robust biomarker signatures can now be obtained using the novel enhanced classification module. Oasis 2 enables biologists and medical researchers to rapidly analyze and query small RNA deep sequencing data with improved precision, recall, and speed, in an interactive and user-friendly environment. Oasis 2 is implemented in Java, J2EE, mysql, Python, R, PHP and JavaScript. It is freely available at https://oasis.dzne.de.
ExplorEnz: the primary source of the IUBMB enzyme list

PubMed Central

McDonald, Andrew G.; Boyce, Sinéad; Tipton, Keith F.

2009-01-01

ExplorEnz is the MySQL database that is used for the curation and dissemination of the International Union of Biochemistry and Molecular Biology (IUBMB) Enzyme Nomenclature. A simple web-based query interface is provided, along with an advanced search engine for more complex Boolean queries. The WWW front-end is accessible at http://www.enzyme-database.org, from where downloads of the database as SQL and XML are also available. An associated form-based curatorial application has been developed to facilitate the curation of enzyme data as well as the internal and public review processes that occur before an enzyme entry is made official. Suggestions for new enzyme entries, or modifications to existing ones, can be made using the forms provided at http://www.enzyme-database.org/forms.php. PMID:18776214

PubDNA Finder: a web database linking full-text articles to sequences of nucleic acids.

PubMed

García-Remesal, Miguel; Cuevas, Alejandro; Pérez-Rey, David; Martín, Luis; Anguita, Alberto; de la Iglesia, Diana; de la Calle, Guillermo; Crespo, José; Maojo, Víctor

2010-11-01

PubDNA Finder is an online repository that we have created to link PubMed Central manuscripts to the sequences of nucleic acids appearing in them. It extends the search capabilities provided by PubMed Central by enabling researchers to perform advanced searches involving sequences of nucleic acids. This includes, among other features (i) searching for papers mentioning one or more specific sequences of nucleic acids and (ii) retrieving the genetic sequences appearing in different articles. These additional query capabilities are provided by a searchable index that we created by using the full text of the 176 672 papers available at PubMed Central at the time of writing and the sequences of nucleic acids appearing in them. To automatically extract the genetic sequences occurring in each paper, we used an original method we have developed. The database is updated monthly by automatically connecting to the PubMed Central FTP site to retrieve and index new manuscripts. Users can query the database via the web interface provided. PubDNA Finder can be freely accessed at http://servet.dia.fi.upm.es:8080/pubdnafinder
Data-driven decision support for radiologists: re-using the National Lung Screening Trial dataset for pulmonary nodule management.

PubMed

Morrison, James J; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L

2015-02-01

Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was converted into Structured Query Language (SQL) tables hosted on a web server, and a web-based JavaScript application was developed which performs real-time queries. JavaScript is used for both the server-side and client-side language, allowing for rapid development of a robust client interface and server-side data layer. Real-time data mining of user-specified patient cohorts achieved a rapid return of cohort cancer statistics and lung nodule distribution information. This system demonstrates the potential of individualized real-time data mining using large high-quality clinical trial datasets to drive evidence-based clinical decision-making.
Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

PubMed

Khennak, Ilyes; Drias, Habiba

2017-02-01

With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which deem their search queries to be imprecise due the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising method to overcome this drawback is Query Expansion. In this paper, an original approach based on Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in medical field. In contrast to the existing literature, the proposed approach uses Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the determination of the length of the expanded query empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
EarthServer: Visualisation and use of uncertainty as a data exploration tool

NASA Astrophysics Data System (ADS)

Walker, Peter; Clements, Oliver; Grant, Mike

2013-04-01

The Ocean Science/Earth Observation community generates huge datasets from satellite observation. Until recently it has been difficult to obtain matching uncertainty information for these datasets and to apply this to their processing. In order to make use of uncertainty information when analysing "Big Data" we need both the uncertainty itself (attached to the underlying data) and a means of working with the combined product without requiring the entire dataset to be downloaded. The European Commission FP7 project EarthServer (http://earthserver.eu) is addressing the problem of accessing and ad-hoc analysis of extreme-size Earth Science data using cutting-edge Array Database technology. The core software (Rasdaman) and web services wrapper (Petascope) allow huge datasets to be accessed using Open Geospatial Consortium (OGC) standard interfaces including the well established standards, Web Coverage Service (WCS) and Web Map Service (WMS) as well as the emerging standard, Web Coverage Processing Service (WCPS). The WCPS standard allows the running of ad-hoc queries on any of the data stored within Rasdaman, creating an infrastructure where users are not restricted by bandwidth when manipulating or querying huge datasets. The ESA Ocean Colour - Climate Change Initiative (OC-CCI) project (http://www.esa-oceancolour-cci.org/), is producing high-resolution, global ocean colour datasets over the full time period (1998-2012) where high quality observations were available. This climate data record includes per-pixel uncertainty data for each variable, based on an analytic method that classifies how much and which types of water are present in a pixel, and assigns uncertainty based on robust comparisons to global in-situ validation datasets. These uncertainty values take two forms, Root Mean Square (RMS) and Bias uncertainty, respectively representing the expected variability and expected offset error. By combining the data produced through the OC-CCI project with the software from the EarthServer project we can produce a novel data offering that allows the use of traditional exploration and access mechanisms such as WMS and WCS. However the real benefits can be seen when utilising WCPS to explore the data . We will show two major benefits to this infrastructure. Firstly we will show that the visualisation of the combined chlorophyll and uncertainty datasets through a web based GIS portal gives users the ability to instantaneously assess the quality of the data they are exploring using traditional web based plotting techniques as well as through novel web based 3 dimensional visualisation. Secondly we will showcase the benefits available when combining these data with the WCPS standard. The uncertainty data can be utilised in queries using the standard WCPS query language. This allows selection of data either for download or use within the query, based on the respective uncertainty values as well as the possibility of incorporating both the chlorophyll data and uncertainty data into complex queries to produce additional novel data products. By filtering with uncertainty at the data source rather than the client we can minimise traffic over the network allowing huge datasets to be worked on with a minimal time penalty.
AQBE — QBE Style Queries for Archetyped Data

NASA Astrophysics Data System (ADS)

Sachdeva, Shelly; Yaginuma, Daigo; Chu, Wanming; Bhalla, Subhash

Large-scale adoption of electronic healthcare applications requires semantic interoperability. The new proposals propose an advanced (multi-level) DBMS architecture for repository services for health records of patients. These also require query interfaces at multiple levels and at the level of semi-skilled users. In this regard, a high-level user interface for querying the new form of standardized Electronic Health Records system has been examined in this study. It proposes a step-by-step graphical query interface to allow semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and increase user friendliness.
Finding and Exploring Health Information with a Slider-Based User Interface.

PubMed

Pang, Patrick Cheong-Iao; Verspoor, Karin; Pearce, Jon; Chang, Shanton

2016-01-01

Despite the fact that search engines are the primary channel to access online health information, there are better ways to find and explore health information on the web. Search engines are prone to problems when they are used to find health information. For instance, users have difficulties in expressing health scenarios with appropriate search keywords, search results are not optimised for medical queries, and the search process does not account for users' literacy levels and reading preferences. In this paper, we describe our approach to addressing these problems by introducing a novel design using a slider-based user interface for discovering health information without the need for precise search keywords. The user evaluation suggests that the interface is easy to use and able to assist users in the process of discovering new information. This study demonstrates the potential value of adopting slider controls in the user interface of health websites for navigation and information discovery.
BioSearch: a semantic search engine for Bio2RDF

PubMed Central

Qiu, Honglei; Huang, Jiacheng

2017-01-01

Abstract Biomedical data are growing at an incredible pace and require substantial expertise to organize data in a manner that makes them easily findable, accessible, interoperable and reusable. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch — a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state of the art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/ PMID:29220451
IsoCleft Finder – a web-based tool for the detection and analysis of protein binding-site geometric and chemical similarities

PubMed Central

Najmanovich, Rafael

2013-01-01

IsoCleft Finder is a web-based tool for the detection of local geometric and chemical similarities between potential small-molecule binding cavities and a non-redundant dataset of ligand-bound known small-molecule binding-sites. The non-redundant dataset developed as part of this study is composed of 7339 entries representing unique Pfam/PDB-ligand (hetero group code) combinations with known levels of cognate ligand similarity. The query cavity can be uploaded by the user or detected automatically by the system using existing PDB entries as well as user-provided structures in PDB format. In all cases, the user can refine the definition of the cavity interactively via a browser-based Jmol 3D molecular visualization interface. Furthermore, users can restrict the search to a subset of the dataset using a cognate-similarity threshold. Local structural similarities are detected using the IsoCleft software and ranked according to two criteria (number of atoms in common and Tanimoto score of local structural similarity) and the associated Z-score and p-value measures of statistical significance. The results, including predicted ligands, target proteins, similarity scores, number of atoms in common, etc., are shown in a powerful interactive graphical interface. This interface permits the visualization of target ligands superimposed on the query cavity and additionally provides a table of pairwise ligand topological similarities. Similarities between top scoring ligands serve as an additional tool to judge the quality of the results obtained. We present several examples where IsoCleft Finder provides useful functional information. IsoCleft Finder results are complementary to existing approaches for the prediction of protein function from structure, rational drug design and x-ray crystallography. IsoCleft Finder can be found at: http://bcb.med.usherbrooke.ca/isocleftfinder. PMID:24555058
Monitoring operational data production applying Big Data tooling

NASA Astrophysics Data System (ADS)

Som de Cerff, Wim; de Jong, Hotze; van den Berg, Roy; Bos, Jeroen; Oosterhoff, Rijk; Klein Ikkink, Henk Jan; Haga, Femke; Elsten, Tom; Verhoef, Hans; Koutek, Michal; van de Vegte, John

2015-04-01

Within the KNMI Deltaplan programme for improving the KNMI operational infrastructure an new fully automated system for monitoring the KNMI operational data production systems is being developed: PRISMA (PRocessflow Infrastructure Surveillance and Monitoring Application). Currently the KNMI operational (24/7) production systems consist of over 60 applications, running on different hardware systems and platforms. They are interlinked for the production of numerous data products, which are delivered to internal and external customers. All applications are individually monitored by different applications, complicating root cause and impact analysis. Also, the underlying hardware and network is monitored separately using Zabbix. Goal of the new system is to enable production chain monitoring, which enables root cause analysis (what is the root cause of the disruption) and impact analysis (what other products will be effected). The PRISMA system will make it possible to dispose all the existing monitoring applications, providing one interface for monitoring the data production. For modeling the production chain, the Neo4j Graph database is used to store and query the model. The model can be edited through the PRISMA web interface, but is mainly automatically provided by the applications and systems which are to be monitored. The graph enables us to do root case and impact analysis. The graph can be visualized in the PRISMA web interface on different levels. Each 'monitored object' in the model will have a status (OK, error, warning, unknown). This status is derived by combing all log information available. For collecting and querying the log information Splunk is used. The system is developed using Scrum, by a multi-disciplinary team consisting of analysts, developers, a tester and interaction designer. In the presentation we will focus on the lessons learned working with the 'Big data' tooling Splunk and Neo4J.
An SSVEP-Based Brain-Computer Interface for Text Spelling With Adaptive Queries That Maximize Information Gain Rates.

PubMed

Akce, Abdullah; Norton, James J S; Bretl, Timothy

2015-09-01

This paper presents a brain-computer interface for text entry using steady-state visually evoked potentials (SSVEP). Like other SSVEP-based spellers, ours identifies the desired input character by posing questions (or queries) to users through a visual interface. Each query defines a mapping from possible characters to steady-state stimuli. The user responds by attending to one of these stimuli. Unlike other SSVEP-based spellers, ours chooses from a much larger pool of possible queries-on the order of ten thousand instead of ten. The larger query pool allows our speller to adapt more effectively to the inherent structure of what is being typed and to the input performance of the user, both of which make certain queries provide more information than others. In particular, our speller chooses queries from this pool that maximize the amount of information to be received per unit of time, a measure of mutual information that we call information gain rate. To validate our interface, we compared it with two other state-of-the-art SSVEP-based spellers, which were re-implemented to use the same input mechanism. Results showed that our interface, with the larger query pool, allowed users to spell multiple-word texts nearly twice as fast as they could with the compared spellers.
The DEDUCE Guided Query Tool: Providing Simplified Access to Clinical Data for Research and Quality Improvement

PubMed Central

Horvath, Monica M.; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey

2011-01-01

In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction—the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a guided query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and, export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. PMID:21130181
The DEDUCE Guided Query tool: providing simplified access to clinical data for research and quality improvement.

PubMed

Horvath, Monica M; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey

2011-04-01

In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction--the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a Guided Query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and, export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. Copyright © 2010 Elsevier Inc. All rights reserved.
Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance.

PubMed

Chan, Emily H; Sahai, Vikram; Conrad, Corrie; Brownstein, John S

2011-05-01

A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent valuable complement to assist with traditional dengue surveillance.
A Web Application For Visualizing Empirical Models of the Space-Atmosphere Interface Region: AtModWeb

NASA Astrophysics Data System (ADS)

Knipp, D.; Kilcommons, L. M.; Damas, M. C.

2015-12-01

We have created a simple and user-friendly web application to visualize output from empirical atmospheric models that describe the lower atmosphere and the Space-Atmosphere Interface Region (SAIR). The Atmospheric Model Web Explorer (AtModWeb) is a lightweight, multi-user, Python-driven application which uses standard web technology (jQuery, HTML5, CSS3) to give an in-browser interface that can produce plots of modeled quantities such as temperature and individual species and total densities of neutral and ionized upper-atmosphere. Output may be displayed as: 1) a contour plot over a map projection, 2) a pseudo-color plot (heatmap) which allows visualization of a variable as a function of two spatial coordinates, or 3) a simple line plot of one spatial coordinate versus any number of desired model output variables. The application is designed around an abstraction of an empirical atmospheric model, essentially treating the model code as a black box, which makes it simple to add additional models without modifying the main body of the application. Currently implemented are the Naval Research Laboratory NRLMSISE00 model for neutral atmosphere and the International Reference Ionosphere (IRI). These models are relevant to the Low Earth Orbit environment and the SAIR. The interface is simple and usable, allowing users (students and experts) to specify time and location, and choose between historical (i.e. the values for the given date) or manual specification of whichever solar or geomagnetic activity drivers are required by the model. We present a number of use-case examples from research and education: 1) How does atmospheric density between the surface and 1000 km vary with time of day, season and solar cycle?; 2) How do ionospheric layers change with the solar cycle?; 3 How does the composition of the SAIR vary between day and night at a fixed altitude?
Unifying Access to National Hydrologic Data Repositories via Web Services

NASA Astrophysics Data System (ADS)

Valentine, D. W.; Jennings, B.; Zaslavsky, I.; Maidment, D. R.

2006-12-01

The CUAHSI hydrologic information system (HIS) is designed to be a live, multiscale web portal system for accessing, querying, visualizing, and publishing distributed hydrologic observation data and models for any location or region in the United States. The HIS design follows the principles of open service oriented architecture, i.e. system components are represented as web services with well defined standard service APIs. WaterOneFlow web services are the main component of the design. The currently available services have been completely re-written compared to the previous version, and provide programmatic access to USGS NWIS. (steam flow, groundwater and water quality repositories), DAYMET daily observations, NASA MODIS, and Unidata NAM streams, with several additional web service wrappers being added (EPA STORET, NCDC and others.). Different repositories of hydrologic data use different vocabularies, and support different types of query access. Resolving semantic and structural heterogeneities across different hydrologic observation archives and distilling a generic set of service signatures is one of the main scalability challenges in this project, and a requirement in our web service design. To accomplish the uniformity of the web services API, data repositories are modeled following the CUAHSI Observation Data Model. The web service responses are document-based, and use an XML schema to express the semantics in a standard format. Access to station metadata is provided via web service methods, GetSites, GetSiteInfo and GetVariableInfo. The methdods form the foundation of CUAHSI HIS discovery interface and may execute over locally-stored metadata or request the information from remote repositories directly. Observation values are retrieved via a generic GetValues method which is executed against national data repositories. The service is implemented in ASP.Net, and other providers are implementing WaterOneFlow services in java. Reference implementation of WaterOneFlow web services is available. More information about the ongoing development of CUAHSI HIS is available from http://www.cuahsi.org/his/.
Advances on Sensor Web for Internet of Things

NASA Astrophysics Data System (ADS)

Liang, S.; Bermudez, L. E.; Huang, C.; Jazayeri, M.; Khalafbeigi, T.

2013-12-01

'In much the same way that HTML and HTTP enabled WWW, the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE), envisioned in 2001 [1] will allow sensor webs to become a reality.'. Due to the large number of sensor manufacturers and differing accompanying protocols, integrating diverse sensors into observation systems is not a simple task. A coherent infrastructure is needed to treat sensors in an interoperable, platform-independent and uniform way. SWE standardizes web service interfaces, sensor descriptions and data encodings as building blocks for a Sensor Web. SWE standards are now mature specifications (version 2.0) with approved OGC compliance test suites and tens of independent implementations. Many earth and space science organizations and government agencies are using the SWE standards to publish and share their sensors and observations. While SWE has been demonstrated very effective for scientific sensors, its complexity and the computational overhead may not be suitable for resource-constrained tiny sensors. In June 2012, a new OGC Standards Working Group (SWG) was formed called the Sensor Web Interface for Internet of Things (SWE-IoT) SWG. This SWG focuses on developing one or more OGC standards for resource-constrained sensors and actuators (e.g., Internet of Things devices) while leveraging the existing OGC SWE standards. In the near future, billions to trillions of small sensors and actuators will be embedded in real- world objects and connected to the Internet facilitating a concept called the Internet of Things (IoT). By populating our environment with real-world sensor-based devices, the IoT is opening the door to exciting possibilities for a variety of application domains, such as environmental monitoring, transportation and logistics, urban informatics, smart cities, as well as personal and social applications. The current SWE-IoT development aims on modeling the IoT components and defining a standard web service that makes the observations captured by IoT devices easily accessible and allows users to task the actuators on the IoT devices. The SWE IoT model links things with sensors and reuses the OGC Observation and Model (O&M) to link sensors with features of interest and observed properties Unlike most SWE standards, the SWE-IoT defines a RESTful web interface for users to perform CRUD (i.e., create, read, update, and delete) functions on resources, including Things, Sensors, Actuators, Observations, Tasks, etc. Inspired by the OASIS Open Data Protocol (OData), the SWE-IoT web service provides the multi-faceted query, which means that users can query from different entity collections and link from one entity to other related entities. This presentation will introduce the latest development of the OGC SWE-IoT standards. Potential applications and implications in Earth and Space science will also be discussed. [1] Mike Botts, Sensor Web Enablement White Paper, Open GIS Consortium, Inc. 2002
The Ruby UCSC API: accessing the UCSC genome database using Ruby.

PubMed

Mishima, Hiroyuki; Aerts, Jan; Katayama, Toshiaki; Bonnal, Raoul J P; Yoshiura, Koh-ichiro

2012-09-21

The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast.The API uses the bin index-if available-when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/.
The Ruby UCSC API: accessing the UCSC genome database using Ruby

PubMed Central

2012-01-01

Background The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. Results The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index—if available—when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Conclusions Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/. PMID:22994508
Query Classification and Study of University Students' Search Trends

ERIC Educational Resources Information Center

Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.

2012-01-01

Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…
UCbase 2.0: ultraconserved sequences database (2014 update).

PubMed

Lomonaco, Vincenzo; Martoglia, Riccardo; Mandreoli, Federica; Anderlucci, Laura; Emmett, Warren; Bicciato, Silvio; Taccioli, Cristian

2014-01-01

UCbase 2.0 (http://ucbase.unimore.it) is an update, extension and evolution of UCbase, a Web tool dedicated to the analysis of ultraconserved sequences (UCRs). UCRs are 481 sequences >200 bases sharing 100% identity among human, mouse and rat genomes. They are frequently located in genomic regions known to be involved in cancer or differentially expressed in human leukemias and carcinomas. UCbase 2.0 is a platform-independent Web resource that includes the updated version of the human genome annotation (hg19), information linking disorders to chromosomal coordinates based on the Systematized Nomenclature of Medicine classification, a query tool to search for Single Nucleotide Polymorphisms (SNPs) and a new text box to directly interrogate the database using a MySQL interface. To facilitate the interactive visual interpretation of UCR chromosomal positioning, UCbase 2.0 now includes a graph visualization interface directly linked to UCSC genome browser. Database URL: http://ucbase.unimore.it. © The Author(s) 2014. Published by Oxford University Press.

EcoCyc: a comprehensive database resource for Escherichia coli

PubMed Central

Keseler, Ingrid M.; Collado-Vides, Julio; Gama-Castro, Socorro; Ingraham, John; Paley, Suzanne; Paulsen, Ian T.; Peralta-Gil, Martín; Karp, Peter D.

2005-01-01

The EcoCyc database (http://EcoCyc.org/) is a comprehensive source of information on the biology of the prototypical model organism Escherichia coli K12. The mission for EcoCyc is to contain both computable descriptions of, and detailed comments describing, all genes, proteins, pathways and molecular interactions in E.coli. Through ongoing manual curation, extensive information such as summary comments, regulatory information, literature citations and evidence types has been extracted from 8862 publications and added to Version 8.5 of the EcoCyc database. The EcoCyc database can be accessed through a World Wide Web interface, while the downloadable Pathway Tools software and data files enable computational exploration of the data and provide enhanced querying capabilities that web interfaces cannot support. For example, EcoCyc contains carefully curated information that can be used as training sets for bioinformatics prediction of entities such as promoters, operons, genetic networks, transcription factor binding sites, metabolic pathways, functionally related genes, protein complexes and protein–ligand interactions. PMID:15608210
Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data

PubMed Central

2013-01-01

Background The World Wide Web has become a dissemination platform for scientific and non-scientific publications. However, most of the information remains locked up in discrete documents that are not always interconnected or machine-readable. The connectivity tissue provided by RDF technology has not yet been widely used to support the generation of self-describing, machine-readable documents. Results In this paper, we present our approach to the generation of self-describing machine-readable scholarly documents. We understand the scientific document as an entry point and interface to the Web of Data. We have semantically processed the full-text, open-access subset of PubMed Central. Our RDF model and resulting dataset make extensive use of existing ontologies and semantic enrichment services. We expose our model, services, prototype, and datasets at http://biotea.idiginfo.org/ Conclusions The semantic processing of biomedical literature presented in this paper embeds documents within the Web of Data and facilitates the execution of concept-based queries against the entire digital library. Our approach delivers a flexible and adaptable set of tools for metadata enrichment and semantic processing of biomedical documents. Our model delivers a semantically rich and highly interconnected dataset with self-describing content so that software can make effective use of it. PMID:23734622
Introducing the PRIDE Archive RESTful web services.

PubMed

Reisinger, Florian; del-Toro, Noemi; Ternent, Tobias; Hermjakob, Henning; Vizcaíno, Juan Antonio

2015-07-01

The PRIDE (PRoteomics IDEntifications) database is one of the world-leading public repositories of mass spectrometry (MS)-based proteomics data and it is a founding member of the ProteomeXchange Consortium of proteomics resources. In the original PRIDE database system, users could access data programmatically by accessing the web services provided by the PRIDE BioMart interface. New REST (REpresentational State Transfer) web services have been developed to serve the most popular functionality provided by BioMart (now discontinued due to data scalability issues) and address the data access requirements of the newly developed PRIDE Archive. Using the API (Application Programming Interface) it is now possible to programmatically query for and retrieve peptide and protein identifications, project and assay metadata and the originally submitted files. Searching and filtering is also possible by metadata information, such as sample details (e.g. species and tissues), instrumentation (mass spectrometer), keywords and other provided annotations. The PRIDE Archive web services were first made available in April 2014. The API has already been adopted by a few applications and standalone tools such as PeptideShaker, PRIDE Inspector, the Unipept web application and the Python-based BioServices package. This application is free and open to all users with no login requirement and can be accessed at http://www.ebi.ac.uk/pride/ws/archive/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding.

PubMed

Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D

2017-08-10

Rapid generation of omics data in recent years have resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as generic, easy to use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.
FINDbase: a relational database recording frequencies of genetic defects leading to inherited disorders worldwide.

PubMed

van Baal, Sjozef; Kaimakis, Polynikis; Phommarinh, Manyphong; Koumbi, Daphne; Cuppens, Harry; Riccardino, Francesca; Macek, Milan; Scriver, Charles R; Patrinos, George P

2007-01-01

Frequency of INherited Disorders database (FINDbase) (http://www.findbase.org) is a relational database, derived from the ETHNOS software, recording frequencies of causative mutations leading to inherited disorders worldwide. Database records include the population and ethnic group, the disorder name and the related gene, accompanied by links to any corresponding locus-specific mutation database, to the respective Online Mendelian Inheritance in Man entries and the mutation together with its frequency in that population. The initial information is derived from the published literature, locus-specific databases and genetic disease consortia. FINDbase offers a user-friendly query interface, providing instant access to the list and frequencies of the different mutations. Query outputs can be either in a table or graphical format, accompanied by reference(s) on the data source. Registered users from three different groups, namely administrator, national coordinator and curator, are responsible for database curation and/or data entry/correction online via a password-protected interface. Databaseaccess is free of charge and there are no registration requirements for data querying. FINDbase provides a simple, web-based system for population-based mutation data collection and retrieval and can serve not only as a valuable online tool for molecular genetic testing of inherited disorders but also as a non-profit model for sustainable database funding, in the form of a 'database-journal'.
CoNVaQ: a web tool for copy number variation-based association studies.

PubMed

Larsen, Simon Jonas; do Canto, Luisa Matos; Rogatto, Silvia Regina; Baumbach, Jan

2018-05-18

Copy number variations (CNVs) are large segments of the genome that are duplicated or deleted. Structural variations in the genome have been linked to many complex diseases. Similar to how genome-wide association studies (GWAS) have helped discover single-nucleotide polymorphisms linked to disease phenotypes, the extension of GWAS to CNVs has aided the discovery of structural variants associated with human traits and diseases. We present CoNVaQ, an easy-to-use web-based tool for CNV-based association studies. The web service allows users to upload two sets of CNV segments and search for genomic regions where the occurrence of CNVs is significantly associated with the phenotype. CoNVaQ provides two models: a simple statistical model using Fisher's exact test and a novel query-based model matching regions to user-defined queries. For each region, the method computes a global q-value statistic by repeated permutation of samples among the populations. We demonstrate our platform by using it to analyze a data set of HPV-positive and HPV-negative penile cancer patients. CoNVaQ provides a simple workflow for performing CNV-based association studies. It is made available as a web platform in order to provide a user-friendly workflow for biologists and clinicians to carry out CNV data analysis without installing any software. Through the web interface, users are also able to analyze their results to find overrepresented GO terms and pathways. In addition, our method is also available as a package for the R programming language. CoNVaQ is available at https://convaq.compbio.sdu.dk .
Extending Climate Analytics-As to the Earth System Grid Federation

NASA Astrophysics Data System (ADS)

Tamkin, G.; Schnase, J. L.; Duffy, D.; McInerney, M.; Nadeau, D.; Li, J.; Strong, S.; Thompson, J. H.

2015-12-01

We are building three extensions to prior-funded work on climate analytics-as-a-service that will benefit the Earth System Grid Federation (ESGF) as it addresses the Big Data challenges of future climate research: (1) We are creating a cloud-based, high-performance Virtual Real-Time Analytics Testbed supporting a select set of climate variables from six major reanalysis data sets. This near real-time capability will enable advanced technologies like the Cloudera Impala-based Structured Query Language (SQL) query capabilities and Hadoop-based MapReduce analytics over native NetCDF files while providing a platform for community experimentation with emerging analytic technologies. (2) We are building a full-featured Reanalysis Ensemble Service comprising monthly means data from six reanalysis data sets. The service will provide a basic set of commonly used operations over the reanalysis collections. The operations will be made accessible through NASA's climate data analytics Web services and our client-side Climate Data Services (CDS) API. (3) We are establishing an Open Geospatial Consortium (OGC) WPS-compliant Web service interface to our climate data analytics service that will enable greater interoperability with next-generation ESGF capabilities. The CDS API will be extended to accommodate the new WPS Web service endpoints as well as ESGF's Web service endpoints. These activities address some of the most important technical challenges for server-side analytics and support the research community's requirements for improved interoperability and improved access to reanalysis data.
SAFOD Brittle Microstructure and Mechanics Knowledge Base (SAFOD BM2KB)

NASA Astrophysics Data System (ADS)

Babaie, H. A.; Hadizadeh, J.; di Toro, G.; Mair, K.; Kumar, A.

2008-12-01

We have developed a knowledge base to store and present the data collected by a group of investigators studying the microstructures and mechanics of brittle faulting using core samples from the SAFOD (San Andreas Fault Observatory at Depth) project. The investigations are carried out with a variety of analytical and experimental methods primarily to better understand the physics of strain localization in fault gouge. The knowledge base instantiates an specially-designed brittle rock deformation ontology developed at Georgia State University. The inference rules embedded in the semantic web languages, such as OWL, RDF, and RDFS, which are used in our ontology, allow the Pellet reasoner used in this application to derive additional truths about the ontology and knowledge of this domain. Access to the knowledge base is via a public website, which is designed to provide the knowledge acquired by all the investigators involved in the project. The stored data will be products of studies such as: experiments (e.g., high-velocity friction experiment), analyses (e.g., microstructural, chemical, mass transfer, mineralogical, surface, image, texture), microscopy (optical, HRSEM, FESEM, HRTEM]), tomography, porosity measurement, microprobe, and cathodoluminesence. Data about laboratories, experimental conditions, methods, assumptions, equipments, and mechanical properties and lithology of the studied samples will also be presented on the website per investigation. The ontology was modeled applying the UML (Unified Modeling Language) in Rational Rose, and implemented in OWL-DL (Ontology Web Language) using the Protégé ontology editor. The UML model was converted to OWL-DL by first mapping it to Ecore (.ecore) and Generator model (.genmodel) with the help of the EMF (Eclipse Modeling Framework) plugin in Eclipse. The Ecore model was then mapped to a .uml file, which later was converted into an .owl file and subsequently imported into the Protégé ontology editing environment. The web-interface was developed in java using eclipse as the IDE. The web interfaces to query and submit data were implemented applying JSP, servlets, javascript, and AJAX. The Jena API, a Java framework for building Semantic Web applications, was used to develop the web-interface. Jena provided a programmatic environment for RDF, RDFS, OWL, and SPARQL query engine. Building web applications with AJAX helps retrieving data from the server asynchronously in the background without interfering with the display and behavior of the existing page. The application was deployed on an apache tomcat server at GSU. The SAFOD BM2KB website provides user-friendly search, submit, feedback, and other services. The General Search option allows users to search the knowledge base by selecting the classes (e.g., Experiment, Surface Analysis), their respective attributes (e.g., apparatus, date performed), and the relationships to other classes (e.g., Sample, Laboratory). The Search by Sample option allows users to search the knowledge base based on sample number. The Search by Investigator lets users to search the knowledge base by choosing an investigator who is involved in this project. The website also allows users to submit new data. The Submit Data option opens a page where users can submit the SAFOD data to our knowledge base by selecting specific classes and attributes. The submitted data then become available for query as part of the knowledge base. The SAFOD BM2KB can be accessed from the main SAFOD website.
PyGPlates - a GPlates Python library for data analysis through space and deep geological time

NASA Astrophysics Data System (ADS)

Williams, Simon; Cannon, John; Qin, Xiaodong; Müller, Dietmar

2017-04-01

A fundamental consideration for studying the Earth through deep time is that the configurations of the continents, tectonic plates, and plate boundaries are continuously changing. Within a diverse range of fields including geodynamics, paleoclimate, and paleobiology, the importance of considering geodata in their reconstructed context across previous cycles of supercontinent aggregation, dispersal and ocean basin evolution is widely recognised. Open-source software tools such as GPlates provide paleo-geographic information systems for geoscientists to combine a wide variety of geodata and examine them within tectonic reconstructions through time. The availability of such powerful tools also brings new challenges - we want to learn something about the key associations between reconstructed plate motions and the geological record, but the high-dimensional parameter space is difficult for a human being to visually comprehend and quantify these associations. To achieve true spatio-temporal data-mining, new tools are needed. Here, we present a further development of the GPlates ecosystem - a Python-based tool for geotectonic analysis. In contrast to existing GPlates tools that are built around a graphical user interface (GUI) and interactive visualisation, pyGPlates offers a programming interface for the automation of quantitative plate tectonic analysis or arbitrary complexity. The vast array of open-source Python-based tools for data-mining, statistics and machine learning can now be linked to pyGPlates, allowing spatial data to be seamlessly analysed in space and geological "deep time", and with the ability to spread large computations across multiple processors. The presentation will illustrate a range of example applications, both simple and advanced. Basic examples include data querying, filtering, and reconstruction, and file-format conversions. For the innovative study of plate kinematics, pyGPlates has been used to explore the relationships between absolute plate motions, subduction zone kinematics, and mid-ocean ridge migration and orientation through deep time; to investigate the systematics of continental rift velocity evolution during Pangea breakup; and to make connections between kinematics of the Andean subduction zone and ore deposit formation. To support the numerical modelling community, pyGPlates facilitates the connection between tectonic surface boundary conditions contained within plate tectonic reconstructions (plate boundary configurations and plate velocities) and simulations such as thermo-mechanical models of lithospheric deformation and mantle convection. To support the development of web-based applications that can serve the wider geoscience community, we will demonstrate how pyGPlates can be combined with other open-source tools to serve alternative reconstructions together with a diverse array of reconstructed data sets in a self-consistent framework over the internet. PyGPlates is available to the public via the GPlates web site and contains comprehensive documentation covering installation on Windows/Mac/Linux platforms, sample code, tutorials and a detailed reference of pyGPlates functions and classes.
TCW: Transcriptome Computational Workbench

PubMed Central

Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R.

2013-01-01

Background The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. Methodology The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. Conclusion It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw. PMID:23874959
TCW: transcriptome computational workbench.

PubMed

Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R

2013-01-01

The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw.
ExplorEnz: a MySQL database of the IUBMB enzyme nomenclature

PubMed Central

McDonald, Andrew G; Boyce, Sinéad; Moss, Gerard P; Dixon, Henry BF; Tipton, Keith F

2007-01-01

Background We describe the database ExplorEnz, which is the primary repository for EC numbers and enzyme data that are being curated on behalf of the IUBMB. The enzyme nomenclature is incorporated into many other resources, including the ExPASy-ENZYME, BRENDA and KEGG bioinformatics databases. Description The data, which are stored in a MySQL database, preserve the formatting of chemical and enzyme names. A simple, easy to use, web-based query interface is provided, along with an advanced search engine for more complex queries. The database is publicly available at . The data are available for download as SQL and XML files via FTP. Conclusion ExplorEnz has powerful and flexible search capabilities and provides the scientific community with the most up-to-date version of the IUBMB Enzyme List. PMID:17662133
ExplorEnz: a MySQL database of the IUBMB enzyme nomenclature.

PubMed

McDonald, Andrew G; Boyce, Sinéad; Moss, Gerard P; Dixon, Henry B F; Tipton, Keith F

2007-07-27

We describe the database ExplorEnz, which is the primary repository for EC numbers and enzyme data that are being curated on behalf of the IUBMB. The enzyme nomenclature is incorporated into many other resources, including the ExPASy-ENZYME, BRENDA and KEGG bioinformatics databases. The data, which are stored in a MySQL database, preserve the formatting of chemical and enzyme names. A simple, easy to use, web-based query interface is provided, along with an advanced search engine for more complex queries. The database is publicly available at http://www.enzyme-database.org. The data are available for download as SQL and XML files via FTP. ExplorEnz has powerful and flexible search capabilities and provides the scientific community with the most up-to-date version of the IUBMB Enzyme List.
The NOAO Data Lab PHAT Photometry Database

NASA Astrophysics Data System (ADS)

Olsen, Knut; Williams, Ben; Fitzpatrick, Michael; PHAT Team

2018-01-01

We present a database containing both the combined photometric object catalog and the single epoch measurements from the Panchromatic Hubble Andromeda Treasury (PHAT). This database is hosted by the NOAO Data Lab (http://datalab.noao.edu), and as such exposes a number of data services to the PHAT photometry, including access through a Table Access Protocol (TAP) service, direct PostgreSQL queries, web-based and programmatic query interfaces, remote storage space for personal database tables and files, and a JupyterHub-based Notebook analysis environment, as well as image access through a Simple Image Access (SIA) service. We show how the Data Lab database and Jupyter Notebook environment allow for straightforward and efficient analyses of PHAT catalog data, including maps of object density, depth, and color, extraction of light curves of variable objects, and proper motion exploration.
Designing Websites for Displaying Large Data Sets and Images on Multiple Platforms

NASA Astrophysics Data System (ADS)

Anderson, A.; Wolf, V. G.; Garron, J.; Kirschner, M.

2012-12-01

The desire to build websites to analyze and display ever increasing amounts of scientific data and images pushes for web site designs which utilize large displays, and to use the display area as efficiently as possible. Yet, scientists and users of their data are increasingly wishing to access these websites in the field and on mobile devices. This results in the need to develop websites that can support a wide range of devices and screen sizes, and to optimally use whatever display area is available. Historically, designers have addressed this issue by building two websites; one for mobile devices, and one for desktop environments, resulting in increased cost, duplicity of work, and longer development times. Recent advancements in web design technology and techniques have evolved which allow for the development of a single website that dynamically adjusts to the type of device being used to browse the website (smartphone, tablet, desktop). In addition they provide the opportunity to truly optimize whatever display area is available. HTML5 and CSS3 give web designers media query statements which allow design style sheets to be aware of the size of the display being used, and to format web content differently based upon the queried response. Web elements can be rendered in a different size, position, or even removed from the display entirely, based upon the size of the display area. Using HTML5/CSS3 media queries in this manner is referred to as "Responsive Web Design" (RWD). RWD in combination with technologies such as LESS and Twitter Bootstrap allow the web designer to build web sites which not only dynamically respond to the browser display size being used, but to do so in very controlled and intelligent ways, ensuring that good layout and graphic design principles are followed while doing so. At the University of Alaska Fairbanks, the Alaska Satellite Facility SAR Data Center (ASF) recently redesigned their popular Vertex application and converted it from a traditional, fixed-layout website into a RWD site built on HTML5, LESS and Twitter Bootstrap. Vertex is a data portal for remotely sensed imagery of the earth, offering Synthetic Aperture Radar (SAR) data products from the global ASF archive. By using Responsive Web Design, ASF is able to provide access to a massive collection of SAR imagery and allow the user to use mobile devices and desktops to maximum advantage. ASF's Vertex web site demonstrates that with increased interface flexibility, scientists, managers and users can increase their personal effectiveness by accessing data portals from their preferred device as their science dictates.
Public Release of Pan-STARRS Data

NASA Astrophysics Data System (ADS)

Flewelling, Heather; Consortium, panstarrs

2015-08-01

Pan-STARRS 1 is a 1.8 meter survey telescope, located on Haleakala, Hawaii, with a 1.4 Gigapixel camera, a 7 square degree field of view, and 5 filters (g,r,i,z,y). The public release of data, which is available to everyone, consists of 4 years of data taken between May 2010 and April 2014. Two of the surveys available in the public release are the 3pi survey and the Medium Deep (MD) survey. The 3pi survey has roughly 60 epochs (12 per filter) covering 3/4 of the sky and everything north of -30 degrees declination. The MD survey consists of 10 fields, observed in a couple of filters each night, usually 8 exposures per filter per field, for about 4000 epochs per MD field. The available data product are accessed through the “Postage Stamp Server” and through the Published Science Products Subsystem (PSPS), both of these are available through the Pan-STARRS Science Interface (PSI). The Postage Stamp Server provides images and catalogs for different stages of processing on single exposures, stack images, difference images, and forced photometry. The PSPS is a SQLServer database that can be queried via script or web interface, with a database for each MD field and a large database for the 3pi survey. This database has relative photometry and astrometry and object associations, making it easy to do searches across the entire sky as well as tools to generate lightcurves of individual objects as a function of time.
Executing SPARQL Queries over the Web of Linked Data

NASA Astrophysics Data System (ADS)

Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph

The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.
C-State: an interactive web app for simultaneous multi-gene visualization and comparative epigenetic pattern search.

PubMed

Sowpati, Divya Tej; Srivastava, Surabhi; Dhawan, Jyotsna; Mishra, Rakesh K

2017-09-13

Comparative epigenomic analysis across multiple genes presents a bottleneck for bench biologists working with NGS data. Despite the development of standardized peak analysis algorithms, the identification of novel epigenetic patterns and their visualization across gene subsets remains a challenge. We developed a fast and interactive web app, C-State (Chromatin-State), to query and plot chromatin landscapes across multiple loci and cell types. C-State has an interactive, JavaScript-based graphical user interface and runs locally in modern web browsers that are pre-installed on all computers, thus eliminating the need for cumbersome data transfer, pre-processing and prior programming knowledge. C-State is unique in its ability to extract and analyze multi-gene epigenetic information. It allows for powerful GUI-based pattern searching and visualization. We include a case study to demonstrate its potential for identifying user-defined epigenetic trends in context of gene expression profiles.
Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots

PubMed Central

Xayaphoummine, A.; Bucher, T.; Isambert, H.

2005-01-01

The Kinefold web server provides a web interface for stochastic folding simulations of nucleic acids on second to minute molecular time scales. Renaturation or co-transcriptional folding paths are simulated at the level of helix formation and dissociation in agreement with the seminal experimental results. Pseudoknots and topologically ‘entangled’ helices (i.e. knots) are efficiently predicted taking into account simple geometrical and topological constraints. To encourage interactivity, simulations launched as immediate jobs are automatically stopped after a few seconds and return adapted recommendations. Users can then choose to continue incomplete simulations using the batch queuing system or go back and modify suggested options in their initial query. Detailed output provide (i) a series of low free energy structures, (ii) an online animated folding path and (iii) a programmable trajectory plot focusing on a few helices of interest to each user. The service can be accessed at . PMID:15980546
Web based tools for visualizing imaging data and development of XNATView, a zero footprint image viewer

PubMed Central

Gutman, David A.; Dunn, William D.; Cobb, Jake; Stoner, Richard M.; Kalpathy-Cramer, Jayashree; Erickson, Bradley

2014-01-01

Advances in web technologies now allow direct visualization of imaging data sets without necessitating the download of large file sets or the installation of software. This allows centralization of file storage and facilitates image review and analysis. XNATView is a light framework recently developed in our lab to visualize DICOM images stored in The Extensible Neuroimaging Archive Toolkit (XNAT). It consists of a PyXNAT-based framework to wrap around the REST application programming interface (API) and query the data in XNAT. XNATView was developed to simplify quality assurance, help organize imaging data, and facilitate data sharing for intra- and inter-laboratory collaborations. Its zero-footprint design allows the user to connect to XNAT from a web browser, navigate through projects, experiments, and subjects, and view DICOM images with accompanying metadata all within a single viewing instance. PMID:24904399

Jflow: a workflow management system for web applications.

PubMed

Mariette, Jérôme; Escudié, Frédéric; Bardou, Philippe; Nabihoudine, Ibouniyamine; Noirot, Céline; Trotard, Marie-Stéphane; Gaspin, Christine; Klopp, Christophe

2016-02-01

Biologists produce large data sets and are in demand of rich and simple web portals in which they can upload and analyze their files. Providing such tools requires to mask the complexity induced by the needed High Performance Computing (HPC) environment. The connection between interface and computing infrastructure is usually specific to each portal. With Jflow, we introduce a Workflow Management System (WMS), composed of jQuery plug-ins which can easily be embedded in any web application and a Python library providing all requested features to setup, run and monitor workflows. Jflow is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/jflow. The package is coming with full documentation, quick start and a running test portal. Jerome.Mariette@toulouse.inra.fr. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
xQTL workbench: a scalable web environment for multi-level QTL analysis.

PubMed

Arends, Danny; van der Velde, K Joeri; Prins, Pjotr; Broman, Karl W; Möller, Steffen; Jansen, Ritsert C; Swertz, Morris A

2012-04-01

xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types come available, xQTL workbench is quickly customized using the Molgenis software generator. xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. m.a.swertz@rug.nl.
xQTL workbench: a scalable web environment for multi-level QTL analysis

PubMed Central

Arends, Danny; van der Velde, K. Joeri; Prins, Pjotr; Broman, Karl W.; Möller, Steffen; Jansen, Ritsert C.; Swertz, Morris A.

2012-01-01

Summary: xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types come available, xQTL workbench is quickly customized using the Molgenis software generator. Availability: xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. Contact: m.a.swertz@rug.nl PMID:22308096
3D-SURFER: software for high-throughput protein surface comparison and analysis

PubMed Central

La, David; Esquivel-Rodríguez, Juan; Venkatraman, Vishwesh; Li, Bin; Sael, Lee; Ueng, Stephen; Ahrendt, Steven; Kihara, Daisuke

2009-01-01

Summary: We present 3D-SURFER, a web-based tool designed to facilitate high-throughput comparison and characterization of proteins based on their surface shape. As each protein is effectively represented by a vector of 3D Zernike descriptors, comparison times for a query protein against the entire PDB take, on an average, only a couple of seconds. The web interface has been designed to be as interactive as possible with displays showing animated protein rotations, CATH codes and structural alignments using the CE program. In addition, geometrically interesting local features of the protein surface, such as pockets that often correspond to ligand binding sites as well as protrusions and flat regions can also be identified and visualized. Availability: 3D-SURFER is a web application that can be freely accessed from: http://dragon.bio.purdue.edu/3d-surfer Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19759195
3D-SURFER: software for high-throughput protein surface comparison and analysis.

PubMed

La, David; Esquivel-Rodríguez, Juan; Venkatraman, Vishwesh; Li, Bin; Sael, Lee; Ueng, Stephen; Ahrendt, Steven; Kihara, Daisuke

2009-11-01

We present 3D-SURFER, a web-based tool designed to facilitate high-throughput comparison and characterization of proteins based on their surface shape. As each protein is effectively represented by a vector of 3D Zernike descriptors, comparison times for a query protein against the entire PDB take, on an average, only a couple of seconds. The web interface has been designed to be as interactive as possible with displays showing animated protein rotations, CATH codes and structural alignments using the CE program. In addition, geometrically interesting local features of the protein surface, such as pockets that often correspond to ligand binding sites as well as protrusions and flat regions can also be identified and visualized. 3D-SURFER is a web application that can be freely accessed from: http://dragon.bio.purdue.edu/3d-surfer dkihara@purdue.edu Supplementary data are available at Bioinformatics online.
International Cancer Genome Consortium Data Portal--a one-stop shop for cancer genomics data.

PubMed

Zhang, Junjun; Baran, Joachim; Cros, A; Guberman, Jonathan M; Haider, Syed; Hsu, Jack; Liang, Yong; Rivkin, Elena; Wang, Jianxin; Whitty, Brett; Wong-Erasmus, Marie; Yao, Long; Kasprzyk, Arek

2011-01-01

The International Cancer Genome Consortium (ICGC) is a collaborative effort to characterize genomic abnormalities in 50 different cancer types. To make this data available, the ICGC has created the ICGC Data Portal. Powered by the BioMart software, the Data Portal allows each ICGC member institution to manage and maintain its own databases locally, while seamlessly presenting all the data in a single access point for users. The Data Portal currently contains data from 24 cancer projects, including ICGC, The Cancer Genome Atlas (TCGA), Johns Hopkins University, and the Tumor Sequencing Project. It consists of 3478 genomes and 13 cancer types and subtypes. Available open access data types include simple somatic mutations, copy number alterations, structural rearrangements, gene expression, microRNAs, DNA methylation and exon junctions. Additionally, simple germline variations are available as controlled access data. The Data Portal uses a web-based graphical user interface (GUI) to offer researchers multiple ways to quickly and easily search and analyze the available data. The web interface can assist in constructing complicated queries across multiple data sets. Several application programming interfaces are also available for programmatic access. Here we describe the organization, functionality, and capabilities of the ICGC Data Portal.
ReVeaLD: a user-driven domain-specific interactive search platform for biomedical research.

PubMed

Kamdar, Maulik R; Zeginis, Dimitris; Hasnain, Ali; Decker, Stefan; Deus, Helena F

2014-02-01

Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biomedical discovery. Despite the popularity of semantic web technologies in tackling the integrative bioinformatics challenge, there are many obstacles towards its usage by non-technical research audiences. In particular, the ability to fully exploit integrated information needs using improved interactive methods intuitive to the biomedical experts. In this report we present ReVeaLD (a Real-time Visual Explorer and Aggregator of Linked Data), a user-centered visual analytics platform devised to increase intuitive interaction with data from distributed sources. ReVeaLD facilitates query formulation using a domain-specific language (DSL) identified by biomedical experts and mapped to a self-updated catalogue of elements from external sources. ReVeaLD was implemented in a cancer research setting; queries included retrieving data from in silico experiments, protein modeling and gene expression. ReVeaLD was developed using Scalable Vector Graphics and JavaScript and a demo with explanatory video is available at http://www.srvgal78.deri.ie:8080/explorer. A set of user-defined graphic rules controls the display of information through media-rich user interfaces. Evaluation of ReVeaLD was carried out as a game: biomedical researchers were asked to assemble a set of 5 challenge questions and time and interactions with the platform were recorded. Preliminary results indicate that complex queries could be formulated under less than two minutes by unskilled researchers. The results also indicate that supporting the identification of the elements of a DSL significantly increased intuitiveness of the platform and usability of semantic web technologies by domain users. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Interactive web-based mapping: bridging technology and data for health.

PubMed

Highfield, Linda; Arthasarnprasit, Jutas; Ottenweller, Cecelia A; Dasprez, Arnaud

2011-12-23

The Community Health Information System (CHIS) online mapping system was first launched in 1998. Its overarching goal was to provide researchers, residents and organizations access to health related data reflecting the overall health and well-being of their communities within the Greater Houston area. In September 2009, initial planning and development began for the next generation of CHIS. The overarching goal for the new version remained to make health data easily accessible for a wide variety of research audiences. However, in the new version we specifically sought to make the CHIS truly interactive and give the user more control over data selection and reporting. In July 2011, a beta version of the next-generation of the application was launched. This next-generation is also a web based interactive mapping tool comprised of two distinct portals: the Breast Health Portal and Project Safety Net. Both are accessed via a Google mapping interface. Geographic coverage for the portals is currently an 8 county region centered on Harris County, Texas. Data accessed by the application include Census 2000, Census 2010 (underway), cancer incidence from the Texas Cancer Registry (TX Dept. of State Health Services), death data from Texas Vital Statistics, clinic locations for free and low-cost health services, along with service lists, hours of operation, payment options and languages spoken, uninsured and poverty data. The system features query on the fly technology, which means the data is not generated until the query is provided to the system. This allows users to interact in real-time with the databases and generate customized reports and maps. To the author's knowledge, the Breast Health Portal and Project Safety Net are the first local-scale interactive online mapping interfaces for public health data which allow users to control the data generated. For example, users may generate breast cancer incidence rates by Census tract, in real time, for women aged 40-64. Conversely, they could then generate the same rates for women aged 35-55. The queries are user controlled.
SIFTER search: a web server for accurate phylogeny-based protein function prediction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access tomore » precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.« less
SIFTER search: a web server for accurate phylogeny-based protein function prediction

DOE PAGES

Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

2015-05-15

We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access tomore » precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.« less
MyWEST: my Web Extraction Software Tool for effective mining of annotations from web-based databanks.

PubMed

Masseroli, Marco; Stella, Andrea; Meani, Natalia; Alcalay, Myriam; Pinciroli, Francesco

2004-12-12

High-throughput technologies create the necessity to mine large amounts of gene annotations from diverse databanks, and to integrate the resulting data. Most databanks can be interrogated only via Web, for a single gene at a time, and query results are generally available only in the HTML format. Although some databanks provide batch retrieval of data via FTP, this requires expertise and resources for locally reimplementing the databank. We developed MyWEST, a tool aimed at researchers without extensive informatics skills or resources, which exploits user-defined templates to easily mine selected annotations from different Web-interfaced databanks, and aggregates and structures results in an automatically updated database. Using microarray results from a model system of retinoic acid-induced differentiation, MyWEST effectively gathered relevant annotations from various biomolecular databanks, highlighted significant biological characteristics and supported a global approach to the understanding of complex cellular mechanisms. MyWEST is freely available for non-profit use at http://www.medinfopoli.polimi.it/MyWEST/
WebVR: an interactive web browser for virtual environments

NASA Astrophysics Data System (ADS)

Barsoum, Emad; Kuester, Falko

2005-03-01

The pervasive nature of web-based content has lead to the development of applications and user interfaces that port between a broad range of operating systems and databases, while providing intuitive access to static and time-varying information. However, the integration of this vast resource into virtual environments has remained elusive. In this paper we present an implementation of a 3D Web Browser (WebVR) that enables the user to search the internet for arbitrary information and to seamlessly augment this information into virtual environments. WebVR provides access to the standard data input and query mechanisms offered by conventional web browsers, with the difference that it generates active texture-skins of the web contents that can be mapped onto arbitrary surfaces within the environment. Once mapped, the corresponding texture functions as a fully integrated web-browser that will respond to traditional events such as the selection of links or text input. As a result, any surface within the environment can be turned into a web-enabled resource that provides access to user-definable data. In order to leverage from the continuous advancement of browser technology and to support both static as well as streamed content, WebVR uses ActiveX controls to extract the desired texture skin from industry strength browsers, providing a unique mechanism for data fusion and extensibility.
Implementation of an EPN-TAP Service to Improve Accessibility to the Planetary Science Archive

NASA Astrophysics Data System (ADS)

Macfarlane, A.; Barabarisi, I.; Docasal, R.; Rios, C.; Saiz, J.; Vallejo, F.; Martinez, S.; Arviset, C.; Besse, S.; Vallat, C.

2017-09-01

The re-engineered PSA has a focus on improved access and search-ability to ESA's planetary science data. In addition to the new web interface released in January 2017, the new PSA supports several common planetary protocols in order to increase the visibility and ways in which the data may be queried and retrieved. Work is on-going to provide an EPN-TAP service covering as wide a range of parameters as possible to facilitate the discovery of scientific data and interoperability of the archive.
Issues in the design of a pilot concept-based query interface for the neuroinformatics information framework.

PubMed

Marenco, Luis; Li, Yuli; Martone, Maryann E; Sternberg, Paul W; Shepherd, Gordon M; Miller, Perry L

2008-09-01

This paper describes a pilot query interface that has been constructed to help us explore a "concept-based" approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface.
Issues in the Design of a Pilot Concept-Based Query Interface for the Neuroinformatics Information Framework

PubMed Central

Li, Yuli; Martone, Maryann E.; Sternberg, Paul W.; Shepherd, Gordon M.; Miller, Perry L.

2009-01-01

This paper describes a pilot query interface that has been constructed to help us explore a “concept-based” approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface. PMID:18953674
New Web Services for Broader Access to National Deep Submergence Facility Data Resources Through the Interdisciplinary Earth Data Alliance

NASA Astrophysics Data System (ADS)

Ferrini, V. L.; Grange, B.; Morton, J. J.; Soule, S. A.; Carbotte, S. M.; Lehnert, K.

2016-12-01

The National Deep Submergence Facility (NDSF) operates the Human Occupied Vehicle (HOV) Alvin, the Remotely Operated Vehicle (ROV) Jason, and the Autonomous Underwater Vehicle (AUV) Sentry. These vehicles are deployed throughout the global oceans to acquire sensor data and physical samples for a variety of interdisciplinary science programs. As part of the EarthCube Integrative Activity Alliance Testbed Project (ATP), new web services were developed to improve access to existing online NDSF data and metadata resources. These services make use of tools and infrastructure developed by the Interdisciplinary Earth Data Alliance (IEDA) and enable programmatic access to metadata and data resources as well as the development of new service-driven user interfaces. The Alvin Frame Grabber and Jason Virtual Van enable the exploration of frame-grabbed images derived from video cameras on NDSF dives. Metadata available for each image includes time and vehicle position, data from environmental sensors, and scientist-generated annotations, and data are organized and accessible by cruise and/or dive. A new FrameGrabber web service and service-driven user interface were deployed to offer integrated access to these data resources through a single API and allows users to search across content curated in both systems. In addition, a new NDSF Dive Metadata web service and service-driven user interface was deployed to provide consolidated access to basic information about each NDSF dive (e.g. vehicle name, dive ID, location, etc), which is important for linking distributed data resources curated in different data systems.
EquiX-A Search and Query Language for XML.

ERIC Educational Resources Information Center

Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

2002-01-01

Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)
High-performance web services for querying gene and variant annotation.

PubMed

Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei

2016-05-06

Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times permonth. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info . Both are offered free of charge to the research community.
Syndromic surveillance models using Web data: the case of scarlet fever in the UK.

PubMed

Samaras, Loukas; García-Barriocanal, Elena; Sicilia, Miguel-Angel

2012-03-01

Recent research has shown the potential of Web queries as a source for syndromic surveillance, and existing studies show that these queries can be used as a basis for estimation and prediction of the development of a syndromic disease, such as influenza, using log linear (logit) statistical models. Two alternative models are applied to the relationship between cases and Web queries in this paper. We examine the applicability of using statistical methods to relate search engine queries with scarlet fever cases in the UK, taking advantage of tools to acquire the appropriate data from Google, and using an alternative statistical method based on gamma distributions. The results show that using logit models, the Pearson correlation factor between Web queries and the data obtained from the official agencies must be over 0.90, otherwise the prediction of the peak and the spread of the distributions gives significant deviations. In this paper, we describe the gamma distribution model and show that we can obtain better results in all cases using gamma transformations, and especially in those with a smaller correlation factor.
Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance

PubMed Central

Chan, Emily H.; Sahai, Vikram; Conrad, Corrie; Brownstein, John S.

2011-01-01

Background A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Methodology/Principal Findings Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Conclusions/Significance Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent valuable complement to assist with traditional dengue surveillance. PMID:21647308

Improved Information Retrieval Performance on SQL Database Using Data Adapter

NASA Astrophysics Data System (ADS)

Husni, M.; Djanali, S.; Ciptaningtyas, H. T.; Wicaksana, I. G. N. A.

2018-02-01

The NoSQL databases, short for Not Only SQL, are increasingly being used as the number of big data applications increases. Most systems still use relational databases (RDBs), but as the number of data increases each year, the system handles big data with NoSQL databases to analyze and access data more quickly. NoSQL emerged as a result of the exponential growth of the internet and the development of web applications. The query syntax in the NoSQL database differs from the SQL database, therefore requiring code changes in the application. Data adapter allow applications to not change their SQL query syntax. Data adapters provide methods that can synchronize SQL databases with NotSQL databases. In addition, the data adapter provides an interface which is application can access to run SQL queries. Hence, this research applied data adapter system to synchronize data between MySQL database and Apache HBase using direct access query approach, where system allows application to accept query while synchronization process in progress. From the test performed using data adapter, the results obtained that the data adapter can synchronize between SQL databases, MySQL, and NoSQL database, Apache HBase. This system spends the percentage of memory resources in the range of 40% to 60%, and the percentage of processor moving from 10% to 90%. In addition, from this system also obtained the performance of database NoSQL better than SQL database.
Development of a medical module for disaster information systems.

PubMed

Calik, Elif; Atilla, Rıdvan; Kaya, Hilal; Aribaş, Alirıza; Cengiz, Hakan; Dicle, Oğuz

2014-01-01

This study aims to improve a medical module which provides a real-time medical information flow about pre-hospital processes that gives health care in disasters; transferring, storing and processing the records that are in electronic media and over internet as a part of disaster information systems. In this study which is handled within the frame of providing information flow among professionals in a disaster case, to supply the coordination of healthcare team and transferring complete information to specified people at real time, Microsoft Access database and SQL query language were used to inform database applications. System was prepared on Microsoft .Net platform using C# language. Disaster information system-medical module was designed to be used in disaster area, field hospital, nearby hospitals, temporary inhabiting areas like tent city, vehicles that are used for dispatch, and providing information flow between medical officials and data centres. For fast recording of the disaster victim data, accessing to database which was used by health care professionals was provided (or granted) among analysing process steps and creating minimal datasets. Database fields were created in the manner of giving opportunity to enter new data and search old data which is recorded before disaster. Web application which provides access such as data entry to the database and searching towards the designed interfaces according to the login credentials access level. In this study, homepage and users' interfaces which were built on database in consequence of system analyses were provided with www.afmedinfo.com web site to the user access. With this study, a recommendation was made about how to use disaster-based information systems in the field of health. Awareness has been developed about the fact that disaster information system should not be perceived only as an early warning system. Contents and the differences of the health care practices of disaster information systems were revealed. A web application was developed supplying a link between the user and the database to make date entry and data query practices by the help of the developed interfaces.
A novel visualization model for web search results.

PubMed

Nguyen, Tien N; Zhang, Jin

2006-01-01

This paper presents an interactive visualization system, named WebSearchViz, for visualizing the Web search results and acilitating users' navigation and exploration. The metaphor in our model is the solar system with its planets and asteroids revolving around the sun. Location, color, movement, and spatial distance of objects in the visual space are used to represent the semantic relationships between a query and relevant Web pages. Especially, the movement of objects and their speeds add a new dimension to the visual space, illustrating the degree of relevance among a query and Web search results in the context of users' subjects of interest. By interacting with the visual space, users are able to observe the semantic relevance between a query and a resulting Web page with respect to their subjects of interest, context information, or concern. Users' subjects of interest can be dynamically changed, redefined, added, or deleted from the visual space.
Flexible querying of Web data to simulate bacterial growth in food.

PubMed

Buche, Patrice; Couvert, Olivier; Dibie-Barthélemy, Juliette; Hignette, Gaëlle; Mettler, Eric; Soler, Lydie

2011-06-01

A preliminary step in microbial risk assessment in foods is the gathering of experimental data. In the framework of the Sym'Previus project, we have designed a complete data integration system opened on the Web which allows a local database to be complemented by data extracted from the Web and annotated using a domain ontology. We focus on the Web data tables as they contain, in general, a synthesis of data published in the documents. We propose in this paper a flexible querying system using the domain ontology to scan simultaneously local and Web data, this in order to feed the predictive modeling tools available on the Sym'Previus platform. Special attention is paid on the way fuzzy annotations associated with Web data are taken into account in the querying process, which is an important and original contribution of the proposed system. Copyright © 2010 Elsevier Ltd. All rights reserved.
HC StratoMineR: A Web-Based Tool for the Rapid Analysis of High-Content Datasets.

PubMed

Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A

2016-10-01

High-content screening (HCS) can generate large multidimensional datasets and when aligned with the appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, with the result that these datasets are frequently underutilized. Here, we present HC StratoMineR, a web-based tool for high-content data analysis. It is a decision-supportive platform that guides even non-expert users through a high-content data analysis workflow. HC StratoMineR is built by using My Structured Query Language for storage and querying, PHP: Hypertext Preprocessor as the main programming language, and jQuery for additional user interface functionality. R is used for statistical calculations, logic and data visualizations. Furthermore, C++ and graphical processor unit power is diffusely embedded in R by using the rcpp and rpud libraries for operations that are computationally highly intensive. We show that we can use HC StratoMineR for the analysis of multivariate data from a high-content siRNA knock-down screen and a small-molecule screen. It can be used to rapidly filter out undesirable data; to select relevant data; and to perform quality control, data reduction, data exploration, morphological hit picking, and data clustering. Our results demonstrate that HC StratoMineR can be used to functionally categorize HCS hits and, thus, provide valuable information for hit prioritization.
Rapid Deployment of a RESTful Service for Oceanographic Research Cruises

NASA Astrophysics Data System (ADS)

Fu, Linyun; Arko, Robert; Leadbetter, Adam

2014-05-01

The Ocean Data Interoperability Platform (ODIP) seeks to increase data sharing across scientific domains and international boundaries, by providing a forum to harmonize diverse regional data systems. ODIP participants from the US include the Rolling Deck to Repository (R2R) program, whose mission is to capture, catalog, and describe the underway/environmental sensor data from US oceanographic research vessels and submit the data to public long-term archives. R2R publishes information online as Linked Open Data, making it widely available using Semantic Web standards. Each vessel, sensor, cruise, dataset, person, organization, funding award, log, report, etc, has a Uniform Resource Identifier (URI). Complex queries that federate results from other data providers are supported, using the SPARQL query language. To facilitate interoperability, R2R uses controlled vocabularies developed collaboratively by the science community (eg. SeaDataNet device categories) and published online by the NERC Vocabulary Server (NVS). In response to user feedback, we are developing a standard programming interface (API) and Web portal for R2R's Linked Open Data. The API provides a set of simple REST-type URLs that are translated on-the-fly into SPARQL queries, and supports common output formats (eg. JSON). We will demonstrate an implementation based on the Epimorphics Linked Data API (ELDA) open-source Java package. Our experience shows that constructing a simple portal with limited schema elements in this way can significantly reduce development time and maintenance complexity.
DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

PubMed Central

Albrecht, Felipe; List, Markus; Bock, Christoph; Lengauer, Thomas

2016-01-01

Large amounts of epigenomic data are generated under the umbrella of the International Human Epigenome Consortium, which aims to establish 1000 reference epigenomes within the next few years. These data have the potential to unravel the complexity of epigenomic regulation. However, their effective use is hindered by the lack of flexible and easy-to-use methods for data retrieval. Extracting region sets of interest is a cumbersome task that involves several manual steps: identifying the relevant experiments, downloading the corresponding data files and filtering the region sets of interest. Here we present the DeepBlue Epigenomic Data Server, which streamlines epigenomic data analysis as well as software development. DeepBlue provides a comprehensive programmatic interface for finding, selecting, filtering, summarizing and downloading region sets. It contains data from four major epigenome projects, namely ENCODE, ROADMAP, BLUEPRINT and DEEP. DeepBlue comes with a user manual, examples and a well-documented application programming interface (API). The latter is accessed via the XML-RPC protocol supported by many programming languages. To demonstrate usage of the API and to enable convenient data retrieval for non-programmers, we offer an optional web interface. DeepBlue can be openly accessed at http://deepblue.mpi-inf.mpg.de. PMID:27084938
HBVPathDB: a database of HBV infection-related molecular interaction network.

PubMed

Zhang, Yi; Bo, Xiao-Chen; Yang, Jing; Wang, Sheng-Qi

2005-03-21

To describe molecules or genes interaction between hepatitis B viruses (HBV) and host, for understanding how virus' and host's genes and molecules are networked to form a biological system and for perceiving mechanism of HBV infection. The knowledge of HBV infection-related reactions was organized into various kinds of pathways with carefully drawn graphs in HBVPathDB. Pathway information is stored with relational database management system (DBMS), which is currently the most efficient way to manage large amounts of data and query is implemented with powerful Structured Query Language (SQL). The search engine is written using Personal Home Page (PHP) with SQL embedded and web retrieval interface is developed for searching with Hypertext Markup Language (HTML). We present the first version of HBVPathDB, which is a HBV infection-related molecular interaction network database composed of 306 pathways with 1 050 molecules involved. With carefully drawn graphs, pathway information stored in HBVPathDB can be browsed in an intuitive way. We develop an easy-to-use interface for flexible accesses to the details of database. Convenient software is implemented to query and browse the pathway information of HBVPathDB. Four search page layout options-category search, gene search, description search, unitized search-are supported by the search engine of the database. The database is freely available at http://www.bio-inf.net/HBVPathDB/HBV/. The conventional perspective HBVPathDB have already contained a considerable amount of pathway information with HBV infection related, which is suitable for in-depth analysis of molecular interaction network of virus and host. HBVPathDB integrates pathway data-sets with convenient software for query, browsing, visualization, that provides users more opportunity to identify regulatory key molecules as potential drug targets and to explore the possible mechanism of HBV infection based on gene expression datasets.
Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

PubMed Central

Sadesh, S.; Suganthe, R. C.

2015-01-01

Web with tremendous volume of information retrieves result for user related queries. With the rapid growth of web page recommendation, results retrieved based on data mining techniques did not offer higher performance filtering rate because relationships between user profile and queries were not analyzed in an extensive manner. At the same time, existing user profile based prediction in web data mining is not exhaustive in producing personalized result rate. To improve the query result rate on dynamics of user behavior over time, Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. HFRS-UQP framework is split into two processes, where filtering and switching are carried out. The data mining based filtering in our research work uses the Hamilton Filtering framework to filter user result based on personalized information on automatic updated profiles through search engine. Maximized result is fetched, that is, filtered out with respect to user behavior profiles. The switching performs accurate filtering updated profiles using regime switching. The updating in profile change (i.e., switches) regime in HFRS-UQP framework identifies the second- and higher-order association of query result on the updated profiles. Experiment is conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626
Development of a web-based video management and application processing system

NASA Astrophysics Data System (ADS)

Chan, Shermann S.; Wu, Yi; Li, Qing; Zhuang, Yueting

2001-07-01

How to facilitate efficient video manipulation and access in a web-based environment is becoming a popular trend for video applications. In this paper, we present a web-oriented video management and application processing system, based on our previous work on multimedia database and content-based retrieval. In particular, we extend the VideoMAP architecture with specific web-oriented mechanisms, which include: (1) Concurrency control facilities for the editing of video data among different types of users, such as Video Administrator, Video Producer, Video Editor, and Video Query Client; different users are assigned various priority levels for different operations on the database. (2) Versatile video retrieval mechanism which employs a hybrid approach by integrating a query-based (database) mechanism with content- based retrieval (CBR) functions; its specific language (CAROL/ST with CBR) supports spatio-temporal semantics of video objects, and also offers an improved mechanism to describe visual content of videos by content-based analysis method. (3) Query profiling database which records the `histories' of various clients' query activities; such profiles can be used to provide the default query template when a similar query is encountered by the same kind of users. An experimental prototype system is being developed based on the existing VideoMAP prototype system, using Java and VC++ on the PC platform.
Improving Concept-Based Web Image Retrieval by Mixing Semantically Similar Greek Queries

ERIC Educational Resources Information Center

Lazarinis, Fotis

2008-01-01

Purpose: Image searching is a common activity for web users. Search engines offer image retrieval services based on textual queries. Previous studies have shown that web searching is more demanding when the search is not in English and does not use a Latin-based language. The aim of this paper is to explore the behaviour of the major search…
Renal Gene Expression Database (RGED): a relational database of gene expression profiles in kidney disease

PubMed Central

Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo

2014-01-01

We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore certain gene profiles, the relationships between genes of interests and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without the requirement of advanced computational skills. Availability and implementation: Website is implemented in PHP, R, MySQL and Nginx and freely available from http://rged.wall-eva.net. Database URL: http://rged.wall-eva.net PMID:25252782
Renal Gene Expression Database (RGED): a relational database of gene expression profiles in kidney disease.

PubMed

Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo

2014-01-01

We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore certain gene profiles, the relationships between genes of interests and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without the requirement of advanced computational skills. Website is implemented in PHP, R, MySQL and Nginx and freely available from http://rged.wall-eva.net. http://rged.wall-eva.net. © The Author(s) 2014. Published by Oxford University Press.
Diamond Eye: a distributed architecture for image data mining

NASA Astrophysics Data System (ADS)

Burl, Michael C.; Fowlkes, Charless; Roden, Joe; Stechert, Andre; Mukhtar, Saleem

1999-02-01

Diamond Eye is a distributed software architecture, which enables users (scientists) to analyze large image collections by interacting with one or more custom data mining servers via a Java applet interface. Each server is coupled with an object-oriented database and a computational engine, such as a network of high-performance workstations. The database provides persistent storage and supports querying of the 'mined' information. The computational engine provides parallel execution of expensive image processing, object recognition, and query-by-content operations. Key benefits of the Diamond Eye architecture are: (1) the design promotes trial evaluation of advanced data mining and machine learning techniques by potential new users (all that is required is to point a web browser to the appropriate URL), (2) software infrastructure that is common across a range of science mining applications is factored out and reused, and (3) the system facilitates closer collaborations between algorithm developers and domain experts.
Automatically Preparing Safe SQL Queries

NASA Astrophysics Data System (ADS)

Bisht, Prithvi; Sistla, A. Prasad; Venkatakrishnan, V. N.

We present the first sound program source transformation approach for automatically transforming the code of a legacy web application to employ PREPARE statements in place of unsafe SQL queries. Our approach therefore opens the way for eradicating the SQL injection threat vector from legacy web applications.
A Deep Learning Method to Automatically Identify Reports of Scientifically Rigorous Clinical Research from the Biomedical Literature: Comparative Analytic Study.

PubMed

Del Fiol, Guilherme; Michelson, Matthew; Iorio, Alfonso; Cotoi, Chris; Haynes, R Brian

2018-06-25

A major barrier to the practice of evidence-based medicine is efficiently finding scientifically sound studies on a given clinical topic. To investigate a deep learning approach to retrieve scientifically sound treatment studies from the biomedical literature. We trained a Convolutional Neural Network using a noisy dataset of 403,216 PubMed citations with title and abstract as features. The deep learning model was compared with state-of-the-art search filters, such as PubMed's Clinical Query Broad treatment filter, McMaster's textword search strategy (no Medical Subject Heading, MeSH, terms), and Clinical Query Balanced treatment filter. A previously annotated dataset (Clinical Hedges) was used as the gold standard. The deep learning model obtained significantly lower recall than the Clinical Queries Broad treatment filter (96.9% vs 98.4%; P<.001); and equivalent recall to McMaster's textword search (96.9% vs 97.1%; P=.57) and Clinical Queries Balanced filter (96.9% vs 97.0%; P=.63). Deep learning obtained significantly higher precision than the Clinical Queries Broad filter (34.6% vs 22.4%; P<.001) and McMaster's textword search (34.6% vs 11.8%; P<.001), but was significantly lower than the Clinical Queries Balanced filter (34.6% vs 40.9%; P<.001). Deep learning performed well compared to state-of-the-art search filters, especially when citations were not indexed. Unlike previous machine learning approaches, the proposed deep learning model does not require feature engineering, or time-sensitive or proprietary features, such as MeSH terms and bibliometrics. Deep learning is a promising approach to identifying reports of scientifically rigorous clinical research. Further work is needed to optimize the deep learning model and to assess generalizability to other areas, such as diagnosis, etiology, and prognosis. ©Guilherme Del Fiol, Matthew Michelson, Alfonso Iorio, Chris Cotoi, R Brian Haynes. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 25.06.2018.
BAID: The Barrow Area Information Database - An Interactive Web Mapping Portal and Cyberinfrastructure Showcasing Scientific Activities in the Vicinity of Barrow, Arctic Alaska.

NASA Astrophysics Data System (ADS)

Escarzaga, S. M.; Cody, R. P.; Kassin, A.; Barba, M.; Gaylord, A. G.; Manley, W. F.; Mazza Ramsay, F. D.; Vargas, S. A., Jr.; Tarin, G.; Laney, C. M.; Villarreal, S.; Aiken, Q.; Collins, J. A.; Green, E.; Nelson, L.; Tweedie, C. E.

2015-12-01

The Barrow area of northern Alaska is one of the most intensely researched locations in the Arctic and the Barrow Area Information Database (BAID, www.barrowmapped.org) tracks and facilitates a gamut of research, management, and educational activities in the area. BAID is a cyberinfrastructure (CI) that details much of the historic and extant research undertaken within in the Barrow region in a suite of interactive web-based mapping and information portals (geobrowsers). The BAID user community and target audience for BAID is diverse and includes research scientists, science logisticians, land managers, educators, students, and the general public. BAID contains information on more than 12,000 Barrow area research sites that extend back to the 1940's and more than 640 remote sensing images and geospatial datasets. In a web-based setting, users can zoom, pan, query, measure distance, save or print maps and query results, and filter or view information by space, time, and/or other tags. Additionally, data are described with metadata that meet Federal Geographic Data Committee standards. Recent advances include the addition of more than 2000 new research sites, the addition of a query builder user interface allowing rich and complex queries, and provision of differential global position system (dGPS) and high-resolution aerial imagery support to visiting scientists. Recent field surveys include over 80 miles of coastline to document rates of erosion and the collection of high-resolution sonar data for bathymetric mapping of Elson Lagoon and near shore region of the Chukchi Sea. A network of five climate stations has been deployed across the peninsula to serve as a wireless net for the research community and to deliver near real time climatic data to the user community. Local GIS personal have also been trained to better make use of scientific data for local decision making. Links to Barrow area datasets are housed at national data archives and substantial upgrades have been made to the BAID website and web mapping applications to include the public release of a new multi-temporal Imagery Viewer that allow users to interact with and compare imagery of the Barrow area from 1949 to present.
ESTree db: a Tool for Peach Functional Genomics

PubMed Central

Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

2005-01-01

Background The ESTree db represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. Results The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. Conclusion The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig. PMID:16351742
ESTree db: a tool for peach functional genomics.

PubMed

Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

2005-12-01

The ESTree db http://www.itb.cnr.it/estree/ represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.
Mining the SDSS SkyServer SQL queries log

NASA Astrophysics Data System (ADS)

Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani

2016-05-01

SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc. and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements on the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.

WebEQ: a web-GIS System to collect, display and query data for the management of the earthquake emergency in Central Italy

NASA Astrophysics Data System (ADS)

Carbone, Gianluca; Cosentino, Giuseppe; Pennica, Francesco; Moscatelli, Massimiliano; Stigliano, Francesco

2017-04-01

After the strong earthquakes that hit central Italy in recent months, the Center for Seismic Microzonation and its applications (CentroMS) was commissioned by the Italian Department of Civil Protection to conduct the study of seismic microzonation of the territories affected by the earthquake of August 24, 2016. As part of the activities of microzonation, IGAG CNR has created WebEQ, a management tool of the data that have been acquired by all participants (i.e., more than twenty research institutes and university departments). The data collection was organized and divided into sub-areas, assigned to working groups with multidisciplinary expertise in geology, geophysics and engineering. WebEQ is a web-GIS System that helps all the subjects involved in the data collection activities, through tools aimed at data uploading and validation, and with a simple GIS interface to display, query and download geographic data. WebEQ is contributing to the creation of a large database containing geographical data, both vector and raster, from various sources and types: - Regional Technical Map em Geological and geomorphological maps em Data location maps em Maps of microzones homogeneous in seismic perspective and seismic microzonation maps em National strong motion network location. Data loading is done through simple input masks that ensure consistency with the database structure, avoiding possible errors and helping users to interact with the map through user-friendly tools. All the data are thematized through standardized symbologies and colors (Gruppo di lavoro MS 2008), in order to allow the easy interpretation by all users. The data download tools allow data exchange between working groups and the scientific community to benefit from the activities. The seismic microzonation activities are still ongoing. WebEQ is enabling easy management of large amounts of data and will form a basis for the development of tools for the management of the upcoming seismic emergencies.
ADF/ADC Web Tools for Browsing and Visualizing Astronomical Catalogs and NASA Astrophysics Mission Metadata

NASA Astrophysics Data System (ADS)

Shaya, E.; Kargatis, V.; Blackwell, J.; Borne, K.; White, R. A.; Cheung, C.

1998-05-01

Several new web based services have been introduced this year by the Astrophysics Data Facility (ADF) at the NASA Goddard Space Flight Center. IMPReSS is a graphical interface to astrophysics databases that presents the user with the footprints of observations of space-based missions. It also aids astronomers in retrieving these data by sending requests to distributed data archives. The VIEWER is a reader of ADC astronomical catalogs and journal tables that allows subsetting of catalogs by column choices and range selection and provides database-like search capability within each table. With it, the user can easily find the table data most appropriate for their purposes and then download either the subset table or the original table. CATSEYE is a tool that plots output tables from the VIEWER (and soon AMASE), making exploring the datasets fast and easy. Having completed the basic functionality of these systems, we are enhancing the site to provide advanced functionality. These will include: market basket storage of tables and records of VIEWER output for IMPReSS and AstroBrowse queries, non-HTML table responses to AstroBrowse type queries, general column arithmetic, modularity to allow entrance into the sequence of web pages at any point, histogram plots, navigable maps, and overplotting of catalog objects on mission footprint maps. When completed, the ADF/ADC web facilities will provide astronomical tabled data and mission retrieval information in several hyperlinked environments geared for users at any level, from the school student to the typical astronomer to the expert datamining tools at state-of-the-art data centers.
An Evaluation of the Interactive Query Expansion in an Online Library Catalogue with a Graphical User Interface.

ERIC Educational Resources Information Center

Hancock-Beaulieu, Micheline; And Others

1995-01-01

An online library catalog was used to evaluate an interactive query expansion facility based on relevance feedback for the Okapi, probabilistic, term weighting, retrieval system. A graphical user interface allowed searchers to select candidate terms extracted from relevant retrieved items to reformulate queries. Results suggested that the…
Querying phenotype-genotype relationships on patient datasets using semantic web technology: the example of Cerebrotendinous xanthomatosis.

PubMed

Taboada, María; Martínez, Diego; Pilo, Belén; Jiménez-Escrig, Adriano; Robinson, Peter N; Sobrido, María J

2012-07-31

Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Ontology Web Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain named cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SWQRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantic of the framework adapts the standard logical model of an open world assumption. This work demonstrates how semantic web technologies can be used to support flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially good for describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research.
Vocabulary services to support scientific data interoperability

NASA Astrophysics Data System (ADS)

Cox, Simon; Mills, Katie; Tan, Florence

2013-04-01

Shared vocabularies are a core element in interoperable systems. Vocabularies need to be available at run-time, and where the vocabularies are shared by a distributed community this implies the use of web technology to provide vocabulary services. Given the ubiquity of vocabularies or classifiers in systems, vocabulary services are effectively the base of the interoperability stack. In contemporary knowledge organization systems, a vocabulary item is considered a concept, with the "terms" denoting it appearing as labels. The Simple Knowledge Organization System (SKOS) formalizes this as an RDF Schema (RDFS) application, with a bridge to formal logic in Web Ontology Language (OWL). For maximum utility, a vocabulary should be made available through the following interfaces: * the vocabulary as a whole - at an ontology URI corresponding to a vocabulary document * each item in the vocabulary - at the item URI * summaries, subsets, and resources derived by transformation * through the standard RDF web API - i.e. a SPARQL endpoint * through a query form for human users. However, the vocabulary data model may be leveraged directly in a standard vocabulary API that uses the semantics provided by SKOS. SISSvoc3 [1] accomplishes this as a standard set of URI templates for a vocabulary. Any URI comforming to the template selects a vocabulary subset based on the SKOS properties, including labels (skos:prefLabel, skos:altLabel, rdfs:label) and a subset of the semantic relations (skos:broader, skos:narrower, etc). SISSvoc3 thus provides a RESTFul SKOS API to query a vocabulary, but hiding the complexity of SPARQL. It has been implemented using the Linked Data API (LDA) [2], which connects to a SPARQL endpoint. By using LDA, we also get content-negotiation, alternative views, paging, metadata and other functionality provided in a standard way. A number of vocabularies have been formalized in SKOS and deployed by CSIRO, the Australian Bureau of Meteorology (BOM) and their collaborators using SISSvoc3, including: * geologic timescale (multiple versions) * soils classification * definitions from OGC standards * geosciml vocabularies * mining commodities * hyperspectral scalars Several other agencies in Australia have adopted SISSvoc3 for their vocabularies. SISSvoc3 differs from other SKOS-based vocabulary-access APIs such as GEMET [3] and NVS [4] in that (a) the service is decoupled from the content store, (b) the service URI is independent of the content URIs This means that a SISSvoc3 interface can be deployed over any SKOS vocabulary which is available at a SPARQL endpoint. As an example, a SISSvoc3 query and presentation interface has been deployed over the NERC vocabulary service hosted by the BODC, providing a search interface which is not available natively. We use vocabulary services to populate menus in user interfaces, to support data validation, and to configure data conversion routines. Related services built on LDA have also been used as a generic registry interface, and extended for serving gazetteer information. ACKNOWLEDGEMENTS The CSIRO SISSvoc3 implementation is built using the Epimorphics ELDA platform http://code.google.com/p/elda/. We thank Jacqui Githaiga and Terry Rankine for their contributions to SISSvoc design and implementation. REFERENCES 1. SISSvoc3 Specification https://www.seegrid.csiro.au/wiki/Siss/SISSvoc30Specification 2. Linked Data API http://code.google.com/p/linked-data-api/wiki/Specification 3. GEMET https://svn.eionet.europa.eu/projects/Zope/wiki/GEMETWebServiceAPI 4. NVS 2.0 http://vocab.nerc.ac.uk/
Web page sorting algorithm based on query keyword distance relation

NASA Astrophysics Data System (ADS)

Yang, Han; Cui, Hong Gang; Tang, Hao

2017-08-01

In order to optimize the problem of page sorting, according to the search keywords in the web page in the relationship between the characteristics of the proposed query keywords clustering ideas. And it is converted into the degree of aggregation of the search keywords in the web page. Based on the PageRank algorithm, the clustering degree factor of the query keyword is added to make it possible to participate in the quantitative calculation. This paper proposes an improved algorithm for PageRank based on the distance relation between search keywords. The experimental results show the feasibility and effectiveness of the method.
Mission and Assets Database

NASA Technical Reports Server (NTRS)

Baldwin, John; Zendejas, Silvino; Gutheinz, Sandy; Borden, Chester; Wang, Yeou-Fang

2009-01-01

Mission and Assets Database (MADB) Version 1.0 is an SQL database system with a Web user interface to centralize information. The database stores flight project support resource requirements, view periods, antenna information, schedule, and forecast results for use in mid-range and long-term planning of Deep Space Network (DSN) assets.
The online Tabloid Proteome: an annotated database of protein associations

PubMed Central

Turan, Demet; Tavernier, Jan

2018-01-01

Abstract A complete knowledge of the proteome can only be attained by determining the associations between proteins, along with the nature of these associations (e.g. physical contact in protein–protein interactions, participation in complex formation or different roles in the same pathway). Despite extensive efforts in elucidating direct protein interactions, our knowledge on the complete spectrum of protein associations remains limited. We therefore developed a new approach that detects protein associations from identifications obtained after re-processing of large-scale, public mass spectrometry-based proteomics data. Our approach infers protein association based on the co-occurrence of proteins across many different proteomics experiments, and provides information that is almost completely complementary to traditional direct protein interaction studies. We here present a web interface to query and explore the associations derived from this method, called the online Tabloid Proteome. The online Tabloid Proteome also integrates biological knowledge from several existing resources to annotate our derived protein associations. The online Tabloid Proteome is freely available through a user-friendly web interface, which provides intuitive navigation and data exploration options for the user at http://iomics.ugent.be/tabloidproteome. PMID:29040688
Analysis of metabolomics datasets with high-performance computing and metabolite atlases

DOE PAGES

Yao, Yushu; Sun, Terence; Wang, Tony; ...

2015-07-20

Even with the widespread use of liquid chromatography mass spectrometry (LC/MS) based metabolomics, there are still a number of challenges facing this promising technique. Many, diverse experimental workflows exist; yet there is a lack of infrastructure and systems for tracking and sharing of information. Here, we describe the Metabolite Atlas framework and interface that provides highly-efficient, web-based access to raw mass spectrometry data in concert with assertions about chemicals detected to help address some of these challenges. This integration, by design, enables experimentalists to explore their raw data, specify and refine features annotations such that they can be leveraged formore » future experiments. Fast queries of the data through the web using SciDB, a parallelized database for high performance computing, make this process operate quickly. Furthermore, by using scripting containers, such as IPython or Jupyter, to analyze the data, scientists can utilize a wide variety of freely available graphing, statistics, and information management resources. In addition, the interfaces facilitate integration with systems biology tools to ultimately link metabolomics data with biological models.« less
In-house access to PACS images and related data through World Wide Web

NASA Astrophysics Data System (ADS)

Mascarini, Christian; Ratib, Osman M.; Trayser, Gerhard; Ligier, Yves; Appel, R. D.

1996-05-01

The development of a hospital wide PACS is in progress at the University Hospital of Geneva and several archive modules are operational since 1992. This PACS is intended for wide distribution of images to clinical wards. As the PACS project and the number of archived images grow rapidly in the hospital, it was necessary to provide an easy, more widely accessible and convenient access to the PACS database for the clinicians in the different wards and clinical units of the hospital. An innovative solution has been developed using tools such as Netscape navigator and NCSA World Wide Web server as an alternative to conventional database query and retrieval software. These tools present the advantages of providing an user interface which is the same independently of the platform being used (Mac, Windows, UNIX, ...), and an easy integration of different types of documents (text, images, ...). A strict access control has been added to this interface. It allows user identification and access rights checking, as defined by the in-house hospital information system, before allowing the navigation through patient data records.
Applying Semantic Web Concepts to Support Net-Centric Warfare Using the Tactical Assessment Markup Language (TAML)

DTIC Science & Technology

2006-06-01

SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient . These algorithms...memory as a treelike structure in order for the data to be queried . XML Query (XQuery) is the standard language used when querying XML
Profile-IQ: Web-based data query system for local health department infrastructure and activities.

PubMed

Shah, Gulzar H; Leep, Carolyn J; Alexander, Dayna

2014-01-01

To demonstrate the use of National Association of County & City Health Officials' Profile-IQ, a Web-based data query system, and how policy makers, researchers, the general public, and public health professionals can use the system to generate descriptive statistics on local health departments. This article is a descriptive account of an important health informatics tool based on information from the project charter for Profile-IQ and the authors' experience and knowledge in design and use of this query system. Profile-IQ is a Web-based data query system that is based on open-source software: MySQL 5.5, Google Web Toolkit 2.2.0, Apache Commons Math library, Google Chart API, and Tomcat 6.0 Web server deployed on an Amazon EC2 server. It supports dynamic queries of National Profile of Local Health Departments data on local health department finances, workforce, and activities. Profile-IQ's customizable queries provide a variety of statistics not available in published reports and support the growing information needs of users who do not wish to work directly with data files for lack of staff skills or time, or to avoid a data use agreement. Profile-IQ also meets the growing demand of public health practitioners and policy makers for data to support quality improvement, community health assessment, and other processes associated with voluntary public health accreditation. It represents a step forward in the recent health informatics movement of data liberation and use of open source information technology solutions to promote public health.
Aggregating Queries Against Large Inventories of Remotely Accessible Data

NASA Astrophysics Data System (ADS)

Gallagher, J. H. R.; Fulker, D. W.

2016-12-01

Those seeking to discover data for a specific purpose often encounter search results that are so large as to be useless without computing assistance. This situation arises, with increasing frequency, in part because repositories contain ever greater numbers of granules, and their granularities may well be poorly aligned or even orthogonal to the data-selection needs of the user. This presentation describes a recently developed service for simultaneously querying large lists of OPeNDAP-accessible granules to extract specified data. The specifications include a richly expressive set of data-selection criteria—applicable to content as well as metadata—and the service has been tested successfully against lists naming hundreds of thousands of granules. Querying such numbers of local files (i.e., granules) on a desktop or laptop computer is practical (by using a scripting language, e.g.), but this practicality is diminished when the data are remote and thus best accessed through a Web-services interface. In these cases, which are increasingly common, scripted queries can take many hours because of inherent network latencies. Furthermore, communication dropouts can add fragility to such scripts, yielding gaps in the acquired results. In contrast, OPeNDAP's new aggregated-query services enable data discovery in the context of very large inventory sizes. These capabilities have been developed for use with OPeNDAP's Hyrax server, which is an open-source realization of DAP (for "Data Access Protocol," a specification widely used in NASA, NOAA and other data-intensive contexts). These aggregated-query services exhibit good response times (on the order of seconds, not hours) even for inventories that list hundreds of thousands of source granules.
The Physiology Constant Database of Teen-Agers in Beijing

PubMed Central

Wei-Qi, Wei; Guang-Jin, Zhu; Cheng-Li, Xu; Shao-Mei, Han; Bao-Shen, Qi; Li, Chen; Shu-Yu, Zu; Xiao-Mei, Zhou; Wen-Feng, Hu; Zheng-Guo, Zhang

2004-01-01

Physiology constants of adolescents are important to understand growing living systems and are a useful reference in clinical and epidemiological research. Until recently, physiology constants were not available in China and therefore most physiologists, physicians, and nutritionists had to use data from abroad for reference. However, the very difference between the Eastern and Western races casts doubt on the usefulness of overseas data. We have therefore created a database system to provide a repository for the storage of physiology constants of teen-agers in Beijing. The several thousands of pieces of data are now divided into hematological biochemistry, lung function, and cardiac function with all data manually checked before being transferred into the database. The database was accomplished through the development of a web interface, scripts, and a relational database. The physiology data were integrated into the relational database system to provide flexible facilities by using combinations of various terms and parameters. A web browser interface was designed for the users to facilitate their searching. The database is available on the web. The statistical table, scatter diagram, and histogram of the data are available for both anonym and user according to queries, while only the user can achieve detail, including download data and advanced search. PMID:15258669
pgRNAFinder: a web-based tool to design distance independent paired-gRNA.

PubMed

Xiong, Yuanyan; Xie, Xiaowei; Wang, Yanzhi; Ma, Wenbing; Liang, Puping; Songyang, Zhou; Dai, Zhiming

2017-11-15

The CRISPR/Cas System has been shown to be an efficient and accurate genome-editing technique. There exist a number of tools to design the guide RNA sequences and predict potential off-target sites. However, most of the existing computational tools on gRNA design are restricted to small deletions. To address this issue, we present pgRNAFinder, with an easy-to-use web interface, which enables researchers to design single or distance-free paired-gRNA sequences. The web interface of pgRNAFinder contains both gRNA search and scoring system. After users input query sequences, it searches gRNA by 3' protospacer-adjacent motif (PAM), and possible off-targets, and scores the conservation of the deleted sequences rapidly. Filters can be applied to identify high-quality CRISPR sites. PgRNAFinder offers gRNA design functionality for 8 vertebrate genomes. Furthermore, to keep pgRNAFinder open, extensible to any organism, we provide the source package for local use. The pgRNAFinder is freely available at http://songyanglab.sysu.edu.cn/wangwebs/pgRNAFinder/, and the source code and user manual can be obtained from https://github.com/xiexiaowei/pgRNAFinder. songyang@bcm.edu or daizhim@mail.sysu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories.

PubMed

Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

2017-01-01

To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution.
Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories

PubMed Central

Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

2017-01-01

To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution. PMID:29854239
Querying XML Data with SPARQL

NASA Astrophysics Data System (ADS)

Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros

SPARQL is today the standard access language for Semantic Web data. In the recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.
Hybrid Filtering in Semantic Query Processing

ERIC Educational Resources Information Center

Jeong, Hanjo

2011-01-01

This dissertation presents a hybrid filtering method and a case-based reasoning framework for enhancing the effectiveness of Web search. Web search may not reflect user needs, intent, context, and preferences, because today's keyword-based search is lacking semantic information to capture the user's context and intent in posing the search query.…
Environmental Dataset Gateway (EDG) Search Widget

EPA Pesticide Factsheets

Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other other applications. This allows individuals to provide direct access to EPA's metadata outside the EDG interface. The EDG Search Widget makes it possible to search the EDG from another web page or application. The search widget can be included on your website by simply inserting one or two lines of code. Users can type a search term or lucene search query in the search field and retrieve a pop-up list of records that match that search.

Interactive design of generic chemical patterns.

PubMed

Schomburg, Karen T; Wetzer, Lars; Rarey, Matthias

2013-07-01

Every medicinal chemist has to create chemical patterns occasionally for querying databases, applying filters or describing functional groups. However, the representations of chemical patterns have been so far limited to languages with highly complex syntax, handicapping the application of patterns. Graphic pattern editors similar to chemical editors can facilitate the work with patterns. In this article, we review the interfaces of frequently used web search engines for chemical patterns. We take a look at pattern editing concepts of standalone chemical editors and finally present a completely new, unpublished graphical approach to pattern design, the SMARTSeditor. Copyright © 2013 Elsevier Ltd. All rights reserved.
ENGINES: exploring single nucleotide variation in entire human genomes.

PubMed

Amigo, Jorge; Salas, Antonio; Phillips, Christopher

2011-04-19

Next generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and therefore new software is needed to present and analyse this vast amount of information. The 1000 Genomes project has recently released raw data for 629 complete genomes representing several human populations through their Phase I interim analysis and, although there are certain public tools available that allow exploration of these genomes, to date there is no tool that permits comprehensive population analysis of the variation catalogued by such data. We have developed a genetic variant site explorer able to retrieve data for Single Nucleotide Variation (SNVs), population by population, from entire genomes without compromising future scalability and agility. ENGINES (ENtire Genome INterface for Exploring SNVs) uses data from the 1000 Genomes Phase I to demonstrate its capacity to handle large amounts of genetic variation (>7.3 billion genotypes and 28 million SNVs), as well as deriving summary statistics of interest for medical and population genetics applications. The whole dataset is pre-processed and summarized into a data mart accessible through a web interface. The query system allows the combination and comparison of each available population sample, while searching by rs-number list, chromosome region, or genes of interest. Frequency and FST filters are available to further refine queries, while results can be visually compared with other large-scale Single Nucleotide Polymorphism (SNP) repositories such as HapMap or Perlegen. ENGINES is capable of accessing large-scale variation data repositories in a fast and comprehensive manner. It allows quick browsing of whole genome variation, while providing statistical information for each variant site such as allele frequency, heterozygosity or FST values for genetic differentiation. Access to the data mart generating scripts and to the web interface is granted from http://spsmart.cesga.es/engines.php. © 2011 Amigo et al; licensee BioMed Central Ltd.
PcapDB: Search Optimized Packet Capture, Version 0.1.0.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ferrell, Paul; Steinfadt, Shannon

PcapDB is a packet capture system designed to optimize the captured data for fast search in the typical (network incident response) use case. The technology involved in this software has been submitted via the IDEAS system and has been filed as a provisional patent. It includes the following primary components: capture: The capture component utilizes existing capture libraries to retrieve packets from network interfaces. Once retrieved the packets are passed to additional threads for sorting into flows and indexing. The sorted flows and indexes are passed to other threads so that they can be written to disk. These components aremore » written in the C programming language. search: The search components provide a means to find relevant flows and the associated packets. A search query is parsed and represented as a search tree. Various search commands, written in C, are then used resolve this tree into a set of search results. The tree generation and search execution management components are written in python. interface: The PcapDB web interface is written in Python on the Django framework. It provides a series of pages, API's, and asynchronous tasks that allow the user to manage the capture system, perform searches, and retrieve results. Web page components are written in HTML,CSS and Javascript.« less
A comprehensive physiologically based pharmacokinetic ...

EPA Pesticide Factsheets

Published physiologically based pharmacokinetic (PBPK) models from peer-reviewed articles are often well-parameterized, thoroughly-vetted, and can be utilized as excellent resources for the construction of models pertaining to related chemicals. Specifically, chemical-specific parameters and in vivo pharmacokinetic data used to calibrate these published models can act as valuable starting points for model development of new chemicals with similar molecular structures. A knowledgebase for published PBPK-related articles was compiled to support PBPK model construction for new chemicals based on their close analogues within the knowledgebase, and a web-based interface was developed to allow users to query those close analogues. A list of 689 unique chemicals and their corresponding 1751 articles was created after analysis of 2,245 PBPK-related articles. For each model, the PMID, chemical name, major metabolites, species, gender, life stages and tissue compartments were extracted from the published articles. PaDEL-Descriptor, a Chemistry Development Kit based software, was used to calculate molecular fingerprints. Tanimoto index was implemented in the user interface as measurement of structural similarity. The utility of the PBPK knowledgebase and web-based user interface was demonstrated using two case studies with ethylbenzene and gefitinib. Our PBPK knowledgebase is a novel tool for ranking chemicals based on similarities to other chemicals associated with existi
An object-oriented approach to deploying highly configurable Web interfaces for the ATLAS experiment

NASA Astrophysics Data System (ADS)

Lange, Bruno; Maidantchik, Carmen; Pommes, Kathy; Pavani, Varlen; Arosa, Breno; Abreu, Igor

2015-12-01

The ATLAS Technical Coordination disposes of 17 Web systems to support its operation. These applications, whilst ranging from managing the process of publishing scientific papers to monitoring radiation levels in the equipment in the experimental cavern, are constantly prone to changes in requirements due to the collaborative nature of the experiment and its management. In this context, a Web framework is proposed to unify the generation of the supporting interfaces. FENCE assembles classes to build applications by making extensive use of JSON configuration files. It relies heavily on Glance, a technology that was set forth in 2003 to create an abstraction layer on top of the heterogeneous sources that store the technical coordination data. Once Glance maps out the database modeling, records can be referenced in the configuration files by wrapping unique identifiers around double enclosing brackets. The deployed content can be individually secured by attaching clearance attributes to their description thus ensuring that view/edit privileges are granted to eligible users only. The framework also provides tools for securely writing into a database. Fully HTML5-compliant multi-step forms can be generated from their JSON description to assure that the submitted data comply with a series of constraints. Input validation is carried out primarily on the server- side but, following progressive enhancement guidelines, verification might also be performed on the client-side by enabling specific markup data attributes which are then handed over to the jQuery validation plug-in. User monitoring is accomplished by thoroughly logging user requests along with any POST data. Documentation is built from the source code using the phpDocumentor tool and made readily available for developers online. Fence, therefore, speeds up the implementation of Web interfaces and reduces the response time to requirement changes by minimizing maintenance overhead.
DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets.

PubMed

Albrecht, Felipe; List, Markus; Bock, Christoph; Lengauer, Thomas

2016-07-08

Large amounts of epigenomic data are generated under the umbrella of the International Human Epigenome Consortium, which aims to establish 1000 reference epigenomes within the next few years. These data have the potential to unravel the complexity of epigenomic regulation. However, their effective use is hindered by the lack of flexible and easy-to-use methods for data retrieval. Extracting region sets of interest is a cumbersome task that involves several manual steps: identifying the relevant experiments, downloading the corresponding data files and filtering the region sets of interest. Here we present the DeepBlue Epigenomic Data Server, which streamlines epigenomic data analysis as well as software development. DeepBlue provides a comprehensive programmatic interface for finding, selecting, filtering, summarizing and downloading region sets. It contains data from four major epigenome projects, namely ENCODE, ROADMAP, BLUEPRINT and DEEP. DeepBlue comes with a user manual, examples and a well-documented application programming interface (API). The latter is accessed via the XML-RPC protocol supported by many programming languages. To demonstrate usage of the API and to enable convenient data retrieval for non-programmers, we offer an optional web interface. DeepBlue can be openly accessed at http://deepblue.mpi-inf.mpg.de. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
jSPyDB, an open source database-independent tool for data management

NASA Astrophysics Data System (ADS)

Pierro, Giuseppe Antonio; Cavallari, Francesca; Di Guida, Salvatore; Innocente, Vincenzo

2011-12-01

Nowadays, the number of commercial tools available for accessing Databases, built on Java or .Net, is increasing. However, many of these applications have several drawbacks: usually they are not open-source, they provide interfaces only with a specific kind of database, they are platform-dependent and very CPU and memory consuming. jSPyDB is a free web-based tool written using Python and Javascript. It relies on jQuery and python libraries, and is intended to provide a simple handler to different database technologies inside a local web browser. Such a tool, exploiting fast access libraries such as SQLAlchemy, is easy to install, and to configure. The design of this tool envisages three layers. The front-end client side in the local web browser communicates with a backend server. Only the server is able to connect to the different databases for the purposes of performing data definition and manipulation. The server makes the data available to the client, so that the user can display and handle them safely. Moreover, thanks to jQuery libraries, this tool supports export of data in different formats, such as XML and JSON. Finally, by using a set of pre-defined functions, users are allowed to create their customized views for a better data visualization. In this way, we optimize the performance of database servers by avoiding short connections and concurrent sessions. In addition, security is enforced since we do not provide users the possibility to directly execute any SQL statement.
R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server

PubMed Central

Cannone, Jamie J.; Sweeney, Blake A.; Petrov, Anton I.; Gutell, Robin R.; Zirbel, Craig L.; Leontis, Neocles

2015-01-01

The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960
The VO-Dance web application at the IA2 data center

NASA Astrophysics Data System (ADS)

Molinaro, Marco; Knapic, Cristina; Smareglia, Riccardo

2012-09-01

Italian center for Astronomical Archives (IA2, http://ia2.oats.inaf.it) is a national infrastructure project of the Italian National Institute for Astrophysics (Istituto Nazionale di AstroFisica, INAF) that provides services for the astronomical community. Besides data hosting for the Large Binocular Telescope (LBT) Corporation, the Galileo National Telescope (Telescopio Nazionale Galileo, TNG) Consortium and other telescopes and instruments, IA2 offers proprietary and public data access through user portals (both developed and mirrored) and deploys resources complying the Virtual Observatory (VO) standards. Archiving systems and web interfaces are developed to be extremely flexible about adding new instruments from other telescopes. VO resources publishing, along with data access portals, implements the International Virtual Observatory Alliance (IVOA) protocols providing astronomers with new ways of analyzing data. Given the large variety of data flavours and IVOA standards, the need for tools to easily accomplish data ingestion and data publishing arises. This paper describes the VO-Dance tool, that IA2 started developing to address VO resources publishing in a dynamical way from already existent database tables or views. The tool consists in a Java web application, potentially DBMS and platform independent, that stores internally the services' metadata and information, exposes restful endpoints to accept VO queries for these services and dynamically translates calls to these endpoints to SQL queries coherent with the published table or view. In response to the call VO-Dance translates back the database answer in a VO compliant way.
WORDGRAPH: Keyword-in-Context Visualization for NETSPEAK's Wildcard Search.

PubMed

Riehmann, Patrick; Gruendl, Henning; Potthast, Martin; Trenkmann, Martin; Stein, Benno; Froehlich, Benno

2012-09-01

The WORDGRAPH helps writers in visually choosing phrases while writing a text. It checks for the commonness of phrases and allows for the retrieval of alternatives by means of wildcard queries. To support such queries, we implement a scalable retrieval engine, which returns high-quality results within milliseconds using a probabilistic retrieval strategy. The results are displayed as WORDGRAPH visualization or as a textual list. The graphical interface provides an effective means for interactive exploration of search results using filter techniques, query expansion, and navigation. Our observations indicate that, of three investigated retrieval tasks, the textual interface is sufficient for the phrase verification task, wherein both interfaces support context-sensitive word choice, and the WORDGRAPH best supports the exploration of a phrase's context or the underlying corpus. Our user study confirms these observations and shows that WORDGRAPH is generally the preferred interface over the textual result list for queries containing multiple wildcards.
Mars Data analysis and visualization with Marsoweb

NASA Astrophysics Data System (ADS)

Gulick, V. G.; Deardorff, D. G.

2003-04-01

Marsoweb is a collaborative web environment that has been developed for the Mars research community to better visualize and analyze Mars orbiter data. Its goal is to enable online data discovery by providing an intuitive, interactive interface to data from the Mars Global Surveyor and other orbiters. Recently Marsoweb has served a prominent role as a resource center for the site selection process for the Mars Explorer Rover 2003 missions. In addition to hosting a repository of landing site memoranda and workshop talks, it includes a Java-based interface to a variety of data maps and images. This interface enables the display and numerical querying of data, and allows data profiles to be rendered from user-drawn cross-sections. High-resolution Mars Orbiter Camera (MOC) images (currently, over 100,000) can be graphically perused; browser-based image processing tools can be used on MOC images of potential landing sites. An automated VRML atlas allows users to construct "flyovers" of their own regions-of-interest in 3D. These capabilities enable Marsoweb to be used for general global data studies, in addition to those specific to landing site selection. As of December 2002, Marsoweb has been viewed by 88,000 distinct users with a total of 3.3 million hits (801,000 page requests in all) from NASA, USGS, academia, and the general public have accessed Marsoweb. The High Resolution Imaging Experiment team for the Mars 2005 Orbiter (HiRISE, PI Alfred McEwen) plans to cast a wide net to collect targeting suggestions. Members of the general public as well as the broad Mars science community will be able to submit suggestions of high resolution imaging targets. The web-based interface for target suggestion input (HiWeb) will be based upon Marsoweb (http://marsoweb.nas.nasa.gov).
Regular paths in SparQL: querying the NCI Thesaurus.

PubMed

Detwiler, Landon T; Suciu, Dan; Brinkley, James F

2008-11-06

OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have lead to a highly computable language, it has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.
FASH: A web application for nucleotides sequence search.

PubMed

Veksler-Lublinksy, Isana; Barash, Danny; Avisar, Chai; Troim, Einav; Chew, Paul; Kedem, Klara

2008-05-27

: FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text-sequence (e.g, the human genome), FASH detects subsequences within the text that are remotely-similar to the query. FASH offers an alternative approach to Blast/Fasta for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed-sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. FASH can be accessed athttps://fash.bgu.ac.il:8443/fash/default.jsp (secured website).
Terminology issues in user access to Web-based medical information.

PubMed Central

McCray, A. T.; Loane, R. F.; Browne, A. C.; Bangalore, A. K.

1999-01-01

We conducted a study of user queries to the National Library of Medicine Web site over a three month period. Our purpose was to study the nature and scope of these queries in order to understand how to improve users' access to the information they are seeking on our site. The results show that the queries are primarily medical in content (94%), with only a small percentage (5.5%) relating to library services, and with a very small percentage (.5%) not being medically relevant at all. We characterize the data set, and conclude with a discussion of our plans to develop a UMLS-based terminology server to assist NLM Web users. Images Figure 1 PMID:10566330
Small numbers, disclosure risk, security, and reliability issues in Web-based data query systems.

PubMed

Rudolph, Barbara A; Shah, Gulzar H; Love, Denise

2006-01-01

This article describes the process for developing consensus guidelines and tools for releasing public health data via the Web and highlights approaches leading agencies have taken to balance disclosure risk with public dissemination of reliable health statistics. An agency's choice of statistical methods for improving the reliability of released data for Web-based query systems is based upon a number of factors, including query system design (dynamic analysis vs preaggregated data and tables), population size, cell size, data use, and how data will be supplied to users. The article also describes those efforts that are necessary to reduce the risk of disclosure of an individual's protected health information.
A web-based 3D geological information visualization system

NASA Astrophysics Data System (ADS)

Song, Renbo; Jiang, Nan

2013-03-01

Construction of 3D geological visualization system has attracted much more concern in GIS, computer modeling, simulation and visualization fields. It not only can effectively help geological interpretation and analysis work, but also can it can help leveling up geosciences professional education. In this paper, an applet-based method was introduced for developing a web-based 3D geological information visualization system. The main aims of this paper are to explore a rapid and low-cost development method for constructing a web-based 3D geological system. First, the borehole data stored in Excel spreadsheets was extracted and then stored in SQLSERVER database of a web server. Second, the JDBC data access component was utilized for providing the capability of access the database. Third, the user interface was implemented with applet component embedded in JSP page and the 3D viewing and querying functions were implemented with PickCanvas of Java3D. Last, the borehole data acquired from geological survey were used for test the system, and the test results has shown that related methods of this paper have a certain application values.
CartograTree: connecting tree genomes, phenotypes and environment.

PubMed

Vasquez-Gross, Hans A; Yu, John J; Figueroa, Ben; Gessler, Damian D G; Neale, David B; Wegrzyn, Jill L

2013-05-01

Today, researchers spend a tremendous amount of time gathering, formatting, filtering and visualizing data collected from disparate sources. Under the umbrella of forest tree biology, we seek to provide a platform and leverage modern technologies to connect biotic and abiotic data. Our goal is to provide an integrated web-based workspace that connects environmental, genomic and phenotypic data via geo-referenced coordinates. Here, we connect the genomic query web-based workspace, DiversiTree and a novel geographical interface called CartograTree to data housed on the TreeGenes database. To accomplish this goal, we implemented Simple Semantic Web Architecture and Protocol to enable the primary genomics database, TreeGenes, to communicate with semantic web services regardless of platform or back-end technologies. The novelty of CartograTree lies in the interactive workspace that allows for geographical visualization and engagement of high performance computing (HPC) resources. The application provides a unique tool set to facilitate research on the ecology, physiology and evolution of forest tree species. CartograTree can be accessed at: http://dendrome.ucdavis.edu/cartogratree. © 2013 Blackwell Publishing Ltd.
CDAO-Store: Ontology-driven Data Integration for Phylogenetic Analysis

PubMed Central

2011-01-01

Background The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices. Results Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is a RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined as well as domain-specific queries; domain-specific queries include search for nearest common ancestors, minimum spanning clades, filter multiple trees in the store by size, author, taxa, tree identifier, algorithm or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, nexml, and NEXUS formats can be imported and their CDAO representations added to the triple-store. Conclusions CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore. PMID:21496247
CDAO-store: ontology-driven data integration for phylogenetic analysis.

PubMed

Chisham, Brandon; Wright, Ben; Le, Trung; Son, Tran Cao; Pontelli, Enrico

2011-04-15

The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices. Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is a RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined as well as domain-specific queries; domain-specific queries include search for nearest common ancestors, minimum spanning clades, filter multiple trees in the store by size, author, taxa, tree identifier, algorithm or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, nexml, and NEXUS formats can be imported and their CDAO representations added to the triple-store. CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore.
Automated Rocket Propulsion Test Management

NASA Technical Reports Server (NTRS)

Walters, Ian; Nelson, Cheryl; Jones, Helene

2007-01-01

The Rocket Propulsion Test-Automated Management System provides a central location for managing activities associated with Rocket Propulsion Test Management Board, National Rocket Propulsion Test Alliance, and the Senior Steering Group business management activities. A set of authorized users, both on-site and off-site with regard to Stennis Space Center (SSC), can access the system through a Web interface. Web-based forms are used for user input with generation and electronic distribution of reports easily accessible. Major functions managed by this software include meeting agenda management, meeting minutes, action requests, action items, directives, and recommendations. Additional functions include electronic review, approval, and signatures. A repository/library of documents is available for users, and all items are tracked in the system by unique identification numbers and status (open, closed, percent complete, etc.). The system also provides queries and version control for input of all items.

A natural language interface plug-in for cooperative query answering in biological databases.

PubMed

Jamil, Hasan M

2012-06-11

One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
Design and development of a web-based application for diabetes patient data management.

PubMed

Deo, S S; Deobagkar, D N; Deobagkar, Deepti D

2005-01-01

A web-based database management system developed for collecting, managing and analysing information of diabetes patients is described here. It is a searchable, client-server, relational database application, developed on the Windows platform using Oracle, Active Server Pages (ASP), Visual Basic Script (VB Script) and Java Script. The software is menu-driven and allows authorized healthcare providers to access, enter, update and analyse patient information. Graphical representation of data can be generated by the system using bar charts and pie charts. An interactive web interface allows users to query the database and generate reports. Alpha- and beta-testing of the system was carried out and the system at present holds records of 500 diabetes patients and is found useful in diagnosis and treatment. In addition to providing patient data on a continuous basis in a simple format, the system is used in population and comparative analysis. It has proved to be of significant advantage to the healthcare provider as compared to the paper-based system.
Manually Classifying User Search Queries on an Academic Library Web Site

ERIC Educational Resources Information Center

Chapman, Suzanne; Desai, Shevon; Hagedorn, Kat; Varnum, Ken; Mishra, Sonali; Piacentine, Julie

2013-01-01

The University of Michigan Library wanted to learn more about the kinds of searches its users were conducting through the "one search" search box on the Library Web site. Library staff conducted two investigations. A preliminary investigation in 2011 involved the manual review of the 100 most frequently occurring queries conducted…
Exploration of Web Users' Search Interests through Automatic Subject Categorization of Query Terms.

ERIC Educational Resources Information Center

Pu, Hsiao-tieh; Yang, Chyan; Chuang, Shui-Lung

2001-01-01

Proposes a mechanism that carefully integrates human and machine efforts to explore Web users' search interests. The approach consists of a four-step process: extraction of core terms; construction of subject taxonomy; automatic subject categorization of query terms; and observation of users' search interests. Research findings are proved valuable…
GeoSearcher: Location-Based Ranking of Search Engine Results.

ERIC Educational Resources Information Center

Watters, Carolyn; Amoudi, Ghada

2003-01-01

Discussion of Web queries with geospatial dimensions focuses on an algorithm that assigns location coordinates dynamically to Web sites based on the URL. Describes a prototype search system that uses the algorithm to re-rank search engine results for queries with a geospatial dimension, thus providing an alternative ranking order for search engine…
Natural Language Query System Design for Interactive Information Storage and Retrieval Systems. M.S. Thesis

NASA Technical Reports Server (NTRS)

Dominick, Wayne D. (Editor); Liu, I-Hsiung

1985-01-01

The currently developed multi-level language interfaces of information systems are generally designed for experienced users. These interfaces commonly ignore the nature and needs of the largest user group, i.e., casual users. This research identifies the importance of natural language query system research within information storage and retrieval system development; addresses the topics of developing such a query system; and finally, proposes a framework for the development of natural language query systems in order to facilitate the communication between casual users and information storage and retrieval systems.
FlyAtlas: database of gene expression in the tissues of Drosophila melanogaster

PubMed Central

Robinson, Scott W.; Herzyk, Pawel; Dow, Julian A. T.; Leader, David P.

2013-01-01

The FlyAtlas resource contains data on the expression of the genes of Drosophila melanogaster in different tissues (currently 25—17 adult and 8 larval) obtained by hybridization of messenger RNA to Affymetrix Drosophila Genome 2 microarrays. The microarray probe sets cover 13 250 Drosophila genes, detecting 12 533 in an unambiguous manner. The data underlying the original web application (http://flyatlas.org) have been restructured into a relational database and a Java servlet written to provide a new web interface, FlyAtlas 2 (http://flyatlas.gla.ac.uk/), which allows several additional queries. Users can retrieve data for individual genes or for groups of genes belonging to the same or related ontological categories. Assistance in selecting valid search terms is provided by an Ajax ‘autosuggest’ facility that polls the database as the user types. Searches can also focus on particular tissues, and data can be retrieved for the most highly expressed genes, for genes of a particular category with above-average expression or for genes with the greatest difference in expression between the larval and adult stages. A novel facility allows the database to be queried with a specific gene to find other genes with a similar pattern of expression across the different tissues. PMID:23203866
FlyAtlas: database of gene expression in the tissues of Drosophila melanogaster.

PubMed

Robinson, Scott W; Herzyk, Pawel; Dow, Julian A T; Leader, David P

2013-01-01

The FlyAtlas resource contains data on the expression of the genes of Drosophila melanogaster in different tissues (currently 25-17 adult and 8 larval) obtained by hybridization of messenger RNA to Affymetrix Drosophila Genome 2 microarrays. The microarray probe sets cover 13,250 Drosophila genes, detecting 12,533 in an unambiguous manner. The data underlying the original web application (http://flyatlas.org) have been restructured into a relational database and a Java servlet written to provide a new web interface, FlyAtlas 2 (http://flyatlas.gla.ac.uk/), which allows several additional queries. Users can retrieve data for individual genes or for groups of genes belonging to the same or related ontological categories. Assistance in selecting valid search terms is provided by an Ajax 'autosuggest' facility that polls the database as the user types. Searches can also focus on particular tissues, and data can be retrieved for the most highly expressed genes, for genes of a particular category with above-average expression or for genes with the greatest difference in expression between the larval and adult stages. A novel facility allows the database to be queried with a specific gene to find other genes with a similar pattern of expression across the different tissues.
Biological data integration: wrapping data and tools.

PubMed

Lacroix, Zoé

2002-06-01

Nowadays scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data accessing, analyzing, and visualization tools. Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web as well as data generated by software. We present an approach to wrapping web data sources, databases, flat files, or data generated by tools through a database view mechanism. Generally, a wrapper has two tasks: it first sends a query to the source to retrieve data and, second builds the expected output with respect to the virtual structure. Our wrappers are composed of a retrieval component based on an intermediate object view mechanism called search views mapping the source capabilities to attributes, and an eXtensible Markup Language (XML) engine, respectively, to perform these two tasks. The originality of the approach consists of: 1) a generic view mechanism to access seamlessly data sources with limited capabilities and 2) the ability to wrap data sources as well as the useful specific tools they may provide. Our approach has been developed and demonstrated as part of the multidatabase system supporting queries via uniform object protocol model (OPM) interfaces.
Secure web-based access to radiology: forms and databases for fast queries

NASA Astrophysics Data System (ADS)

McColl, Roderick W.; Lane, Thomas J.

2002-05-01

Currently, Web-based access to mini-PACS or similar databases commonly utilizes either JavaScript, Java applets or ActiveX controls. Many sites do not permit applets or controls or other binary objects for fear of viruses or worms sent by malicious users. In addition, the typical CGI query mechanism requires several parameters to be sent with the http GET/POST request, which may identify the patient in some way; this in unacceptable for privacy protection. Also unacceptable are pages produced by server-side scripts which can be cached by the browser, since these may also contain sensitive information. We propose a simple mechanism for access to patient information, including images, which guarantees security of information, makes it impossible to bookmark the page, or to return to the page after some defined length of time. In addition, this mechanism is simple, therefore permitting rapid access without the need to initially download an interface such as an applet or control. In addition to image display, the design of the site allows the user to view and save movies of multi-phasic data, or to construct multi-frame datasets from entire series. These capabilities make the site attractive for research purposes such as teaching file preparation.
Protein interface classification by evolutionary analysis

PubMed Central

2012-01-01

Background Distinguishing biologically relevant interfaces from lattice contacts in protein crystals is a fundamental problem in structural biology. Despite efforts towards the computational prediction of interface character, many issues are still unresolved. Results We present here a protein-protein interface classifier that relies on evolutionary data to detect the biological character of interfaces. The classifier uses a simple geometric measure, number of core residues, and two evolutionary indicators based on the sequence entropy of homolog sequences. Both aim at detecting differential selection pressure between interface core and rim or rest of surface. The core residues, defined as fully buried residues (>95% burial), appear to be fundamental determinants of biological interfaces: their number is in itself a powerful discriminator of interface character and together with the evolutionary measures it is able to clearly distinguish evolved biological contacts from crystal ones. We demonstrate that this definition of core residues leads to distinctively better results than earlier definitions from the literature. The stringent selection and quality filtering of structural and sequence data was key to the success of the method. Most importantly we demonstrate that a more conservative selection of homolog sequences - with relatively high sequence identities to the query - is able to produce a clearer signal than previous attempts. Conclusions An evolutionary approach like the one presented here is key to the advancement of the field, which so far was missing an effective method exploiting the evolutionary character of protein interfaces. Its coverage and performance will only improve over time thanks to the incessant growth of sequence databases. Currently our method reaches an accuracy of 89% in classifying interfaces of the Ponstingl 2003 datasets and it lends itself to a variety of useful applications in structural biology and bioinformatics. We made the corresponding software implementation available to the community as an easy-to-use graphical web interface at http://www.eppic-web.org. PMID:23259833
ODI - Portal, Pipeline, and Archive (ODI-PPA): a web-based astronomical compute archive, visualization, and analysis service

NASA Astrophysics Data System (ADS)

Gopu, Arvind; Hayashi, Soichi; Young, Michael D.; Harbeck, Daniel R.; Boroson, Todd; Liu, Wilson; Kotulla, Ralf; Shaw, Richard; Henschel, Robert; Rajagopal, Jayadev; Stobie, Elizabeth; Knezek, Patricia; Martin, R. Pierre; Archbold, Kevin

2014-07-01

The One Degree Imager-Portal, Pipeline, and Archive (ODI-PPA) is a web science gateway that provides astronomers a modern web interface that acts as a single point of access to their data, and rich computational and visualization capabilities. Its goal is to support scientists in handling complex data sets, and to enhance WIYN Observatory's scientific productivity beyond data acquisition on its 3.5m telescope. ODI-PPA is designed, with periodic user feedback, to be a compute archive that has built-in frameworks including: (1) Collections that allow an astronomer to create logical collations of data products intended for publication, further research, instructional purposes, or to execute data processing tasks (2) Image Explorer and Source Explorer, which together enable real-time interactive visual analysis of massive astronomical data products within an HTML5 capable web browser, and overlaid standard catalog and Source Extractor-generated source markers (3) Workflow framework which enables rapid integration of data processing pipelines on an associated compute cluster and users to request such pipelines to be executed on their data via custom user interfaces. ODI-PPA is made up of several light-weight services connected by a message bus; the web portal built using Twitter/Bootstrap, AngularJS and jQuery JavaScript libraries, and backend services written in PHP (using the Zend framework) and Python; it leverages supercomputing and storage resources at Indiana University. ODI-PPA is designed to be reconfigurable for use in other science domains with large and complex datasets, including an ongoing offshoot project for electron microscopy data.
StarView: The object oriented design of the ST DADS user interface

NASA Technical Reports Server (NTRS)

Williams, J. D.; Pollizzi, J. A.

1992-01-01

StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS which processes the requests and returns the database to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate adhoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built form a single or multiple forms before it is submitted to ST DADS. Supporting this concept is the adhoc query generator which allows the user to select and qualify an indeterminate number attributes from the database. The user does not need any knowledge of how the joins across various tables are to be resolved. The adhoc generator calculates the joins automatically and generates the correct SQL query.
PharmMapper 2017 update: a web server for potential drug target identification with a comprehensive target pharmacophore database.

PubMed

Wang, Xia; Shen, Yihang; Wang, Shiwei; Li, Shiliang; Zhang, Weilin; Liu, Xiaofeng; Lai, Luhua; Pei, Jianfeng; Li, Honglin

2017-07-03

The PharmMapper online tool is a web server for potential drug target identification by reversed pharmacophore matching the query compound against an in-house pharmacophore model database. The original version of PharmMapper includes more than 7000 target pharmacophores derived from complex crystal structures with corresponding protein target annotations. In this article, we present a new version of the PharmMapper web server, of which the backend pharmacophore database is six times larger than the earlier one, with a total of 23 236 proteins covering 16 159 druggable pharmacophore models and 51 431 ligandable pharmacophore models. The expanded target data cover 450 indications and 4800 molecular functions compared to 110 indications and 349 molecular functions in our last update. In addition, the new web server is united with the statistically meaningful ranking of the identified drug targets, which is achieved through the use of standard scores. It also features an improved user interface. The proposed web server is freely available at http://lilab.ecust.edu.cn/pharmmapper/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access.

PubMed

Amigo, Jorge; Salas, Antonio; Phillips, Christopher; Carracedo, Angel

2008-10-10

In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics. We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 x 10(9) genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested. In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. In addition, full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In.
A study of the influence of task familiarity on user behaviors and performance with a MeSH term suggestion interface for PubMed bibliographic search.

PubMed

Tang, Muh-Chyun; Liu, Ying-Hsang; Wu, Wan-Ching

2013-09-01

Previous research has shown that information seekers in biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and the effectiveness of the interface. A real user, user search request and real system approach was used for the study. Unlike tradition IR evaluation, where assigned tasks were used, the participants were asked to search requests of their own. Forty-four researchers in Health Sciences participated in the evaluation - each conducted two research requests of their own, alternately with the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessment of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that, when searching for an unfamiliar topic, users were more likely to change their queries, indicating the effect of familiarity on search behaviors. The results also show that the interface scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. Results indicate that there is a selective compatibility between search familiarity and search interface. One implication of the research for system evaluation is the importance of taking into consideration task familiarity when assessing the effectiveness of interactive IR systems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Evaluation of the Feasibility of Screening Patients for Early Signs of Lung Carcinoma in Web Search Logs.

PubMed

White, Ryen W; Horvitz, Eric

2017-03-01

A statistical model that predicts the appearance of strong evidence of a lung carcinoma diagnosis via analysis of large-scale anonymized logs of web search queries from millions of people across the United States. To evaluate the feasibility of screening patients at risk of lung carcinoma via analysis of signals from online search activity. We identified people who issue special queries that provide strong evidence of a recent diagnosis of lung carcinoma. We then considered patterns of symptoms expressed as searches about concerning symptoms over several months prior to the appearance of the landmark web queries. We built statistical classifiers that predict the future appearance of landmark queries based on the search log signals. This was a retrospective log analysis of the online activity of millions of web searchers seeking health-related information online. Of web searchers who queried for symptoms related to lung carcinoma, some (n = 5443 of 4 813 985) later issued queries that provide strong evidence of recent clinical diagnosis of lung carcinoma and are regarded as positive cases in our analysis. Additional evidence on the reliability of these queries as representing clinical diagnoses is based on the significant increase in follow-on searches for treatments and medications for these searchers and on the correlation between lung carcinoma incidence rates and our log-based statistics. The remaining symptom searchers (n = 4 808 542) are regarded as negative cases. Performance of the statistical model for early detection from online search behavior, for different lead times, different sets of signals, and different cohorts of searchers stratified by potential risk. The statistical classifier predicting the future appearance of landmark web queries based on search log signals identified searchers who later input queries consistent with a lung carcinoma diagnosis, with a true-positive rate ranging from 3% to 57% for false-positive rates ranging from 0.00001 to 0.001, respectively. The methods can be used to identify people at highest risk up to a year in advance of the inferred diagnosis time. The 5 factors associated with the highest relative risk (RR) were evidence of family history (RR = 7.548; 95% CI, 3.937-14.470), age (RR = 3.558; 95% CI, 3.357-3.772), radon (RR = 2.529; 95% CI, 1.137-5.624), primary location (RR = 2.463; 95% CI, 1.364-4.446), and occupation (RR = 1.969; 95% CI, 1.143-3.391). Evidence of smoking (RR = 1.646; 95% CI, 1.032-2.260) was important but not top-ranked, which was due to the difficulty of identifying smoking history from search terms. Pattern recognition based on data drawn from large-scale web search queries holds opportunity for identifying risk factors and frames new directions with early detection of lung carcinoma.
Querying phenotype-genotype relationships on patient datasets using semantic web technology: the example of cerebrotendinous xanthomatosis

PubMed Central

2012-01-01

Background Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Methods Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Ontology Web Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain named cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. Results A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SWQRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantic of the framework adapts the standard logical model of an open world assumption. Conclusions This work demonstrates how semantic web technologies can be used to support flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially good for describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research. PMID:22849591
Modern Technologies aspects for Oceanographic Data Management and Dissemination : The HNODC Implementation

NASA Astrophysics Data System (ADS)

Lykiardopoulos, A.; Iona, A.; Lakes, V.; Batis, A.; Balopoulos, E.

2009-04-01

The development of new technologies for the aim of enhancing Web Applications with Dynamically data access was the starting point for Geospatial Web Applications to developed at the same time as well. By the means of these technologies the Web Applications embed the capability of presenting Geographical representations of the Geo Information. The induction in nowadays, of the state of the art technologies known as Web Services, enforce the Web Applications to have interoperability among them i.e. to be able to process requests from each other via a network. In particular throughout the Oceanographic Community, modern Geographical Information systems based on Geospatial Web Services are now developed or will be developed shortly in the near future, with capabilities of managing the information itself fully through Web Based Geographical Interfaces. The exploitation of HNODC Data Base, through a Web Based Application enhanced with Web Services by the use of open source tolls may be consider as an ideal case of such implementation. Hellenic National Oceanographic Data Center (HNODC) as a National Public Oceanographic Data provider and at the same time a member of the International Net of Oceanographic Data Centers( IOC/IODE), owns a very big volume of Data and Relevant information about the Marine Ecosystem. For the efficient management and exploitation of these Data, a relational Data Base has been constructed with a storage of over 300.000 station data concerning, physical, chemical and biological Oceanographic information. The development of a modern Web Application for the End User worldwide to be able to explore and navigate throughout HNODC data via the use of an interface with the capability of presenting Geographical representations of the Geo Information, is today a fact. The application is constituted with State of the art software components and tools such as: • Geospatial and no Spatial Web Services mechanisms • Geospatial open source tools for the creation of Dynamic Geographical Representations. • Communication protocols (messaging mechanisms) in all Layers such as XML and GML together with SOAP protocol via Apache/Axis. At the same time the application may interact with any other SOA application either in sending or receiving Geospatial Data through Geographical Layers, since it inherits the big advantage of interoperability between Web Services systems. Roughly the Architecture can denoted as follows: • At the back End Open source PostgreSQL DBMS stands as the data storage mechanism with more than one Data Base Schemas cause of the separation of the Geospatial Data and the non Geospatial Data. • UMN Map Server and Geoserver are the mechanisms for: Represent Geospatial Data via Web Map Service (WMS) Querying and Navigating in Geospatial and Meta Data Information via Web Feature Service (WFS) oAnd in the near future Transacting and processing new or existing Geospatial Data via Web Processing Service (WPS) • Map Bender, a geospatial portal site management software for OGC and OWS architectures acts as the integration module between the Geospatial Mechanisms. Mapbender comes with an embedded data model capable to manage interfaces for displaying, navigating and querying OGC compliant web map and feature services (WMS and transactional WFS). • Apache and Tomcat stand again as the Web Service middle Layers • Apache Axis with it's embedded implementation of the SOAP protocol ("Simple Object Access Protocol") acts as the No spatial data Mechanism of Web Services. These modules of the platform are still under development but their implementation will be fulfilled in the near future. • And a new Web user Interface for the end user based on enhanced and customized version of a MapBender GUI, a powerful Web Services client. For HNODC the interoperability of Web Services is the big advantage of the developed platform since it is capable to act in the future as provider and consumer of Web Services in both ways: • Either as data products provider for external SOA platforms. • Or as consumer of data products from external SOA platforms for new applications to be developed or for existing applications to be enhanced. A great paradigm of Data Managenet integration and dissemination via the use of such technologies is the European's Union Research Project Seadatanet, with the main objective to develop a standardized distributed system for managing and disseminating the large and diverse data sets and to enhance the currently existing infrastructures with Web Services Further more and when the technology of Web Processing Service (WPS), will be mature enough and applicable for development, the derived data products will be able to have any kind of GIS functionality for consumers across the network. From this point of view HNODC, joins the global scientific community by providing and consuming application Independent data products.
An initial log analysis of usage patterns on a research networking system.

PubMed

Boland, Mary Regina; Trembowelski, Sylvia; Bakken, Suzanne; Weng, Chunhua

2012-08-01

Usage data for research networking systems (RNSs) are valuable but generally unavailable for understanding scientific professionals' information needs and online collaborator seeking behaviors. This study contributes a method for evaluating RNSs and initial usage knowledge of one RNS obtained from using this method. We designed a log for an institutional RNS, defined categories of users and tasks, and analyzed correlations between usage patterns and user and query types. Our results show that scientific professionals spend more time performing deep Web searching on RNSs than generic Google users and we also show that retrieving scientist profiles is faster on an RNS than on Google (3.5 seconds vs. 34.2 seconds) whereas organization-specific browsing on a RNS takes longer than on Google (117.0 seconds vs. 34.2 seconds). Usage patterns vary by user role, e.g., faculty performed more informational queries than administrators, which implies role-specific user support is needed for RNSs. © 2012 Wiley Periodicals, Inc.

An Initial Log Analysis of Usage Patterns on a Research Networking System

PubMed Central

Boland, Mary Regina; Trembowelski, Sylvia; Bakken, Suzanne; Weng, Chunhua

2012-01-01

Abstract Usage data for research networking systems (RNSs) are valuable but generally unavailable for understanding scientific professionals’ information needs and online collaborator seeking behaviors. This study contributes a method for evaluating RNSs and initial usage knowledge of one RNS obtained from using this method. We designed a log for an institutional RNS, defined categories of users and tasks, and analyzed correlations between usage patterns and user and query types. Our results show that scientific professionals spend more time performing deep Web searching on RNSs than generic Google users and we also show that retrieving scientist profiles is faster on an RNS than on Google (3.5 seconds vs. 34.2 seconds) whereas organization‐specific browsing on a RNS takes longer than on Google (117.0 seconds vs. 34.2 seconds). Usage patterns vary by user role, e.g., faculty performed more informational queries than administrators, which implies role‐specific user support is needed for RNSs. Clin Trans Sci 2012; Volume 5: 340–347 PMID:22883612
Information Communication using Knowledge Engine on Flood Issues

NASA Astrophysics Data System (ADS)

Demir, I.; Krajewski, W. F.

2012-04-01

The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to and visualization of flood inundation maps, real-time flood conditions, flood forecasts both short-term and seasonal, and other flood-related data for communities in Iowa. The system is designed for use by general public, often people with no domain knowledge and poor general science background. To improve effective communication with such audience, we have introduced a new way in IFIS to get information on flood related issues - instead of by navigating within hundreds of features and interfaces of the information system and web-based sources-- by providing dynamic computations based on a collection of built-in data, analysis, and methods. The IFIS Knowledge Engine connects to distributed sources of real-time stream gauges, and in-house data sources, analysis and visualization tools to answer questions grouped into several categories. Users will be able to provide input based on the query within the categories of rainfall, flood conditions, forecast, inundation maps, flood risk and data sensors. Our goal is the systematization of knowledge on flood related issues, and to provide a single source for definitive answers to factual queries. Long-term goal of this knowledge engine is to make all flood related knowledge easily accessible to everyone, and provide educational geoinformatics tool. The future implementation of the system will be able to accept free-form input and voice recognition capabilities within browser and mobile applications. We intend to deliver increasing capabilities for the system over the coming releases of IFIS. This presentation provides an overview of our Knowledge Engine, its unique information interface and functionality as an educational tool, and discusses the future plans for providing knowledge on flood related issues and resources.
EST-PAC a web package for EST annotation and protein sequence prediction

PubMed Central

Strahm, Yvan; Powell, David; Lefèvre, Christophe

2006-01-01

With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequences tags (EST) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC a web oriented multi-platform software package for expressed sequences tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequence from EST data and, 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and, management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics. PMID:17147782
NCBI2RDF: enabling full RDF-based access to NCBI databases.

PubMed

Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

2013-01-01

RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
Query3d: a new method for high-throughput analysis of functional residues in protein structures.

PubMed

Ausiello, Gabriele; Via, Allegra; Helmer-Citterich, Manuela

2005-12-01

The identification of local similarities between two protein structures can provide clues of a common function. Many different methods exist for searching for similar subsets of residues in proteins of known structure. However, the lack of functional and structural information on single residues, together with the low level of integration of this information in comparison methods, is a limitation that prevents these methods from being fully exploited in high-throughput analyses. Here we describe Query3d, a program that is both a structural DBMS (Database Management System) and a local comparison method. The method conserves a copy of all the residues of the Protein Data Bank annotated with a variety of functional and structural information. New annotations can be easily added from a variety of methods and known databases. The algorithm makes it possible to create complex queries based on the residues' function and then to compare only subsets of the selected residues. Functional information is also essential to speed up the comparison and the analysis of the results. With Query3d, users can easily obtain statistics on how many and which residues share certain properties in all proteins of known structure. At the same time, the method also finds their structural neighbours in the whole PDB. Programs and data can be accessed through the PdbFun web interface.
Query3d: a new method for high-throughput analysis of functional residues in protein structures

PubMed Central

Ausiello, Gabriele; Via, Allegra; Helmer-Citterich, Manuela

2005-01-01

Background The identification of local similarities between two protein structures can provide clues of a common function. Many different methods exist for searching for similar subsets of residues in proteins of known structure. However, the lack of functional and structural information on single residues, together with the low level of integration of this information in comparison methods, is a limitation that prevents these methods from being fully exploited in high-throughput analyses. Results Here we describe Query3d, a program that is both a structural DBMS (Database Management System) and a local comparison method. The method conserves a copy of all the residues of the Protein Data Bank annotated with a variety of functional and structural information. New annotations can be easily added from a variety of methods and known databases. The algorithm makes it possible to create complex queries based on the residues' function and then to compare only subsets of the selected residues. Functional information is also essential to speed up the comparison and the analysis of the results. Conclusion With Query3d, users can easily obtain statistics on how many and which residues share certain properties in all proteins of known structure. At the same time, the method also finds their structural neighbours in the whole PDB. Programs and data can be accessed through the PdbFun web interface. PMID:16351754
Experiments on Interfaces To Support Query Expansion.

ERIC Educational Resources Information Center

Beaulieu, M.

1997-01-01

Focuses on the user and human-computer interaction aspects of the research based on the Okapi text retrieval system. Three experiments implementing different approaches to query expansion are described, including the use of graphical user interfaces with different windowing techniques. (Author/LRW)
Interactive web-based mapping: bridging technology and data for health

PubMed Central

2011-01-01

Background The Community Health Information System (CHIS) online mapping system was first launched in 1998. Its overarching goal was to provide researchers, residents and organizations access to health related data reflecting the overall health and well-being of their communities within the Greater Houston area. In September 2009, initial planning and development began for the next generation of CHIS. The overarching goal for the new version remained to make health data easily accessible for a wide variety of research audiences. However, in the new version we specifically sought to make the CHIS truly interactive and give the user more control over data selection and reporting. Results In July 2011, a beta version of the next-generation of the application was launched. This next-generation is also a web based interactive mapping tool comprised of two distinct portals: the Breast Health Portal and Project Safety Net. Both are accessed via a Google mapping interface. Geographic coverage for the portals is currently an 8 county region centered on Harris County, Texas. Data accessed by the application include Census 2000, Census 2010 (underway), cancer incidence from the Texas Cancer Registry (TX Dept. of State Health Services), death data from Texas Vital Statistics, clinic locations for free and low-cost health services, along with service lists, hours of operation, payment options and languages spoken, uninsured and poverty data. Conclusions The system features query on the fly technology, which means the data is not generated until the query is provided to the system. This allows users to interact in real-time with the databases and generate customized reports and maps. To the author's knowledge, the Breast Health Portal and Project Safety Net are the first local-scale interactive online mapping interfaces for public health data which allow users to control the data generated. For example, users may generate breast cancer incidence rates by Census tract, in real time, for women aged 40-64. Conversely, they could then generate the same rates for women aged 35-55. The queries are user controlled. PMID:22195603
Utility of Web search query data in testing theoretical assumptions about mephedrone.

PubMed

Kapitány-Fövény, Máté; Demetrovics, Zsolt

2017-05-01

With growing access to the Internet, people who use drugs and traffickers started to obtain information about novel psychoactive substances (NPS) via online platforms. This paper aims to analyze whether a decreasing Web interest in formerly banned substances-cocaine, heroin, and MDMA-and the legislative status of mephedrone predict Web interest about this NPS. Google Trends was used to measure changes of Web interest on cocaine, heroin, MDMA, and mephedrone. Google search results for mephedrone within the same time frame were analyzed and categorized. Web interest about classic drugs found to be more persistent. Regarding geographical distribution, location of Web searches for heroin and cocaine was less centralized. Illicit status of mephedrone was a negative predictor of its Web search query rates. The connection between mephedrone-related Web search rates and legislative status of this substance was significantly mediated by ecstasy-related Web search queries, the number of documentaries, and forum/blog entries about mephedrone. The results might provide support for the hypothesis that mephedrone's popularity was highly correlated with its legal status as well as it functioned as a potential substitute for MDMA. Google Trends was found to be a useful tool for testing theoretical assumptions about NPS. Copyright © 2017 John Wiley & Sons, Ltd.
SuperTarget goes quantitative: update on drug–target interactions

PubMed Central

Hecker, Nikolai; Ahmed, Jessica; von Eichborn, Joachim; Dunkel, Mathias; Macha, Karel; Eckert, Andreas; Gilson, Michael K.; Bourne, Philip E.; Preissner, Robert

2012-01-01

There are at least two good reasons for the on-going interest in drug–target interactions: first, drug-effects can only be fully understood by considering a complex network of interactions to multiple targets (so-called off-target effects) including metabolic and signaling pathways; second, it is crucial to consider drug-target-pathway relations for the identification of novel targets for drug development. To address this on-going need, we have developed a web-based data warehouse named SuperTarget, which integrates drug-related information associated with medical indications, adverse drug effects, drug metabolism, pathways and Gene Ontology (GO) terms for target proteins. At present, the updated database contains >6000 target proteins, which are annotated with >330 000 relations to 196 000 compounds (including approved drugs); the vast majority of interactions include binding affinities and pointers to the respective literature sources. The user interface provides tools for drug screening and target similarity inclusion. A query interface enables the user to pose complex queries, for example, to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target proteins within a certain affinity range. SuperTarget is available at http://bioinformatics.charite.de/supertarget. PMID:22067455
Visual Exploratory Search of Relationship Graphs on Smartphones

PubMed Central

Ouyang, Jianquan; Zheng, Hao; Kong, Fanbin; Liu, Tianming

2013-01-01

This paper presents a novel framework for Visual Exploratory Search of Relationship Graphs on Smartphones (VESRGS) that is composed of three major components: inference and representation of semantic relationship graphs on the Web via meta-search, visual exploratory search of relationship graphs through both querying and browsing strategies, and human-computer interactions via the multi-touch interface and mobile Internet on smartphones. In comparison with traditional lookup search methodologies, the proposed VESRGS system is characterized with the following perceived advantages. 1) It infers rich semantic relationships between the querying keywords and other related concepts from large-scale meta-search results from Google, Yahoo! and Bing search engines, and represents semantic relationships via graphs; 2) the exploratory search approach empowers users to naturally and effectively explore, adventure and discover knowledge in a rich information world of interlinked relationship graphs in a personalized fashion; 3) it effectively takes the advantages of smartphones’ user-friendly interfaces and ubiquitous Internet connection and portability. Our extensive experimental results have demonstrated that the VESRGS framework can significantly improve the users’ capability of seeking the most relevant relationship information to their own specific needs. We envision that the VESRGS framework can be a starting point for future exploration of novel, effective search strategies in the mobile Internet era. PMID:24223936
Spiders and Camels and Sybase! Oh, My!

NASA Astrophysics Data System (ADS)

Barg, Irene; Ferro, Anthony J.; Stobie, Elizabeth

The Hubble Space Telescope NICMOS Guaranteed Time Observers (GTOs) requested a means of sharing point spread function (PSF) observations. Because of the specifics of the instrument, these PSFs are very useful in the analysis of observations and can vary with the conditions on the telescope. The GTOs are geographically diverse, so a centralized processing solution would not work. The individual PSF observations were reduced by different people, at different institutions, using different reduction software. These varied observations had to be combined into a single database and linked to other information as well. The NICMOS software group at the University of Arizona developed a solution based on a World Wide Web (WWW) interface, using Perl/CGI forms to query the submitter about the PSF data to be entered. After some semi-automated sanity checks, using the FTOOLS package, the metadata are then entered into a Sybase relational database system. A user of the system can then query the database, again through a WWW interface, to locate and retrieve PSFs which may match their observations, as well as determine other information regarding the telescope conditions at the time of the observations (e.g., the breathing parameter). This presentation discusses some of the driving forces in the design, problems encountered, and the choices made. The tools used, including Sybase, Perl, FTOOLS, and WWW elements are also discussed.
Intelligent web image retrieval system

NASA Astrophysics Data System (ADS)

Hong, Sungyong; Lee, Chungwoo; Nah, Yunmook

2001-07-01

Recently, the web sites such as e-business sites and shopping mall sites deal with lots of image information. To find a specific image from these image sources, we usually use web search engines or image database engines which rely on keyword only retrievals or color based retrievals with limited search capabilities. This paper presents an intelligent web image retrieval system. We propose the system architecture, the texture and color based image classification and indexing techniques, and representation schemes of user usage patterns. The query can be given by providing keywords, by selecting one or more sample texture patterns, by assigning color values within positional color blocks, or by combining some or all of these factors. The system keeps track of user's preferences by generating user query logs and automatically add more search information to subsequent user queries. To show the usefulness of the proposed system, some experimental results showing recall and precision are also explained.
Automation and integration of components for generalized semantic markup of electronic medical texts.

PubMed

Dugan, J M; Berrios, D C; Liu, X; Kim, D K; Kaizer, H; Fagan, L M

1999-01-01

Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models.
Using JavaScript and the FDSN web service to create an interactive earthquake information system

NASA Astrophysics Data System (ADS)

Fischer, Kasper D.

2015-04-01

The FDSN web service provides a web interface to access earthquake meta-data (e. g. event or station information) and waveform date over the internet. Requests are send to a server as URLs and the output is either XML or miniSEED. This makes it hard to read by humans but easy to process with different software. Different data centers are already supporting the FDSN web service, e. g. USGS, IRIS, ORFEUS. The FDSN web service is also part of the Seiscomp3 (http://www.seiscomp3.org) software. The Seismological Observatory of the Ruhr-University switched to Seiscomp3 as the standard software for the analysis of mining induced earthquakes at the beginning of 2014. This made it necessary to create a new web-based earthquake information service for the publication of results to the general public. This has be done by processing the output of a FDSN web service query by javascript running in a standard browser. The result is an interactive map presenting the observed events and further information of events and stations on a single web page as a table and on a map. In addition the user can download event information, waveform data and station data in different formats like miniSEED, quakeML or FDSNxml. The developed code and all used libraries are open source and freely available.
A semantic proteomics dashboard (SemPoD) for data management in translational research.

PubMed

Jayapandian, Catherine P; Zhao, Meng; Ewing, Rob M; Zhang, Guo-Qiang; Sahoo, Satya S

2012-01-01

One of the primary challenges in translational research data management is breaking down the barriers between the multiple data silos and the integration of 'omics data with clinical information to complete the cycle from the bench to the bedside. The role of contextual metadata, also called provenance information, is a key factor ineffective data integration, reproducibility of results, correct attribution of original source, and answering research queries involving "What", "Where", "When", "Which", "Who", "How", and "Why" (also known as the W7 model). But, at present there is limited or no effective approach to managing and leveraging provenance information for integrating data across studies or projects. Hence, there is an urgent need for a paradigm shift in creating a "provenance-aware" informatics platform to address this challenge. We introduce an ontology-driven, intuitive Semantic Proteomics Dashboard (SemPoD) that uses provenance together with domain information (semantic provenance) to enable researchers to query, compare, and correlate different types of data across multiple projects, and allow integration with legacy data to support their ongoing research. The SemPoD platform, currently in use at the Case Center for Proteomics and Bioinformatics (CPB), consists of three components: (a) Ontology-driven Visual Query Composer, (b) Result Explorer, and (c) Query Manager. Currently, SemPoD allows provenance-aware querying of 1153 mass-spectrometry experiments from 20 different projects. SemPod uses the systems molecular biology provenance ontology (SysPro) to support a dynamic query composition interface, which automatically updates the components of the query interface based on previous user selections and efficiently prunes the result set usinga "smart filtering" approach. The SysPro ontology re-uses terms from the PROV-ontology (PROV-O) being developed by the World Wide Web Consortium (W3C) provenance working group, the minimum information required for reporting a molecular interaction experiment (MIMIx), and the minimum information about a proteomics experiment (MIAPE) guidelines. The SemPoD was evaluated both in terms of user feedback and as scalability of the system. SemPoD is an intuitive and powerful provenance ontology-driven data access and query platform that uses the MIAPE and MIMIx metadata guideline to create an integrated view over large-scale systems molecular biology datasets. SemPoD leverages the SysPro ontology to create an intuitive dashboard for biologists to compose queries, explore the results, and use a query manager for storing queries for later use. SemPoD can be deployed over many existing database applications storing 'omics data, including, as illustrated here, the LabKey data-management system. The initial user feedback evaluating the usability and functionality of SemPoD has been very positive and it is being considered for wider deployment beyond the proteomics domain, and in other 'omics' centers.
Earth-Base: A Free And Open Source, RESTful Earth Sciences Platform

NASA Astrophysics Data System (ADS)

Kishor, P.; Heim, N. A.; Peters, S. E.; McClennen, M.

2012-12-01

This presentation describes the motivation, concept, and architecture behind Earth-Base, a web-based, RESTful data-management, analysis and visualization platform for earth sciences data. Traditionally web applications have been built directly accessing data from a database using a scripting language. While such applications are great at bring results to a wide audience, they are limited in scope to the imagination and capabilities of the application developer. Earth-Base decouples the data store from the web application by introducing an intermediate "data application" tier. The data application's job is to query the data store using self-documented, RESTful URIs, and send the results back formatted as JavaScript Object Notation (JSON). Decoupling the data store from the application allows virtually limitless flexibility in developing applications, both web-based for human consumption or programmatic for machine consumption. It also allows outside developers to use the data in their own applications, potentially creating applications that the original data creator and app developer may not have even thought of. Standardized specifications for URI-based querying and JSON-formatted results make querying and developing applications easy. URI-based querying also allows utilizing distributed datasets easily. Companion mechanisms for querying data snapshots aka time-travel, usage tracking and license management, and verification of semantic equivalence of data are also described. The latter promotes the "What You Expect Is What You Get" (WYEIWYG) principle that can aid in data citation and verification.
Easily configured real-time CPOE Pick Off Tool supporting focused clinical research and quality improvement.

PubMed

Rosenbaum, Benjamin P; Silkin, Nikolay; Miller, Randolph A

2014-01-01

Real-time alerting systems typically warn providers about abnormal laboratory results or medication interactions. For more complex tasks, institutions create site-wide 'data warehouses' to support quality audits and longitudinal research. Sophisticated systems like i2b2 or Stanford's STRIDE utilize data warehouses to identify cohorts for research and quality monitoring. However, substantial resources are required to install and maintain such systems. For more modest goals, an organization desiring merely to identify patients with 'isolation' orders, or to determine patients' eligibility for clinical trials, may adopt a simpler, limited approach based on processing the output of one clinical system, and not a data warehouse. We describe a limited, order-entry-based, real-time 'pick off' tool, utilizing public domain software (PHP, MySQL). Through a web interface the tool assists users in constructing complex order-related queries and auto-generates corresponding database queries that can be executed at recurring intervals. We describe successful application of the tool for research and quality monitoring.
Gene Expression Omnibus (GEO): Microarray data storage, submission, retrieval, and analysis

PubMed Central

Barrett, Tanya

2006-01-01

The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely distributes high-throughput molecular abundance data, predominantly gene expression data generated by DNA microarray technology. The database has a flexible design that can handle diverse styles of both unprocessed and processed data in a MIAME- (Minimum Information About a Microarray Experiment) supportive infrastructure that promotes fully annotated submissions. GEO currently stores about a billion individual gene expression measurements, derived from over 100 organisms, submitted by over 1,500 laboratories, addressing a wide range of biological phenomena. To maximize the utility of these data, several user-friendly Web-based interfaces and applications have been implemented that enable effective exploration, query, and visualization of these data, at the level of individual genes or entire studies. This chapter describes how the data are stored, submission procedures, and mechanisms for data retrieval and query. GEO is publicly accessible at http://www.ncbi.nlm.nih.gov/projects/geo/. PMID:16939800
The ATLAS EventIndex: architecture, design choices, deployment and first operation experience

NASA Astrophysics Data System (ADS)

Barberis, D.; Cárdenas Zárate, S. E.; Cranshaw, J.; Favareto, A.; Fernández Casaní, Á.; Gallas, E. J.; Glasman, C.; González de la Hoz, S.; Hřivnáč, J.; Malon, D.; Prokoshin, F.; Salt Cairols, J.; Sánchez, J.; Többicke, R.; Yuan, R.

2015-12-01

The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completion and consistency of processing campaigns. The system design is highly modular so that its components (data collection system, storage system based on Hadoop, query web service and interfaces to other ATLAS systems) could be developed separately and in parallel during LSI. The EventIndex is in operation for the start of LHC Run 2. This paper describes the high-level system architecture, the technical design choices and the deployment process and issues. The performance of the data collection and storage systems, as well as the query services, are also reported.

Using PeptideAtlas, SRMAtlas and PASSEL – Comprehensive Resources for discovery and targeted proteomics

PubMed Central

Kusebauch, Ulrike; Deutsch, Eric W.; Campbell, David S.; Sun, Zhi; Farrah, Terry; Moritz, Robert L.

2014-01-01

PeptideAtlas, SRMAtlas and PASSEL are web-accessible resources to support discovery and targeted proteomics research. PeptideAtlas is a multi-species compendium of shotgun proteomic data provided by the scientific community, SRMAtlas is a resource of high-quality, complete proteome SRM assays generated in a consistent manner for the targeted identification and quantification of proteins, and PASSEL is a repository that compiles and represents selected reaction monitoring data, all in an easy to use interface. The databases are generated from native mass spectrometry data files that are analyzed in a standardized manner including statistical validation of the results. Each resource offers search functionalities and can be queried by user defined constraints; the query results are provided in tables or are graphically displayed. PeptideAtlas, SRMAtlas and PASSEL are publicly available freely via the website http://www.peptideatlas.org. In this protocol, we describe the use of these resources, we highlight how to submit, search, collate and download data. PMID:24939129
Hierarchical data security in a Query-By-Example interface for a shared database.

PubMed

Taylor, Merwyn

2002-06-01

Whenever a shared database resource, containing critical patient data, is created, protecting the contents of the database is a high priority goal. This goal can be achieved by developing a Query-By-Example (QBE) interface, designed to access a shared database, and embedding within the QBE a hierarchical security module that limits access to the data. The security module ensures that researchers working in one clinic do not get access to data from another clinic. The security can be based on a flexible taxonomy structure that allows ordinary users to access data from individual clinics and super users to access data from all clinics. All researchers submit queries through the same interface and the security module processes the taxonomy and user identifiers to limit access. Using this system, two different users with different access rights can submit the same query and get different results thus reducing the need to create different interfaces for different clinics and access rights.
A Magnetic Petrology Database for Satellite Magnetic Anomaly Interpretations

NASA Astrophysics Data System (ADS)

Nazarova, K.; Wasilewski, P.; Didenko, A.; Genshaft, Y.; Pashkevich, I.

2002-05-01

A Magnetic Petrology Database (MPDB) is now being compiled at NASA/Goddard Space Flight Center in cooperation with Russian and Ukrainian Institutions. The purpose of this database is to provide the geomagnetic community with a comprehensive and user-friendly method of accessing magnetic petrology data via Internet for more realistic interpretation of satellite magnetic anomalies. Magnetic Petrology Data had been accumulated in NASA/Goddard Space Flight Center, United Institute of Physics of the Earth (Russia) and Institute of Geophysics (Ukraine) over several decades and now consists of many thousands of records of data in our archives. The MPDB was, and continues to be in big demand especially since recent launching in near Earth orbit of the mini-constellation of three satellites - Oersted (in 1999), Champ (in 2000), and SAC-C (in 2000) which will provide lithospheric magnetic maps with better spatial and amplitude resolution (about 1 nT). The MPDB is focused on lower crustal and upper mantle rocks and will include data on mantle xenoliths, serpentinized ultramafic rocks, granulites, iron quartzites and rocks from Archean-Proterozoic metamorphic sequences from all around the world. A substantial amount of data is coming from the area of unique Kursk Magnetic Anomaly and Kola Deep Borehole (which recovered 12 km of continental crust). A prototype MPDB can be found on the Geodynamics Branch web server of Goddard Space Flight Center at http://core2.gsfc.nasa.gov/terr_mag/magnpetr.html. The MPDB employs a searchable relational design and consists of 7 interrelated tables. The schema of database is shown at http://core2.gsfc.nasa.gov/terr_mag/doc.html. MySQL database server was utilized to implement MPDB. The SQL (Structured Query Language) is used to query the database. To present the results of queries on WEB and for WEB programming we utilized PHP scripting language and CGI scripts. The prototype MPDB is designed to search database by major satellite magnetic anomaly, tectonic structure, geographical location, rock type, magnetic properties, chemistry and reference, see http://core2.gsfc.nasa.gov/terr_mag/query1.html. The output of database is HTML structured table, text file, and downloadable file. This database will be very useful for studies of lithospheric satellite magnetic anomalies on the Earth and other terrestrial planets.
A Review of Statistical Disclosure Control Techniques Employed by Web-Based Data Query Systems.

PubMed

Matthews, Gregory J; Harel, Ofer; Aseltine, Robert H

We systematically reviewed the statistical disclosure control techniques employed for releasing aggregate data in Web-based data query systems listed in the National Association for Public Health Statistics and Information Systems (NAPHSIS). Each Web-based data query system was examined to see whether (1) it employed any type of cell suppression, (2) it used secondary cell suppression, and (3) suppressed cell counts could be calculated. No more than 30 minutes was spent on each system. Of the 35 systems reviewed, no suppression was observed in more than half (n = 18); observed counts below the threshold were observed in 2 sites; and suppressed values were recoverable in 9 sites. Six sites effectively suppressed small counts. This inquiry has revealed substantial weaknesses in the protective measures used in data query systems containing sensitive public health data. Many systems utilized no disclosure control whatsoever, and the vast majority of those that did deployed it inconsistently or inadequately.
WPBMB Entrez: An interface to NCBI Entrez for Wordpress.

PubMed

Gohara, David W

2018-03-01

Research-oriented websites are an important means for the timely communication of information. These websites fall under a number of categories including: research laboratories, training grant and program projects, and online service portals. Invariably there is content on a site, such as publication listings, that require frequent updating. A number of content management systems exist to aid in the task of developing and managing a website, each with their strengths and weaknesses. One popular choice is Wordpress, a free, open source and actively developed application for the creation of web content. During a recent site redesign for our department, the need arose to ensure publications were up to date for each of the research labs and department as a whole. Several plugins for Wordpress offer this type of functionality, but in many cases the plugins are either no longer maintained, are missing features that would require the use of several, possibly incompatible, plugins or lack features for layout on a webpage. WPBMB Entrez was developed to address these needs. WPBMB Entrez utilizes a subset of NCBI Entrez and RCSB databases to maintain up to date records of publications, and publication related information on Wordpress-based websites. The core functionality uses the same search query syntax as on the NCBI Entrez site, including advanced query syntaxes. The plugin is extensible allowing for rapid development and addition of new data sources as the need arises. WPBMB Entrez was designed to be easy to use, yet flexible enough to address more complex usage scenarios. Features of the plugin include: an easy to use interface, design customization, multiple templates for displaying publication results, a caching mechanism to reduce page load times, supports multiple distinct queries and retrieval modes, and the ability to aggregate multiple queries into unified lists. Additionally, developer documentation is provided to aid in customization of the plugin. WPBMB Entrez is available at no cost, is open source and works with all recent versions of Wordpress. Copyright © 2017 Elsevier B.V. All rights reserved.
PmiRExAt: plant miRNA expression atlas database and web applications

PubMed Central

Gurjar, Anoop Kishor Singh; Panwar, Abhijeet Singh; Gupta, Rajinder; Mantri, Shrikant S.

2016-01-01

High-throughput small RNA (sRNA) sequencing technology enables an entirely new perspective for plant microRNA (miRNA) research and has immense potential to unravel regulatory networks. Novel insights gained through data mining in publically available rich resource of sRNA data will help in designing biotechnology-based approaches for crop improvement to enhance plant yield and nutritional value. Bioinformatics resources enabling meta-analysis of miRNA expression across multiple plant species are still evolving. Here, we report PmiRExAt, a new online database resource that caters plant miRNA expression atlas. The web-based repository comprises of miRNA expression profile and query tool for 1859 wheat, 2330 rice and 283 maize miRNA. The database interface offers open and easy access to miRNA expression profile and helps in identifying tissue preferential, differential and constitutively expressing miRNAs. A feature enabling expression study of conserved miRNA across multiple species is also implemented. Custom expression analysis feature enables expression analysis of novel miRNA in total 117 datasets. New sRNA dataset can also be uploaded for analysing miRNA expression profiles for 73 plant species. PmiRExAt application program interface, a simple object access protocol web service allows other programmers to remotely invoke the methods written for doing programmatic search operations on PmiRExAt database. Database URL: http://pmirexat.nabi.res.in. PMID:27081157
Forecasting influenza in Hong Kong with Google search queries and statistical model fusion.

PubMed

Xu, Qinneng; Gel, Yulia R; Ramirez Ramirez, L Leticia; Nezafati, Kusha; Zhang, Qingpeng; Tsui, Kwok-Leung

2017-01-01

The objective of this study is to investigate predictive utility of online social media and web search queries, particularly, Google search data, to forecast new cases of influenza-like-illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, in our approach we fuse multiple offline and online data sources. Four individual models: generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), autoregressive integrated moving average (ARIMA), and deep learning (DL) with Feedforward Neural Networks (FNN) are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strength from each individual forecasting models, we use statistical model fusion, using Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. DL with FNN appears to deliver the most competitive predictive performance among the four considered individual models. Combing all four models in a comprehensive BMA framework allows to further improve such predictive evaluation metrics as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting locations of influenza peaks. The proposed approach can be viewed a feasible alternative to forecast ILI in Hong Kong or other countries where ILI has no constant seasonal trend and influenza data resources are limited. The proposed methodology is easily tractable and computationally efficient.
The Xeno-glycomics database (XDB): a relational database of qualitative and quantitative pig glycome repertoire.

PubMed

Park, Hae-Min; Park, Ju-Hyeong; Kim, Yoon-Woo; Kim, Kyoung-Jin; Jeong, Hee-Jin; Jang, Kyoung-Soon; Kim, Byung-Gee; Kim, Yun-Gon

2013-11-15

In recent years, the improvement of mass spectrometry-based glycomics techniques (i.e. highly sensitive, quantitative and high-throughput analytical tools) has enabled us to obtain a large dataset of glycans. Here we present a database named Xeno-glycomics database (XDB) that contains cell- or tissue-specific pig glycomes analyzed with mass spectrometry-based techniques, including a comprehensive pig glycan information on chemical structures, mass values, types and relative quantities. It was designed as a user-friendly web-based interface that allows users to query the database according to pig tissue/cell types or glycan masses. This database will contribute in providing qualitative and quantitative information on glycomes characterized from various pig cells/organs in xenotransplantation and might eventually provide new targets in the α1,3-galactosyltransferase gene-knock out pigs era. The database can be accessed on the web at http://bioinformatics.snu.ac.kr/xdb.
SLIDE - a web-based tool for interactive visualization of large-scale -omics data.

PubMed

Ghosh, Soumita; Datta, Abhik; Tan, Kaisen; Choi, Hyungwon

2018-06-28

Data visualization is often regarded as a post hoc step for verifying statistically significant results in the analysis of high-throughput data sets. This common practice leaves a large amount of raw data behind, from which more information can be extracted. However, existing solutions do not provide capabilities to explore large-scale raw datasets using biologically sensible queries, nor do they allow user interaction based real-time customization of graphics. To address these drawbacks, we have designed an open-source, web-based tool called Systems-Level Interactive Data Exploration, or SLIDE to visualize large-scale -omics data interactively. SLIDE's interface makes it easier for scientists to explore quantitative expression data in multiple resolutions in a single screen. SLIDE is publicly available under BSD license both as an online version as well as a stand-alone version at https://github.com/soumitag/SLIDE. Supplementary Information are available at Bioinformatics online.
NCBI GEO: mining tens of millions of expression profiles--database and tools update.

PubMed

Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Edgar, Ron

2007-01-01

The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely disseminates microarray and other forms of high-throughput data generated by the scientific community. The database has a minimum information about a microarray experiment (MIAME)-compliant infrastructure that captures fully annotated raw and processed data. Several data deposit options and formats are supported, including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). In addition to data storage, a collection of user-friendly web-based interfaces and applications are available to help users effectively explore, visualize and download the thousands of experiments and tens of millions of gene expression patterns stored in GEO. This paper provides a summary of the GEO database structure and user facilities, and describes recent enhancements to database design, performance, submission format options, data query and retrieval utilities. GEO is accessible at http://www.ncbi.nlm.nih.gov/geo/
Intelligent Access to Sequence and Structure Databases (IASSD) - an interface for accessing information from major web databases.

PubMed

Ganguli, Sayak; Gupta, Manoj Kumar; Basu, Protip; Banik, Rahul; Singh, Pankaj Kumar; Vishal, Vineet; Bera, Abhisek Ranjan; Chakraborty, Hirak Jyoti; Das, Sasti Gopal

2014-01-01

With the advent of age of big data and advances in high throughput technology accessing data has become one of the most important step in the entire knowledge discovery process. Most users are not able to decipher the query result that is obtained when non specific keywords or a combination of keywords are used. Intelligent access to sequence and structure databases (IASSD) is a desktop application for windows operating system. It is written in Java and utilizes the web service description language (wsdl) files and Jar files of E-utilities of various databases such as National Centre for Biotechnology Information (NCBI) and Protein Data Bank (PDB). Apart from that IASSD allows the user to view protein structure using a JMOL application which supports conditional editing. The Jar file is freely available through e-mail from the corresponding author.
Hosting and pulishing astronomical data in SQL databases

NASA Astrophysics Data System (ADS)

Galkin, Anastasia; Klar, Jochen; Riebe, Kristin; Matokevic, Gal; Enke, Harry

2017-04-01

In astronomy, terabytes and petabytes of data are produced by ground instruments, satellite missions and simulations. At Leibniz-Institute for Astrophysics Potsdam (AIP) we host and publish terabytes of cosmological simulation and observational data. The public archive at AIP has now reached a size of 60TB and growing and helps to produce numerous scientific papers. The web framework Daiquiri offers a dedicated web interface for each of the hosted scientific databases. Scientists all around the world run SQL queries which include specific astrophysical functions and get their desired data in reasonable time. Daiquiri supports the scientific projects by offering a number of administration tools such as database and user management, contact messages to the staff and support for organization of meetings and workshops. The webpages can be customized and the Wordpress integration supports the participating scientists in maintaining the documentation and the projects' news sections.
Content-Aware DataGuide with Incremental Index Update using Frequently Used Paths

NASA Astrophysics Data System (ADS)

Sharma, A. K.; Duhan, Neelam; Khattar, Priyanka

2010-11-01

Size of the WWW is increasing day by day. Due to the absence of structured data on the Web, it becomes very difficult for information retrieval tools to fully utilize the Web information. As a solution to this problem, XML pages come into play, which provide structural information to the users to some extent. Without efficient indexes, query processing can be quite inefficient due to an exhaustive traversal on XML data. In this paper an improved content-centric approach of Content-Aware DataGuide, which is an indexing technique for XML databases, is being proposed that uses frequently used paths from historical query logs to improve query performance. The index can be updated incrementally according to the changes in query workload and thus, the overhead of reconstruction can be minimized. Frequently used paths are extracted using any Sequential Pattern mining algorithm on subsequent queries in the query workload. After this, the data structures are incrementally updated. This indexing technique proves to be efficient as partial matching queries can be executed efficiently and users can now get the more relevant documents in results.
QBCov: A Linked Data interface for Discrete Global Grid Systems, a new approach to delivering coverage data on the web

NASA Astrophysics Data System (ADS)

Zhang, Z.; Toyer, S.; Brizhinev, D.; Ledger, M.; Taylor, K.; Purss, M. B. J.

2016-12-01

We are witnessing a rapid proliferation of geoscientific and geospatial data from an increasing variety of sensors and sensor networks. This data presents great opportunities to resolve cross-disciplinary problems. However, working with it often requires an understanding of file formats and protocols seldom used outside of scientific computing, potentially limiting the data's value to other disciplines. In this paper, we present a new approach to serving satellite coverage data on the web, which improves ease-of-access using the principles of linked data. Linked data adapts the concepts and protocols of the human-readable web to machine-readable data; the number of developers familiar with web technologies makes linked data a natural choice for bringing coverages to a wider audience. Our approach to using linked data also makes it possible to efficiently service high-level SPARQL queries: for example, "Retrieve all Landsat ETM+ observations of San Francisco between July and August 2016" can easily be encoded in a single query. We validate the new approach, which we call QBCov, with a reference implementation of the entire stack, including a simple web-based client for interacting with Landsat observations. In addition to demonstrating the utility of linked data for publishing coverages, we investigate the heretofore unexplored relationship between Discrete Global Grid Systems (DGGS) and linked data. Our conclusions are informed by the aforementioned reference implementation of QBCov, which is backed by a hierarchical file format designed around the rHEALPix DGGS. Not only does the choice of a DGGS-based representation provide an efficient mechanism for accessing large coverages at multiple scales, but the ability of DGGS to produce persistent, unique identifiers for spatial regions is especially valuable in a linked data context. This suggests that DGGS has an important role to play in creating sustainable and scalable linked data infrastructures. QBCov is being developed as a contribution to the Spatial Data on the Web working group--a joint activity of the Open Geospatial Consortium and World Wide Web Consortium.
NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases

PubMed Central

Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

2013-01-01

RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425
BOSS: context-enhanced search for biomedical objects

PubMed Central

2012-01-01

Background There exist many academic search solutions and most of them can be put on either ends of spectrum: general-purpose search and domain-specific "deep" search systems. The general-purpose search systems, such as PubMed, offer flexible query interface, but churn out a list of matching documents that users have to go through the results in order to find the answers to their queries. On the other hand, the "deep" search systems, such as PPI Finder and iHOP, return the precompiled results in a structured way. Their results, however, are often found only within some predefined contexts. In order to alleviate these problems, we introduce a new search engine, BOSS, Biomedical Object Search System. Methods Unlike the conventional search systems, BOSS indexes segments, rather than documents. A segment refers to a Maximal Coherent Semantic Unit (MCSU) such as phrase, clause or sentence that is semantically coherent in the given context (e.g., biomedical objects or their relations). For a user query, BOSS finds all matching segments, identifies the objects appearing in those segments, and aggregates the segments for each object. Finally, it returns the ranked list of the objects along with their matching segments. Results The working prototype of BOSS is available at http://boss.korea.ac.kr. The current version of BOSS has indexed abstracts of more than 20 million articles published during last 16 years from 1996 to 2011 across all science disciplines. Conclusion BOSS fills the gap between either ends of the spectrum by allowing users to pose context-free queries and by returning a structured set of results. Furthermore, BOSS exhibits the characteristic of good scalability, just as with conventional document search engines, because it is designed to use a standard document-indexing model with minimal modifications. Considering the features, BOSS notches up the technological level of traditional solutions for search on biomedical information. PMID:22595092
ChemiRs: a web application for microRNAs and chemicals.

PubMed

Su, Emily Chia-Yu; Chen, Yu-Sing; Tien, Yun-Cheng; Liu, Jeff; Ho, Bing-Ching; Yu, Sung-Liang; Singh, Sher

2016-04-18

MicroRNAs (miRNAs) are about 22 nucleotides, non-coding RNAs that affect various cellular functions, and play a regulatory role in different organisms including human. Until now, more than 2500 mature miRNAs in human have been discovered and registered, but still lack of information or algorithms to reveal the relations among miRNAs, environmental chemicals and human health. Chemicals in environment affect our health and daily life, and some of them can lead to diseases by inferring biological pathways. We develop a creditable online web server, ChemiRs, for predicting interactions and relations among miRNAs, chemicals and pathways. The database not only compares gene lists affected by chemicals and miRNAs, but also incorporates curated pathways to identify possible interactions. Here, we manually retrieved associations of miRNAs and chemicals from biomedical literature. We developed an online system, ChemiRs, which contains miRNAs, diseases, Medical Subject Heading (MeSH) terms, chemicals, genes, pathways and PubMed IDs. We connected each miRNA to miRBase, and every current gene symbol to HUGO Gene Nomenclature Committee (HGNC) for genome annotation. Human pathway information is also provided from KEGG and REACTOME databases. Information about Gene Ontology (GO) is queried from GO Online SQL Environment (GOOSE). With a user-friendly interface, the web application is easy to use. Multiple query results can be easily integrated and exported as report documents in PDF format. Association analysis of miRNAs and chemicals can help us understand the pathogenesis of chemical components. ChemiRs is freely available for public use at http://omics.biol.ntnu.edu.tw/ChemiRs .
R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server.

PubMed

Cannone, Jamie J; Sweeney, Blake A; Petrov, Anton I; Gutell, Robin R; Zirbel, Craig L; Leontis, Neocles

2015-07-01

The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Toward a workbench for rodent brain image data: systems architecture and design.

PubMed

Moene, Ivar A; Subramaniam, Shankar; Darin, Dmitri; Leergaard, Trygve B; Bjaalie, Jan G

2007-01-01

We present a novel system for storing and manipulating microscopic images from sections through the brain and higher-level data extracted from such images. The system is designed and built on a three-tier paradigm and provides the research community with a web-based interface for facile use in neuroscience research. The Oracle relational database management system provides the ability to store a variety of objects relevant to the images and provides the framework for complex querying of data stored in the system. Further, the suite of applications intimately tied into the infrastructure in the application layer provide the user the ability not only to query and visualize the data, but also to perform analysis operations based on the tools embedded into the system. The presentation layer uses extant protocols of the modern web browser and this provides ease of use of the system. The present release, named Functional Anatomy of the Cerebro-Cerebellar System (FACCS), available through The Rodent Brain Workbench (http:// rbwb.org/), is targeted at the functional anatomy of the cerebro-cerebellar system in rats, and holds axonal tracing data from these projections. The system is extensible to other circuits and projections and to other categories of image data and provides a unique environment for analysis of rodent brain maps in the context of anatomical data. The FACCS application assumes standard animal brain atlas models and can be extended to future models. The system is available both for interactive use from a remote web-browser client as well as for download to a local server machine.
Development of a data mining and imaging informatics display tool for a multiple sclerosis e-folder system

NASA Astrophysics Data System (ADS)

Liu, Margaret; Loo, Jerry; Ma, Kevin; Liu, Brent

2011-03-01

Multiple sclerosis (MS) is a debilitating autoimmune disease of the central nervous system that damages axonal pathways through inflammation and demyelination. In order to address the need for a centralized application to manage and study MS patients, the MS e-Folder - a web-based, disease-specific electronic medical record system - was developed. The e-Folder has a PHP and MySQL based graphical user interface (GUI) that can serve as both a tool for clinician decision support and a data mining tool for researchers. This web-based GUI gives the e-Folder a user friendly interface that can be securely accessed through the internet and requires minimal software installation on the client side. The e-Folder GUI displays and queries patient medical records--including demographic data, social history, past medical history, and past MS history. In addition, DICOM format imaging data, and computer aided detection (CAD) results from a lesion load algorithm are also displayed. The GUI interface is dynamic and allows manipulation of the DICOM images, such as zoom, pan, and scrolling, and the ability to rotate 3D images. Given the complexity of clinical management and the need to bolster research in MS, the MS e-Folder system will improve patient care and provide MS researchers with a function-rich patient data hub.

Data Processing on Database Management Systems with Fuzzy Query

NASA Astrophysics Data System (ADS)

Şimşek, Irfan; Topuz, Vedat

In this study, a fuzzy query tool (SQLf) for non-fuzzy database management systems was developed. In addition, samples of fuzzy queries were made by using real data with the tool developed in this study. Performance of SQLf was tested with the data about the Marmara University students' food grant. The food grant data were collected in MySQL database by using a form which had been filled on the web. The students filled a form on the web to describe their social and economical conditions for the food grant request. This form consists of questions which have fuzzy and crisp answers. The main purpose of this fuzzy query is to determine the students who deserve the grant. The SQLf easily found the eligible students for the grant through predefined fuzzy values. The fuzzy query tool (SQLf) could be used easily with other database system like ORACLE and SQL server.
End-User Use of Data Base Query Language: Pros and Cons.

ERIC Educational Resources Information Center

Nicholes, Walter

1988-01-01

Man-machine interface, the concept of a computer "query," a review of database technology, and a description of the use of query languages at Brigham Young University are discussed. The pros and cons of end-user use of database query languages are explored. (Author/MLW)
DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

PubMed

Yang, Jian-Hua; Qu, Liang-Hu

2012-01-01

Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.
An ontology-driven tool for structured data acquisition using Web forms.

PubMed

Gonçalves, Rafael S; Tu, Samson W; Nyulas, Csongor I; Tierney, Michael J; Musen, Mark A

2017-08-01

Structured data acquisition is a common task that is widely performed in biomedicine. However, current solutions for this task are far from providing a means to structure data in such a way that it can be automatically employed in decision making (e.g., in our example application domain of clinical functional assessment, for determining eligibility for disability benefits) based on conclusions derived from acquired data (e.g., assessment of impaired motor function). To use data in these settings, we need it structured in a way that can be exploited by automated reasoning systems, for instance, in the Web Ontology Language (OWL); the de facto ontology language for the Web. We tackle the problem of generating Web-based assessment forms from OWL ontologies, and aggregating input gathered through these forms as an ontology of "semantically-enriched" form data that can be queried using an RDF query language, such as SPARQL. We developed an ontology-based structured data acquisition system, which we present through its specific application to the clinical functional assessment domain. We found that data gathered through our system is highly amenable to automatic analysis using queries. We demonstrated how ontologies can be used to help structuring Web-based forms and to semantically enrich the data elements of the acquired structured data. The ontologies associated with the enriched data elements enable automated inferences and provide a rich vocabulary for performing queries.
Querying Event Sequences by Exact Match or Similarity Search: Design and Empirical Evaluation

PubMed Central

Wongsuphasawat, Krist; Plaisant, Catherine; Taieb-Maimon, Meirav; Shneiderman, Ben

2012-01-01

Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both. PMID:22379286
Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences

PubMed Central

Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi

2006-01-01

Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384
Artemis: Integrating Scientific Data on the Grid (Preprint)

DTIC Science & Technology

2004-07-01

Theseus execution engine [Barish and Knoblock 03] to efficiently execute the generated datalog program. The Theseus execution engine has a wide...variety of operations to query databases, web sources, and web services. Theseus also contains a wide variety of relational operations, such as...selection, union, or projection. Furthermore, Theseus optimizes the execution of an integration plan by querying several data sources in parallel and
Index Compression and Efficient Query Processing in Large Web Search Engines

ERIC Educational Resources Information Center

Ding, Shuai

2013-01-01

The inverted index is the main data structure used by all the major search engines. Search engines build an inverted index on their collection to speed up query processing. As the size of the web grows, the length of the inverted list structures, which can easily grow to hundreds of MBs or even GBs for common terms (roughly linear in the size of…
GALT protein database, a bioinformatics resource for the management and analysis of structural features of a galactosemia-related protein and its mutants.

PubMed

d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

2009-06-01

We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and the analysis to each mutant was made with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.
IRIS Earthquake Browser with Integration to the GEON IDV for 3-D Visualization of Hypocenters.

NASA Astrophysics Data System (ADS)

Weertman, B. R.

2007-12-01

We present a new generation of web based earthquake query tool - the IRIS Earthquake Browser (IEB). The IEB combines the DMC's large set of earthquake catalogs (provided by USGS/NEIC, ISC and the ANF) with the popular Google Maps web interface. With the IEB you can quickly and easily find earthquakes in any region of the globe. Using Google's detailed satellite images, earthquakes can be easily co-located with natural geographic features such as volcanoes as well as man made features such as commercial mines. A set of controls allow earthquakes to be filtered by time, magnitude, and depth range as well as catalog name, contributor name and magnitude type. Displayed events can be easily exported in NetCDF format into the GEON Integrated Data Viewer (IDV) where hypocenters may be visualized in three dimensions. Looking "under the hood", the IEB is based on AJAX technology and utilizes REST style web services hosted at the IRIS DMC. The IEB is part of a broader effort at the DMC aimed at making our data holdings available via web services. The IEB is useful both educationally and as a research tool.
Introducing glycomics data into the Semantic Web

PubMed Central

2013-01-01

Background Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans)a, which can, for example, serve as “switches” that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. Results In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as “proofs-of-concept” to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. Conclusions We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains. PMID:24280648
Introducing glycomics data into the Semantic Web.

PubMed

Aoki-Kinoshita, Kiyoko F; Bolleman, Jerven; Campbell, Matthew P; Kawano, Shin; Kim, Jin-Dong; Lütteke, Thomas; Matsubara, Masaaki; Okuda, Shujiro; Ranzinger, Rene; Sawaki, Hiromichi; Shikanai, Toshihide; Shinmachi, Daisuke; Suzuki, Yoshinori; Toukach, Philip; Yamada, Issaku; Packer, Nicolle H; Narimatsu, Hisashi

2013-11-26

Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans)a, which can, for example, serve as "switches" that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as "proofs-of-concept" to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains.
Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval.

PubMed

Wang, Yang; Lin, Xuemin; Wu, Lin; Zhang, Wenjie

2017-03-01

Given a query photo issued by a user (q-user), the landmark retrieval is to return a set of photos with their landmarks similar to those of the query, while the existing studies on the landmark retrieval focus on exploiting geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users over social media community may convey different geometry information depending on the viewpoints and/or angles, and may, subsequently, yield very different results. In fact, dealing with the landmarks with low quality shapes caused by the photography of q-users is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, namely, multi-query expansions, to retrieve semantically robust landmarks by two steps. First, we identify the top- k photos regarding the latent topics of a query landmark to construct multi-query set so as to remedy its possible low quality shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Then, motivated by the typical collaborative filtering methods, we propose to learn a collaborative deep networks-based semantically, nonlinear, and high-level features over the latent factor for landmark photo as the training set, which is formed by matrix factorization over collaborative user-photo matrix regarding the multi-query set. The learned deep network is further applied to generate the features for all the other photos, meanwhile resulting into a compact multi-query set within such space. Then, the final ranking scores are calculated over the high-level feature space between the multi-query set and all other photos, which are ranked to serve as the final ranking list of landmark retrieval. Extensive experiments are conducted on real-world social media data with both landmark photos together with their user information to show the superior performance over the existing methods, especially our recently proposed multi-query based mid-level pattern representation method [1].
Designing Crop Simulation Web Service with Service Oriented Architecture Principle

NASA Astrophysics Data System (ADS)

Chinnachodteeranun, R.; Hung, N. D.; Honda, K.

2015-12-01

Crop simulation models are efficient tools for simulating crop growth processes and yield. Running crop models requires data from various sources as well as time-consuming data processing, such as data quality checking and data formatting, before those data can be inputted to the model. It makes the use of crop modeling limited only to crop modelers. We aim to make running crop models convenient for various users so that the utilization of crop models will be expanded, which will directly improve agricultural applications. As the first step, we had developed a prototype that runs DSSAT on Web called as Tomorrow's Rice (v. 1). It predicts rice yields based on a planting date, rice's variety and soil characteristics using DSSAT crop model. A user only needs to select a planting location on the Web GUI then the system queried historical weather data from available sources and expected yield is returned. Currently, we are working on weather data connection via Sensor Observation Service (SOS) interface defined by Open Geospatial Consortium (OGC). Weather data can be automatically connected to a weather generator for generating weather scenarios for running the crop model. In order to expand these services further, we are designing a web service framework consisting of layers of web services to support compositions and executions for running crop simulations. This framework allows a third party application to call and cascade each service as it needs for data preparation and running DSSAT model using a dynamic web service mechanism. The framework has a module to manage data format conversion, which means users do not need to spend their time curating the data inputs. Dynamic linking of data sources and services are implemented using the Service Component Architecture (SCA). This agriculture web service platform demonstrates interoperability of weather data using SOS interface, convenient connections between weather data sources and weather generator, and connecting various services for running crop models for decision support.
New NED XML/VOtable Services and Client Interface Applications

NASA Astrophysics Data System (ADS)

Pevunova, O.; Good, J.; Mazzarella, J.; Berriman, G. B.; Madore, B.

2005-12-01

The NASA/IPAC Extragalactic Database (NED) provides data and cross-identifications for over 7 million extragalactic objects fused from thousands of survey catalogs and journal articles. The data cover all frequencies from radio through gamma rays and include positions, redshifts, photometry and spectral energy distributions (SEDs), sizes, and images. NED services have traditionally supplied data in HTML format for connections from Web browsers, and a custom ASCII data structure for connections by remote computer programs written in the C programming language. We describe new services that provide responses from NED queries in XML documents compliant with the international virtual observatory VOtable protocol. The XML/VOtable services support cone searches, all-sky searches based on object attributes (survey names, cross-IDs, redshifts, flux densities), and requests for detailed object data. Initial services have been inserted into the NVO registry, and others will follow soon. The first client application is a Style Sheet specification for rendering NED VOtable query results in Web browsers that support XML. The second prototype application is a Java applet that allows users to compare multiple SEDs. The new XML/VOtable output mode will also simplify the integration of data from NED into visualization and analysis packages, software agents, and other virtual observatory applications. We show an example SED from NED plotted using VOPlot. The NED website is: http://nedwww.ipac.caltech.edu.
miRvestigator: web application to identify miRNAs responsible for co-regulated gene expression patterns discovered through transcriptome profiling.

PubMed

Plaisier, Christopher L; Bare, J Christopher; Baliga, Nitin S

2011-07-01

Transcriptome profiling studies have produced staggering numbers of gene co-expression signatures for a variety of biological systems. A significant fraction of these signatures will be partially or fully explained by miRNA-mediated targeted transcript degradation. miRvestigator takes as input lists of co-expressed genes from Caenorhabditis elegans, Drosophila melanogaster, G. gallus, Homo sapiens, Mus musculus or Rattus norvegicus and identifies the specific miRNAs that are likely to bind to 3' un-translated region (UTR) sequences to mediate the observed co-regulation. The novelty of our approach is the miRvestigator hidden Markov model (HMM) algorithm which systematically computes a similarity P-value for each unique miRNA seed sequence from the miRNA database miRBase to an overrepresented sequence motif identified within the 3'-UTR of the query genes. We have made this miRNA discovery tool accessible to the community by integrating our HMM algorithm with a proven algorithm for de novo discovery of miRNA seed sequences and wrapping these algorithms into a user-friendly interface. Additionally, the miRvestigator web server also produces a list of putative miRNA binding sites within 3'-UTRs of the query transcripts to facilitate the design of validation experiments. The miRvestigator is freely available at http://mirvestigator.systemsbiology.net.
AgdbNet – antigen sequence database software for bacterial typing

PubMed Central

Jolley, Keith A; Maiden, Martin CJ

2006-01-01

Background Bacterial typing schemes based on the sequences of genes encoding surface antigens require databases that provide a uniform, curated, and widely accepted nomenclature of the variants identified. Due to the differences in typing schemes, imposed by the diversity of genes targeted, creating these databases has typically required the writing of one-off code to link the database to a web interface. Here we describe agdbNet, widely applicable web database software that facilitates simultaneous BLAST querying of multiple loci using either nucleotide or peptide sequences. Results Databases are described by XML files that are parsed by a Perl CGI script. Each database can have any number of loci, which may be defined by nucleotide and/or peptide sequences. The software is currently in use on at least five public databases for the typing of Neisseria meningitidis, Campylobacter jejuni and Streptococcus equi and can be set up to query internal isolate tables or suitably-configured external isolate databases, such as those used for multilocus sequence typing. The style of the resulting website can be fully configured by modifying stylesheets and through the use of customised header and footer files that surround the output of the script. Conclusion The software provides a rapid means of setting up customised Internet antigen sequence databases. The flexible configuration options enable typing schemes with differing requirements to be accommodated. PMID:16790057
Ontology-oriented retrieval of putative microRNAs in Vitis vinifera via GrapeMiRNA: a web database of de novo predicted grape microRNAs.

PubMed

Lazzari, Barbara; Caprera, Andrea; Cestaro, Alessandro; Merelli, Ivan; Del Corvo, Marcello; Fontana, Paolo; Milanesi, Luciano; Velasco, Riccardo; Stella, Alessandra

2009-06-29

Two complete genome sequences are available for Vitis vinifera Pinot noir. Based on the sequence and gene predictions produced by the IASMA, we performed an in silico detection of putative microRNA genes and of their targets, and collected the most reliable microRNA predictions in a web database. The application is available at http://www.itb.cnr.it/ptp/grapemirna/. The program FindMiRNA was used to detect putative microRNA genes in the grape genome. A very high number of predictions was retrieved, calling for validation. Nine parameters were calculated and, based on the grape microRNAs dataset available at miRBase, thresholds were defined and applied to FindMiRNA predictions having targets in gene exons. In the resulting subset, predictions were ranked according to precursor positions and sequence similarity, and to target identity. To further validate FindMiRNA predictions, comparisons to the Arabidopsis genome, to the grape Genoscope genome, and to the grape EST collection were performed. Results were stored in a MySQL database and a web interface was prepared to query the database and retrieve predictions of interest. The GrapeMiRNA database encompasses 5,778 microRNA predictions spanning the whole grape genome. Predictions are integrated with information that can be of use in selection procedures. Tools added in the web interface also allow to inspect predictions according to gene ontology classes and metabolic pathways of targets. The GrapeMiRNA database can be of help in selecting candidate microRNA genes to be validated.
Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource

PubMed Central

Sun, Shulei; Chen, Jing; Li, Weizhong; Altintas, Ilkay; Lin, Abel; Peltier, Steve; Stocks, Karen; Allen, Eric E.; Ellisman, Mark; Grethe, Jeffrey; Wooley, John

2011-01-01

The Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA, http://camera.calit2.net/) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database. To meet the needs of the research community, users are able to query metadata categories such as habitat, sample type, time, location and other environmental physicochemical parameters. CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC), and sustains a role within the GSC in extending standards for content and format of the metagenomic data and metadata and its submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools to allow researchers to share and forward data to other metagenomics sites and community data archives such as GenBank. It has multiple interfaces for easy submission of large or complex data sets, and supports pre-registration of samples for sequencing. CAMERA integrates a growing list of tools and viewers for querying, analyzing, annotating and comparing metagenome and genome data. PMID:21045053
Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource.

PubMed

Sun, Shulei; Chen, Jing; Li, Weizhong; Altintas, Ilkay; Lin, Abel; Peltier, Steve; Stocks, Karen; Allen, Eric E; Ellisman, Mark; Grethe, Jeffrey; Wooley, John

2011-01-01

The Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA, http://camera.calit2.net/) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database. To meet the needs of the research community, users are able to query metadata categories such as habitat, sample type, time, location and other environmental physicochemical parameters. CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC), and sustains a role within the GSC in extending standards for content and format of the metagenomic data and metadata and its submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools to allow researchers to share and forward data to other metagenomics sites and community data archives such as GenBank. It has multiple interfaces for easy submission of large or complex data sets, and supports pre-registration of samples for sequencing. CAMERA integrates a growing list of tools and viewers for querying, analyzing, annotating and comparing metagenome and genome data.

EnsMart: A Generic System for Fast and Flexible Access to Biological Data

PubMed Central

Kasprzyk, Arek; Keefe, Damian; Smedley, Damian; London, Darin; Spooner, William; Melsopp, Craig; Hammond, Martin; Rocca-Serra, Philippe; Cox, Tony; Birney, Ewan

2004-01-01

The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. A wide variety of complex queries, on various types of annotations, for numerous species are supported. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site, or through a Java application suite. Both implementations and the database are freely available for local installation, and can be extended or adapted to `non-Ensembl' data sets. PMID:14707178
ITSoneDB: a comprehensive collection of eukaryotic ribosomal RNA Internal Transcribed Spacer 1 (ITS1) sequences

PubMed Central

Santamaria, Monica; Fosso, Bruno; Licciulli, Flavio; Balech, Bachir; Larini, Ilaria; Grillo, Giorgio; De Caro, Giorgio; Liuni, Sabino

2018-01-01

Abstract A holistic understanding of environmental communities is the new challenge of metagenomics. Accordingly, the amplicon-based or metabarcoding approach, largely applied to investigate bacterial microbiomes, is moving to the eukaryotic world too. Indeed, the analysis of metabarcoding data may provide a comprehensive assessment of both bacterial and eukaryotic composition in a variety of environments, including human body. In this respect, whereas hypervariable regions of the 16S rRNA are the de facto standard barcode for bacteria, the Internal Transcribed Spacer 1 (ITS1) of ribosomal RNA gene cluster has shown a high potential in discriminating eukaryotes at deep taxonomic levels. As metabarcoding data analysis rely on the availability of a well-curated barcode reference resource, a comprehensive collection of ITS1 sequences supplied with robust taxonomies, is highly needed. To address this issue, we created ITSoneDB (available at http://itsonedb.cloud.ba.infn.it/) which in its current version hosts 985 240 ITS1 sequences spanning over 134 000 eukaryotic species. Each ITS1 is mapped on the NCBI reference taxonomy with its start and end positions precisely annotated. ITSoneDB has been developed in agreement to the FAIR guidelines by enabling the users to query and download its content through a simple web-interface and access relevant metadata by cross-linking to European Nucleotide Archive. PMID:29036529
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce

NASA Astrophysics Data System (ADS)

Farhan Husain, Mohammad; Doshi, Pankil; Khan, Latifur; Thuraisingham, Bhavani

Handling huge amount of data scalably is a matter of concern for a long time. Same is true for semantic web data. Current semantic web frameworks lack this ability. In this paper, we describe a framework that we built using Hadoop to store and retrieve large number of RDF triples. We describe our schema to store RDF data in Hadoop Distribute File System. We also present our algorithms to answer a SPARQL query. We make use of Hadoop's MapReduce framework to actually answer the queries. Our results reveal that we can store huge amount of semantic web data in Hadoop clusters built mostly by cheap commodity class hardware and still can answer queries fast enough. We conclude that ours is a scalable framework, able to handle large amount of RDF data efficiently.
A review of CDC's Web-based Injury Statistics Query and Reporting System (WISQARS™): Planning for the future of injury surveillance✩

PubMed Central

Ballesteros, Michael F.; Webb, Kevin; McClure, Roderick J.

2017-01-01

Introduction The Centers for Disease Control and Prevention (CDC) developed the Web-based Injury Statistics Query and Reporting System (WISQARSTM) to meet the data needs of injury practitioners. In 2015, CDC completed a Portfolio Review of this system to inform its future development. Methods Evaluation questions addressed utilization, technology and innovation, data sources, and tools and training. Data were collected through environmental scans, a review of peer-reviewed and grey literature, a web search, and stakeholder interviews. Results Review findings led to specific recommendations for each evaluation question. Response CDC reviewed each recommendation and initiated several enhancements that will improve the ability of injury prevention practitioners to leverage these data, better make sense of query results, and incorporate findings and key messages into prevention practices. PMID:28454867
Tools for Administration of a UNIX-Based Network

NASA Technical Reports Server (NTRS)

LeClaire, Stephen; Farrar, Edward

2004-01-01

Several computer programs have been developed to enable efficient administration of a large, heterogeneous, UNIX-based computing and communication network that includes a variety of computers connected to a variety of subnetworks. One program provides secure software tools for administrators to create, modify, lock, and delete accounts of specific users. This program also provides tools for users to change their UNIX passwords and log-in shells. These tools check for errors. Another program comprises a client and a server component that, together, provide a secure mechanism to create, modify, and query quota levels on a network file system (NFS) mounted by use of the VERITAS File SystemJ software. The client software resides on an internal secure computer with a secure Web interface; one can gain access to the client software from any authorized computer capable of running web-browser software. The server software resides on a UNIX computer configured with the VERITAS software system. Directories where VERITAS quotas are applied are NFS-mounted. Another program is a Web-based, client/server Internet Protocol (IP) address tool that facilitates maintenance lookup of information about IP addresses for a network of computers.
Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal

PubMed Central

Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen

2014-01-01

Background The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. Conclusions This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed. PMID:25000537
Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

PubMed

Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

2014-07-04

The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs. SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed.
Head Lice Surveillance on a Deregulated OTC-Sales Market: A Study Using Web Query Data

PubMed Central

Lindh, Johan; Magnusson, Måns; Grünewald, Maria; Hulth, Anette

2012-01-01

The head louse, Pediculus humanus capitis, is an obligate ectoparasite that causes infestations of humans. Studies have demonstrated a correlation between sales figures for over-the-counter (OTC) treatment products and the number of humans with head lice. The deregulation of the Swedish pharmacy market on July 1, 2009, decreased the possibility to obtain complete sale figures and thereby the possibility to obtain yearly trends of head lice infestations. In the presented study we wanted to investigate whether web queries on head lice can be used as substitute for OTC sales figures. Via Google Insights for Search and Vårdguiden medical web site, the number of queries on “huvudlöss” (head lice) and “hårlöss” (lice in hair) were obtained. The analysis showed that both the Vårdguiden series and the Google series were statistically significant (p<0.001) when added separately, but if the Google series were already included in the model, the Vårdguiden series were not statistically significant (p = 0.5689). In conclusion, web queries can detect if there is an increase or decrease of head lice infested humans in Sweden over a period of years, and be as reliable a proxy as the OTC-sales figures. PMID:23144923
Head lice surveillance on a deregulated OTC-sales market: a study using web query data.

PubMed

Lindh, Johan; Magnusson, Måns; Grünewald, Maria; Hulth, Anette

2012-01-01

The head louse, Pediculus humanus capitis, is an obligate ectoparasite that causes infestations of humans. Studies have demonstrated a correlation between sales figures for over-the-counter (OTC) treatment products and the number of humans with head lice. The deregulation of the Swedish pharmacy market on July 1, 2009, decreased the possibility to obtain complete sale figures and thereby the possibility to obtain yearly trends of head lice infestations. In the presented study we wanted to investigate whether web queries on head lice can be used as substitute for OTC sales figures. Via Google Insights for Search and Vårdguiden medical web site, the number of queries on "huvudlöss" (head lice) and "hårlöss" (lice in hair) were obtained. The analysis showed that both the Vårdguiden series and the Google series were statistically significant (p<0.001) when added separately, but if the Google series were already included in the model, the Vårdguiden series were not statistically significant (p = 0.5689). In conclusion, web queries can detect if there is an increase or decrease of head lice infested humans in Sweden over a period of years, and be as reliable a proxy as the OTC-sales figures.
Computational toxicology using the OpenTox application programming interface and Bioclipse

PubMed Central

2011-01-01

Background Toxicity is a complex phenomenon involving the potential adverse effect on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. The required integration of biological and chemical information sources requires, however, a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications. Findings This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources. Conclusions A novel computational toxicity assessment platform was generated from integration of two open science platforms related to toxicology: Bioclipse, that combines a rich scriptable and graphical workbench environment for integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets by the use of the Open Standards from the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, and algorithm and model resources, taking advantage of the Bioclipse workbench handling the technical layers. PMID:22075173
Voice-enabled Knowledge Engine using Flood Ontology and Natural Language Processing

NASA Astrophysics Data System (ADS)

Sermet, M. Y.; Demir, I.; Krajewski, W. F.

2015-12-01

The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood forecasts, flood-related data, information and interactive visualizations for communities in Iowa. The IFIS is designed for use by general public, often people with no domain knowledge and limited general science background. To improve effective communication with such audience, we have introduced a voice-enabled knowledge engine on flood related issues in IFIS. Instead of navigating within many features and interfaces of the information system and web-based sources, the system provides dynamic computations based on a collection of built-in data, analysis, and methods. The IFIS Knowledge Engine connects to real-time stream gauges, in-house data sources, analysis and visualization tools to answer natural language questions. Our goal is the systematization of data and modeling results on flood related issues in Iowa, and to provide an interface for definitive answers to factual queries. The goal of the knowledge engine is to make all flood related knowledge in Iowa easily accessible to everyone, and support voice-enabled natural language input. We aim to integrate and curate all flood related data, implement analytical and visualization tools, and make it possible to compute answers from questions. The IFIS explicitly implements analytical methods and models, as algorithms, and curates all flood related data and resources so that all these resources are computable. The IFIS Knowledge Engine computes the answer by deriving it from its computational knowledge base. The knowledge engine processes the statement, access data warehouse, run complex database queries on the server-side and return outputs in various formats. This presentation provides an overview of IFIS Knowledge Engine, its unique information interface and functionality as an educational tool, and discusses the future plans for providing knowledge on flood related issues and resources. IFIS Knowledge Engine provides an alternative access method to these comprehensive set of tools and data resources available in IFIS. Current implementation of the system accepts free-form input and voice recognition capabilities within browser and mobile applications.
A digital repository with an extensible data model for biobanking and genomic analysis management.

PubMed

Izzo, Massimiliano; Mortola, Francesco; Arnulfo, Gabriele; Fato, Marco M; Varesio, Luigi

2014-01-01

Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research is evolving in to international multi-disciplinary collaborations and increasing data sharing among institutions. Single standardization is not feasible and it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobanks management. We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata, and may have one or more associated files. We integrated the model in a web based digital repository with a data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples of over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing. Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata, for information sharing in specific research projects and purposes. This approach can improve sensitively interdisciplinary research collaboration and allows to track patients' clinical records, sample management information, and genomic data. The web interface allows the operators to easily manage, query, and annotate the files, without dealing with the technicalities of the data grid.
A digital repository with an extensible data model for biobanking and genomic analysis management

PubMed Central

2014-01-01

Motivation Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research is evolving in to international multi-disciplinary collaborations and increasing data sharing among institutions. Single standardization is not feasible and it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobanks management. Results We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata, and may have one or more associated files. We integrated the model in a web based digital repository with a data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples of over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing. Conclusions Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata, for information sharing in specific research projects and purposes. This approach can improve sensitively interdisciplinary research collaboration and allows to track patients' clinical records, sample management information, and genomic data. The web interface allows the operators to easily manage, query, and annotate the files, without dealing with the technicalities of the data grid. PMID:25077808
An informatics model for tissue banks--lessons learned from the Cooperative Prostate Cancer Tissue Resource.

PubMed

Patel, Ashokkumar A; Gilbertson, John R; Parwani, Anil V; Dhir, Rajiv; Datta, Milton W; Gupta, Rajnish; Berman, Jules J; Melamed, Jonathan; Kajdacsy-Balla, Andre; Orenstein, Jan; Becich, Michael J

2006-05-05

Advances in molecular biology and growing requirements from biomarker validation studies have generated a need for tissue banks to provide quality-controlled tissue samples with standardized clinical annotation. The NCI Cooperative Prostate Cancer Tissue Resource (CPCTR) is a distributed tissue bank that comprises four academic centers and provides thousands of clinically annotated prostate cancer specimens to researchers. Here we describe the CPCTR information management system architecture, common data element (CDE) development, query interfaces, data curation, and quality control. Data managers review the medical records to collect and continuously update information for the 145 clinical, pathological and inventorial CDEs that the Resource maintains for each case. An Access-based data entry tool provides de-identification and a standard communication mechanism between each group and a central CPCTR database. Standardized automated quality control audits have been implemented. Centrally, an Oracle database has web interfaces allowing multiple user-types, including the general public, to mine de-identified information from all of the sites with three levels of specificity and granularity as well as to request tissues through a formal letter of intent. Since July 2003, CPCTR has offered over 6,000 cases (38,000 blocks) of highly characterized prostate cancer biospecimens, including several tissue microarrays (TMA). The Resource developed a website with interfaces for the general public as well as researchers and internal members. These user groups have utilized the web-tools for public query of summary data on the cases that were available, to prepare requests, and to receive tissues. As of December 2005, the Resource received over 130 tissue requests, of which 45 have been reviewed, approved and filled. Additionally, the Resource implemented the TMA Data Exchange Specification in its TMA program and created a computer program for calculating PSA recurrence. Building a biorepository infrastructure that meets today's research needs involves time and input of many individuals from diverse disciplines. The CPCTR can provide large volumes of carefully annotated prostate tissue for research initiatives such as Specialized Programs of Research Excellence (SPOREs) and for biomarker validation studies and its experience can help development of collaborative, large scale, virtual tissue banks in other organ systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Abe Lederman

This report contains the comprehensive summary of the work performed on the SBIR Phase II project (“Distributed Relevance Ranking in Heterogeneous Document Collections”) at Deep Web Technologies (http://www.deepwebtech.com). We have successfully completed all of the tasks defined in our SBIR Proposal work plan (See Table 1 - Phase II Tasks Status). The project was completed on schedule and we have successfully deployed an initial production release of the software architecture at DOE-OSTI for the Science.gov Alliance's search portal (http://www.science.gov). We have implemented a set of grid services that supports the extraction, filtering, aggregation, and presentation of search results from numerousmore » heterogeneous document collections. Illustration 3 depicts the services required to perform QuickRank™ filtering of content as defined in our architecture documentation. Functionality that has been implemented is indicated by the services highlighted in green. We have successfully tested our implementation in a multi-node grid deployment both within the Deep Web Technologies offices, and in a heterogeneous geographically distributed grid environment. We have performed a series of load tests in which we successfully simulated 100 concurrent users submitting search requests to the system. This testing was performed on deployments of one, two, and three node grids with services distributed in a number of different configurations. The preliminary results from these tests indicate that our architecture will scale well across multi-node grid deployments, but more work will be needed, beyond the scope of this project, to perform testing and experimentation to determine scalability and resiliency requirements. We are pleased to report that a production quality version (1.4) of the science.gov Alliance's search portal based on our grid architecture was released in June of 2006. This demonstration portal is currently available at http://science.gov/search30 . The portal allows the user to select from a number of collections grouped by category and enter a query expression (See Illustration 1 - Science.gov 3.0 Search Page). After the user clicks “search” a results page is displayed that provides a list of results from the selected collections ordered by relevance based on the query expression the user provided. Our grid based solution to deep web search and document ranking has already gained attention within DOE, other Government Agencies and a fortune 50 company. We are committed to the continued development of grid based solutions to large scale data access, filtering, and presentation problems within the domain of Information Retrieval and the more general categories of content management, data mining and data analysis.« less
Categorical and Specificity Differences between User-Supplied Tags and Search Query Terms for Images. An Analysis of "Flickr" Tags and Web Image Search Queries

ERIC Educational Resources Information Center

Chung, EunKyung; Yoon, JungWon

2009-01-01

Introduction: The purpose of this study is to compare characteristics and features of user supplied tags and search query terms for images on the "Flickr" Website in terms of categories of pictorial meanings and level of term specificity. Method: This study focuses on comparisons between tags and search queries using Shatford's categorization…
Improving Web Search for Difficult Queries

ERIC Educational Resources Information Center

Wang, Xuanhui

2009-01-01

Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines can not answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…
The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions

NASA Astrophysics Data System (ADS)

Heather, David

2016-07-01

Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid / ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further advanced search function will allow users to query all the metadata present in the PSA database. Results will be displayed in 3 different ways: 1) A table listing all the corresponding data matching the criteria in the filter menu, 2) a projection of the products onto the surface of the object when applicable (i.e. planets, small bodies), and 3) a list of images for the relevant instruments to enjoy the beauty of our Solar System. These different ways of viewing the datasets will ensure that scientists and non-professionals alike will have access to the specific data they are looking for, regardless of their background. Conclusions: The new PSA will maintain the various interfaces and services it had in the past, and will include significant improvements designed to allow easier and more effective access to the scientific data and supporting materials. The new PSA is expected to be released by mid-2016. It will support the past, present and future missions, ancillary datasets, and will enhance the scientific output of ESA's missions. As such, the PSA will become a unique archive ensuring the long-term preservation and usage of scientific datasets together with user-friendly access.
The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions.

NASA Astrophysics Data System (ADS)

Heather, David; Besse, Sebastien; Barbarisi, Isa; Arviset, Christophe; de Marchi, Guido; Barthelemy, Maud; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; Macfarlane, Alan; Martinez, Santa; Rios, Carlos

2016-04-01

Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid / ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further advanced search function will allow users to query all the metadata present in the PSA database. Results will be displayed in 3 different ways: 1) A table listing all the corresponding data matching the criteria in the filter menu, 2) a projection of the products onto the surface of the object when applicable (i.e. planets, small bodies), and 3) a list of images for the relevant instruments to enjoy the beauty of our Solar System. These different ways of viewing the datasets will ensure that scientists and non-professionals alike will have access to the specific data they are looking for, regardless of their background. Conclusions: The new PSA will maintain the various interfaces and services it had in the past, and will include significant improvements designed to allow easier and more effective access to the scientific data and supporting materials. The new PSA is expected to be released by mid-2016. It will support the past, present and future missions, ancillary datasets, and will enhance the scientific output of ESA's missions. As such, the PSA will become a unique archive ensuring the long-term preservation and usage of scientific datasets together with user-friendly access.
Semantic querying of relational data for clinical intelligence: a semantic web services-based approach

PubMed Central

2013-01-01

Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556

Teaching Tectonics to Undergraduates with Web GIS

NASA Astrophysics Data System (ADS)

Anastasio, D. J.; Bodzin, A.; Sahagian, D. L.; Rutzmoser, S.

2013-12-01

Geospatial reasoning skills provide a means for manipulating, interpreting, and explaining structured information and are involved in higher-order cognitive processes that include problem solving and decision-making. Appropriately designed tools, technologies, and curriculum can support spatial learning. We present Web-based visualization and analysis tools developed with Javascript APIs to enhance tectonic curricula while promoting geospatial thinking and scientific inquiry. The Web GIS interface integrates graphics, multimedia, and animations that allow users to explore and discover geospatial patterns that are not easily recognized. Features include a swipe tool that enables users to see underneath layers, query tools useful in exploration of earthquake and volcano data sets, a subduction and elevation profile tool which facilitates visualization between map and cross-sectional views, drafting tools, a location function, and interactive image dragging functionality on the Web GIS. The Web GIS platform is independent and can be implemented on tablets or computers. The GIS tool set enables learners to view, manipulate, and analyze rich data sets from local to global scales, including such data as geology, population, heat flow, land cover, seismic hazards, fault zones, continental boundaries, and elevation using two- and three- dimensional visualization and analytical software. Coverages which allow users to explore plate boundaries and global heat flow processes aided learning in a Lehigh University Earth and environmental science Structural Geology and Tectonics class and are freely available on the Web.
The StarView intelligent query mechanism

NASA Technical Reports Server (NTRS)

Semmel, R. D.; Silberberg, D. P.

1993-01-01

The StarView interface is being developed to facilitate the retrieval of scientific and engineering data produced by the Hubble Space Telescope. While predefined screens in the interface can be used to specify many common requests, ad hoc requests require a dynamic query formulation capability. Unfortunately, logical level knowledge is too sparse to support this capability. In particular, essential formulation knowledge is lost when the domain of interest is mapped to a set of database relation schemas. Thus, a system known as QUICK has been developed that uses conceptual design knowledge to facilitate query formulation. By heuristically determining strongly associated objects at the conceptual level, QUICK is able to formulate semantically reasonable queries in response to high-level requests that specify only attributes of interest. Moreover, by exploiting constraint knowledge in the conceptual design, QUICK assures that queries are formulated quickly and will execute efficiently.
Design of Epidemia - an Ecohealth Informatics System for Integrated Forecasting of Malaria Epidemics

NASA Astrophysics Data System (ADS)

Wimberly, M. C.; Bayabil, E.; Beyane, B.; Bishaw, M.; Henebry, G. M.; Lemma, A.; Liu, Y.; Merkord, C. L.; Mihretie, A.; Senay, G. B.; Yalew, W.

2014-12-01

Early warning of the timing and locations of malaria epidemics can facilitate the targeting of resources for prevention and emergency response. In response to this need, we are developing the Epidemic Prognosis Incorporating Disease and Environmental Monitoring for Integrated Assessment (EPIDEMIA) computer system. The system incorporates software for capturing, processing, and integrating environmental and epidemiological data from multiple sources; data assimilation techniques that continually update models and forecasts; and a web-based interface that makes the resulting information available to public health decision makers. This technology will enable forecasts based on lagged responses to environmental risk factors as well as information about recent trends in malaria cases. Environmental driving variables will include a variety of remote-sensed hydrological indicators. EPIDEMIA will be implemented and tested in the Amhara Region of Ethiopia in collaboration with local stakeholders. We conducted an initial co-design workshop in July 2014 that included environmental scientists, software engineers, and participants from the NGO, academic, and public health sectors in Ethiopia. A prototype of the EPIDEMIA web interface was presented and a requirements analysis was conducted to characterize the main use cases for the public health community, identify the critical data requirements for malaria risk modeling, and develop of a list of baseline features for the public health interface. Several critical system features were identified, including a secure web-based interface for uploading and validating surveillance data; a flexible query system to allow retrieval of environmental and epidemiological data summaries as tables, charts, and maps; and an alert system to provide automatic warnings in response to environmental and epidemiological risk factors for malaria. Future system development will involve a cycle of implementation, training, usability testing, and upgrading. This innovative translational bioinformatics approach will allow us to assess the practical effectiveness of these tools as we continually improve the technologies.
Real-time shipboard displays for science operation and planning on CGC Healy

NASA Astrophysics Data System (ADS)

Roberts, S.; Chayes, D.; Arko, R.

2007-12-01

To facilitate effective science planning and decision making, we have developed a real-time geospatial browser and other displays widely used by many if not all members of USCGC Healy's science cruises and some officers and crew since 2004. In order to enable a 'zero-configuration' experience to the end user with nearly any modern browser, on any platform, anywhere on the ship with wired (or wireless) network access, we chose a Web-based/server-centric approach that provides a very low barrier to access in an environment where we have many participants constantly coming and going, often with their own computers. The principle interface for planning and operational decision making is a georeferenced, Web-based user interface built on the MapServer Web GIS platform developed at the University of Minnesota (http://mapserver.gis.umn.edu/), using the PostGIS spatial database extensions (http://postgis.refractions.net/) to enable live database connectivity. Data available include current ship position and orientation, historical ship tracks and data, seafloor bathymetry, station locations, RADARSAT, and subbottom profiles among others. In addition to the user interfaces that are part of individual instrumentation (such as the sonars and navigation systems), custom interfaces have been developed to centralize data with high update rates such as sea surface temperature, vessel attitude, position, etc. Underlying data acquisition and storage is provided by the Lamont Data System (LDS) and the NOAA SCS system. All data are stored on RAIDed disk systems and shared across a switched network with a gigabit fiber backbone. The real-time displays access data in a number of ways including real-time UDP datagrams from LDS, accessing files on disk, and querying a PostgreSQL relational backend. This work is supported by grants from the U.S. National Science Foundation, Office of Polar Programs, Arctic Science section.
Secure Encapsulation and Publication of Biological Services in the Cloud Computing Environment

PubMed Central

Zhang, Weizhe; Wang, Xuehui; Lu, Bo; Kim, Tai-hoon

2013-01-01

Secure encapsulation and publication for bioinformatics software products based on web service are presented, and the basic function of biological information is realized in the cloud computing environment. In the encapsulation phase, the workflow and function of bioinformatics software are conducted, the encapsulation interfaces are designed, and the runtime interaction between users and computers is simulated. In the publication phase, the execution and management mechanisms and principles of the GRAM components are analyzed. The functions such as remote user job submission and job status query are implemented by using the GRAM components. The services of bioinformatics software are published to remote users. Finally the basic prototype system of the biological cloud is achieved. PMID:24078906
Secure encapsulation and publication of biological services in the cloud computing environment.

PubMed

Zhang, Weizhe; Wang, Xuehui; Lu, Bo; Kim, Tai-hoon

2013-01-01

Secure encapsulation and publication for bioinformatics software products based on web service are presented, and the basic function of biological information is realized in the cloud computing environment. In the encapsulation phase, the workflow and function of bioinformatics software are conducted, the encapsulation interfaces are designed, and the runtime interaction between users and computers is simulated. In the publication phase, the execution and management mechanisms and principles of the GRAM components are analyzed. The functions such as remote user job submission and job status query are implemented by using the GRAM components. The services of bioinformatics software are published to remote users. Finally the basic prototype system of the biological cloud is achieved.
SCHeMA web-based observation data information system

NASA Astrophysics Data System (ADS)

Novellino, Antonio; Benedetti, Giacomo; D'Angelo, Paolo; Confalonieri, Fabio; Massa, Francesco; Povero, Paolo; Tercier-Waeber, Marie-Louise

2016-04-01

It is well recognized that the need of sharing ocean data among non-specialized users is constantly increasing. Initiatives that are built upon international standards will contribute to simplify data processing and dissemination, improve user-accessibility also through web browsers, facilitate the sharing of information across the integrated network of ocean observing systems; and ultimately provide a better understanding of the ocean functioning. The SCHeMA (Integrated in Situ Chemical MApping probe) Project is developing an open and modular sensing solution for autonomous in situ high resolution mapping of a wide range of anthropogenic and natural chemical compounds coupled to master bio-physicochemical parameters (www.schema-ocean.eu). The SCHeMA web system is designed to ensure user-friendly data discovery, access and download as well as interoperability with other projects through a dedicated interface that implements the Global Earth Observation System of Systems - Common Infrastructure (GCI) recommendations and the international Open Geospatial Consortium - Sensor Web Enablement (OGC-SWE) standards. This approach will insure data accessibility in compliance with major European Directives and recommendations. Being modular, the system allows the plug-and-play of commercially available probes as well as new sensor probess under development within the project. The access to the network of monitoring probes is provided via a web-based system interface that, being implemented as a SOS (Sensor Observation Service), is providing standard interoperability and access tosensor observations systems through O&M standard - as well as sensor descriptions - encoded in Sensor Model Language (SensorML). The use of common vocabularies in all metadatabases and data formats, to describe data in an already harmonized and common standard is a prerequisite towards consistency and interoperability. Therefore, the SCHeMA SOS has adopted the SeaVox common vocabularies populated by SeaDataNet network of National Oceanographic Data Centres. The SCHeMA presentation layer, a fundamental part of the software architecture, offers to the user a bidirectional interaction with the integrated system allowing to manage and configure the sensor probes; view the stored observations and metadata, and handle alarms. The overall structure of the web portal developed within the SCHeMA initiative (Sensor Configuration, development of Core Profile interface for data access via OGC standard, external services such as web services, WMS, WFS; and Data download and query manager) will be presented and illustrated with examples of ongoing tests in costal and open sea.
Chapter 18: Web-based Tools - NED VO Services

NASA Astrophysics Data System (ADS)

Mazzarella, J. M.; NED Team

The NASA/IPAC Extragalactic Database (NED) is a thematic, web-based research facility in widespread use by scientists, educators, space missions, and observatory operations for observation planning, data analysis, discovery, and publication of research about objects beyond our Milky Way galaxy. NED is a portal into a systematic fusion of data from hundreds of sky surveys and tens of thousands of research publications. The contents and services span the entire electromagnetic spectrum from gamma rays through radio frequencies, and are continuously updated to reflect the current literature and releases of large-scale sky survey catalogs. NED has been on the Internet since 1990, growing in content, automation and services with the evolution of information technology. NED is the world's largest database of crossidentified extragalactic objects. As of December 2006, the system contains approximately 10 million objects and 15 million multi-wavelength cross-IDs. Over 4 thousand catalogs and published lists covering the entire electromagnetic spectrum have had their objects cross-identified or associated, with fundamental data parameters federated for convenient queries and retrieval. This chapter describes the interoperability of NED services with other components of the Virtual Observatory (VO). Section 1 is a brief overview of the primary NED web services. Section 2 provides a tutorial for using NED services currently available through the NVO Registry. The "name resolver" provides VO portals and related internet services with celestial coordinates for objects specified by catalog identifier (name); any alias can be queried because this service is based on the source cross-IDs established by NED. All major services have been updated to provide output in VOTable (XML) format that can be accessed directly from the NED web interface or using the NVO registry. These include access to images via SIAP, Cone- Search queries, and services providing fundamental, multi-wavelength extragalactic data such as positions, redshifts, photometry and spectral energy distributions (SEDs), and sizes (all with references and uncertainties when available). Section 3 summarizes the advantages of accessing the NED "name resolver" and other NED services via the web to replace the legacy "server mode" custom data structure previously available through a function library provided only in the C programming language. Section 4 illustrates visualization via VOPlot of an SED and the spatial distribution of sources from a NED All-Sky (By Parameters) query. Section 5 describes the new NED Spectral Archive, illustrating how VOTables are being used to standardize the data and metadata as well as the physical units of spectra made available by authors of journal articles and producers of major survey archives; quick-look spectral analysis through convenient interoperability with the SpecView (STScI) Java applet is also shown. Section 6 closes with a summary of the capabilities described herein, which greatly simplify interoperability of NED with other components of the VO, enabling new opportunities for discovery, visualization, and analysis of multiwavelength data.
Drexel at TREC 2014 Federated Web Search Track

DTIC Science & Technology

2014-11-01

of its input RS results. 1. INTRODUCTION Federated Web Search is the task of searching multiple search engines simultaneously and combining their...or distributed properly[5]. The goal of RS is then, for a given query, to select only the most promising search engines from all those available. Most...result pages of 149 search engines . 4000 queries are used in building the sample set. As a part of the Vertical Selection task, search engines are
Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying.

PubMed

Kiefer, Richard C; Freimuth, Robert R; Chute, Christopher G; Pathak, Jyotishman

2013-01-01

Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively.
Super: a web server to rapidly screen superposable oligopeptide fragments from the protein data bank.

PubMed

Collier, James H; Lesk, Arthur M; Garcia de la Banda, Maria; Konagurthu, Arun S

2012-07-01

Searching for well-fitting 3D oligopeptide fragments within a large collection of protein structures is an important task central to many analyses involving protein structures. This article reports a new web server, Super, dedicated to the task of rapidly screening the protein data bank (PDB) to identify all fragments that superpose with a query under a prespecified threshold of root-mean-square deviation (RMSD). Super relies on efficiently computing a mathematical bound on the commonly used structural similarity measure, RMSD of superposition. This allows the server to filter out a large proportion of fragments that are unrelated to the query; >99% of the total number of fragments in some cases. For a typical query, Super scans the current PDB containing over 80,500 structures (with ∼40 million potential oligopeptide fragments to match) in under a minute. Super web server is freely accessible from: http://lcb.infotech.monash.edu.au/super.
New Quality Metrics for Web Search Results

NASA Astrophysics Data System (ADS)

Metaxas, Panagiotis Takis; Ivanova, Lilia; Mustafaraj, Eni

Web search results enjoy an increasing importance in our daily lives. But what can be said about their quality, especially when querying a controversial issue? The traditional information retrieval metrics of precision and recall do not provide much insight in the case of web information retrieval. In this paper we examine new ways of evaluating quality in search results: coverage and independence. We give examples on how these new metrics can be calculated and what their values reveal regarding the two major search engines, Google and Yahoo. We have found evidence of low coverage for commercial and medical controversial queries, and high coverage for a political query that is highly contested. Given the fact that search engines are unwilling to tune their search results manually, except in a few cases that have become the source of bad publicity, low coverage and independence reveal the efforts of dedicated groups to manipulate the search results.
A Framework for WWW Query Processing

NASA Technical Reports Server (NTRS)

Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)

2000-01-01

Query processing is the most common operation in a DBMS. Sophisticated query processing has been mainly targeted at a single enterprise environment providing centralized control over data and metadata. Submitting queries by anonymous users on the web is different in such a way that load balancing or DBMS' accessing control becomes the key issue. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).
ChRIS--A web-based neuroimaging and informatics system for collecting, organizing, processing, visualizing and sharing of medical data.

PubMed

Pienaar, Rudolph; Rannou, Nicolas; Bernal, Jorge; Hahn, Daniel; Grant, P Ellen

2015-01-01

The utility of web browsers for general purpose computing, long anticipated, is only now coming into fruition. In this paper we present a web-based medical image data and information management software platform called ChRIS ([Boston] Children's Research Integration System). ChRIS' deep functionality allows for easy retrieval of medical image data from resources typically found in hospitals, organizes and presents information in a modern feed-like interface, provides access to a growing library of plugins that process these data - typically on a connected High Performance Compute Cluster, allows for easy data sharing between users and instances of ChRIS and provides powerful 3D visualization and real time collaboration.
Content Documents Management

NASA Technical Reports Server (NTRS)

Muniz, R.; Hochstadt, J.; Boelke J.; Dalton, A.

2011-01-01

The Content Documents are created and managed under the System Software group with. Launch Control System (LCS) project. The System Software product group is lead by NASA Engineering Control and Data Systems branch (NEC3) at Kennedy Space Center. The team is working on creating Operating System Images (OSI) for different platforms (i.e. AIX, Linux, Solaris and Windows). Before the OSI can be created, the team must create a Content Document which provides the information of a workstation or server, with the list of all the software that is to be installed on it and also the set where the hardware belongs. This can be for example in the LDS, the ADS or the FR-l. The objective of this project is to create a User Interface Web application that can manage the information of the Content Documents, with all the correct validations and filters for administrator purposes. For this project we used one of the most excellent tools in agile development applications called Ruby on Rails. This tool helps pragmatic programmers develop Web applications with Rails framework and Ruby programming language. It is very amazing to see how a student can learn about OOP features with the Ruby language, manage the user interface with HTML and CSS, create associations and queries with gems, manage databases and run a server with MYSQL, run shell commands with command prompt and create Web frameworks with Rails. All of this in a real world project and in just fifteen weeks!
Harmonised information exchange between decentralised food composition database systems.

PubMed

Pakkala, H; Christensen, T; de Victoria, I Martínez; Presser, K; Kadvan, A

2010-11-01

The main aim of the European Food Information Resource (EuroFIR) project is to develop and disseminate a comprehensive, coherent and validated data bank for the distribution of food composition data (FCD). This can only be accomplished by harmonising food description and data documentation and by the use of standardised thesauri. The data bank is implemented through a network of local FCD storages (usually national) under the control and responsibility of the local (national) EuroFIR partner. The implementation of the system based on the EuroFIR specifications is under development. The data interchange happens through the EuroFIR Web Services interface, allowing the partners to implement their system using methods and software suitable for the local computer environment. The implementation uses common international standards, such as Simple Object Access Protocol, Web Service Description Language and Extensible Markup Language (XML). A specifically constructed EuroFIR search facility (eSearch) was designed for end users. The EuroFIR eSearch facility compiles queries using a specifically designed Food Data Query Language and sends a request to those network nodes linked to the EuroFIR Web Services that will most likely have the requested information. The retrieved FCD are compiled into a specifically designed data interchange format (the EuroFIR Food Data Transport Package) in XML, which is sent back to the EuroFIR eSearch facility as the query response. The same request-response operation happens in all the nodes that have been selected in the EuroFIR eSearch facility for a certain task. Finally, the FCD are combined by the EuroFIR eSearch facility and delivered to the food compiler. The implementation of FCD interchange using decentralised computer systems instead of traditional data-centre models has several advantages. First of all, the local partners have more control over their FCD, which will increase commitment and improve quality. Second, a multicentred solution is more economically viable than the creation of a centralised data bank, because of the lack of national political support for multinational systems.
Indexing and Retrieval for the Web.

ERIC Educational Resources Information Center

Rasmussen, Edie M.

2003-01-01

Explores current research on indexing and ranking as retrieval functions of search engines on the Web. Highlights include measuring search engine stability; evaluation of Web indexing and retrieval; Web crawlers; hyperlinks for indexing and ranking; ranking for metasearch; document structure; citation indexing; relevance; query evaluation;…
EarthServer: a Summary of Achievements in Technology, Services, and Standards

NASA Astrophysics Data System (ADS)

Baumann, Peter

2015-04-01

Big Data in the Earth sciences, the Tera- to Exabyte archives, mostly are made up from coverage data, according to ISO and OGC defined as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3D x/y/t image timese ries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier transform of satellite images. As network bandwidth limits prohibit transfer of such Big Data it is indispensable to devise protocols allowing clients to task flexible and fast processing on the server. The transatlantic EarthServer initiative, running from 2011 through 2014, has united 11 partners to establish Big Earth Data Analytics. A key ingredient has been flexibility for users to ask whatever they want, not impeded and complicated by system internals. The EarthServer answer to this is to use high-level, standards-based query languages which unify data and metadata search in a simple, yet powerful way. A second key ingredient is scalability. Without any doubt, scalability ultimately can only be achieved through parallelization. In the past, parallelizing cod e has been done at compile time and usually with manual intervention. The EarthServer approach is to perform a samentic-based dynamic distribution of queries fragments based on networks optimization and further criteria. The EarthServer platform is comprised by rasdaman, the pioneer and leading Array DBMS built for any-size multi-dimensional raster data being extended with support for irregular grids and general meshes; in-situ retrieval (evaluation of database queries on existing archive structures, avoiding data import and, hence, duplication); the aforementioned distributed query processing. Additionally, Web clients for multi-dimensional data visualization are being established. Client/server interfaces are strictly based on OGC and W3C standards, in particular the Web Coverage Processing Service (WCPS) which defines a high-level coverage query language. Reviewers have attested EarthServer that "With no doubt the project has been shaping the Big Earth Data landscape through the standardization activities within OGC, ISO and beyond". We present the project approach, its outcomes and impact on standardization and Big Data technology, and vistas for the future.
Graph-Based Semantic Web Service Composition for Healthcare Data Integration.

PubMed

Arch-Int, Ngamnij; Arch-Int, Somjit; Sonsilphong, Suphachoke; Wanchai, Paweena

2017-01-01

Within the numerous and heterogeneous web services offered through different sources, automatic web services composition is the most convenient method for building complex business processes that permit invocation of multiple existing atomic services. The current solutions in functional web services composition lack autonomous queries of semantic matches within the parameters of web services, which are necessary in the composition of large-scale related services. In this paper, we propose a graph-based Semantic Web Services composition system consisting of two subsystems: management time and run time. The management-time subsystem is responsible for dependency graph preparation in which a dependency graph of related services is generated automatically according to the proposed semantic matchmaking rules. The run-time subsystem is responsible for discovering the potential web services and nonredundant web services composition of a user's query using a graph-based searching algorithm. The proposed approach was applied to healthcare data integration in different health organizations and was evaluated according to two aspects: execution time measurement and correctness measurement.
Graph-Based Semantic Web Service Composition for Healthcare Data Integration

PubMed Central

2017-01-01

Within the numerous and heterogeneous web services offered through different sources, automatic web services composition is the most convenient method for building complex business processes that permit invocation of multiple existing atomic services. The current solutions in functional web services composition lack autonomous queries of semantic matches within the parameters of web services, which are necessary in the composition of large-scale related services. In this paper, we propose a graph-based Semantic Web Services composition system consisting of two subsystems: management time and run time. The management-time subsystem is responsible for dependency graph preparation in which a dependency graph of related services is generated automatically according to the proposed semantic matchmaking rules. The run-time subsystem is responsible for discovering the potential web services and nonredundant web services composition of a user's query using a graph-based searching algorithm. The proposed approach was applied to healthcare data integration in different health organizations and was evaluated according to two aspects: execution time measurement and correctness measurement. PMID:29065602

Integrating Engineering Data Systems for NASA Spaceflight Projects

NASA Technical Reports Server (NTRS)

Carvalho, Robert E.; Tollinger, Irene; Bell, David G.; Berrios, Daniel C.

2012-01-01

NASA has a large range of custom-built and commercial data systems to support spaceflight programs. Some of the systems are re-used by many programs and projects over time. Management and systems engineering processes require integration of data across many of these systems, a difficult problem given the widely diverse nature of system interfaces and data models. This paper describes an ongoing project to use a central data model with a web services architecture to support the integration and access of linked data across engineering functions for multiple NASA programs. The work involves the implementation of a web service-based middleware system called Data Aggregator to bring together data from a variety of systems to support space exploration. Data Aggregator includes a central data model registry for storing and managing links between the data in disparate systems. Initially developed for NASA's Constellation Program needs, Data Aggregator is currently being repurposed to support the International Space Station Program and new NASA projects with processes that involve significant aggregating and linking of data. This change in user needs led to development of a more streamlined data model registry for Data Aggregator in order to simplify adding new project application data as well as standardization of the Data Aggregator query syntax to facilitate cross-application querying by client applications. This paper documents the approach from a set of stand-alone engineering systems from which data are manually retrieved and integrated, to a web of engineering data systems from which the latest data are automatically retrieved and more quickly and accurately integrated. This paper includes the lessons learned through these efforts, including the design and development of a service-oriented architecture and the evolution of the data model registry approaches as the effort continues to evolve and adapt to support multiple NASA programs and priorities.
A Natural Language Interface Concordant with a Knowledge Base.

PubMed

Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

2016-01-01

The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.
Semantic based man-machine interface for real-time communication

NASA Technical Reports Server (NTRS)

Ali, M.; Ai, C.-S.

1988-01-01

A flight expert system (FLES) was developed to assist pilots in monitoring, diagnosing and recovering from in-flight faults. To provide a communications interface between the flight crew and FLES, a natural language interface (NALI) was implemented. Input to NALI is processed by three processors: (1) the semantics parser; (2) the knowledge retriever; and (3) the response generator. First the semantic parser extracts meaningful words and phrases to generate an internal representation of the query. At this point, the semantic parser has the ability to map different input forms related to the same concept into the same internal representation. Then the knowledge retriever analyzes and stores the context of the query to aid in resolving ellipses and pronoun references. At the end of this process, a sequence of retrievel functions is created as a first step in generating the proper response. Finally, the response generator generates the natural language response to the query. The architecture of NALI was designed to process both temporal and nontemporal queries. The architecture and implementation of NALI are described.
SPARK: Adapting Keyword Query to Semantic Search

NASA Astrophysics Data System (ADS)

Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.
Automatically exposing OpenLifeData via SADI semantic Web Services.

PubMed

González, Alejandro Rodríguez; Callahan, Alison; Cruz-Toledo, José; Garcia, Adrian; Egaña Aranguren, Mikel; Dumontier, Michel; Wilkinson, Mark D

2014-01-01

Two distinct trends are emerging with respect to how data is shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by moving towards the harmonization of data syntax, descriptive vocabularies, and identifiers, as well as providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions. Constructing queries that traverse semantically-rich Linked Data requires substantial expertise, yet traditional RESTful or SOAP Web Services cannot adequately describe the content of a SPARQL endpoint. We propose that content-driven Semantic Web Services can enable facile discovery of Linked Data, independent of their location. We use a well-curated Linked Dataset - OpenLifeData - and utilize its descriptive metadata to automatically configure a series of more than 22,000 Semantic Web Services that expose all of its content via the SADI set of design principles. The OpenLifeData SADI services are discoverable via queries to the SHARE registry and easy to integrate into new or existing bioinformatics workflows and analytical pipelines. We demonstrate the utility of this system through comparison of Web Service-mediated data access with traditional SPARQL, and note that this approach not only simplifies data retrieval, but simultaneously provides protection against resource-intensive queries. We show, through a variety of different clients and examples of varying complexity, that data from the myriad OpenLifeData can be recovered without any need for prior-knowledge of the content or structure of the SPARQL endpoints. We also demonstrate that, via clients such as SHARE, the complexity of federated SPARQL queries is dramatically reduced.
Accounting Data to Web Interface Using PERL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hargeaves, C

2001-08-13

This document will explain the process to create a web interface for the accounting information generated by the High Performance Storage Systems (HPSS) accounting report feature. The accounting report contains useful data but it is not easily accessed in a meaningful way. The accounting report is the only way to see summarized storage usage information. The first step is to take the accounting data, make it meaningful and store the modified data in persistent databases. The second step is to generate the various user interfaces, HTML pages, that will be used to access the data. The third step is tomore » transfer all required files to the web server. The web pages pass parameters to Common Gateway Interface (CGI) scripts that generate dynamic web pages and graphs. The end result is a web page with specific information presented in text with or without graphs. The accounting report has a specific format that allows the use of regular expressions to verify if a line is storage data. Each storage data line is stored in a detailed database file with a name that includes the run date. The detailed database is used to create a summarized database file that also uses run date in its name. The summarized database is used to create the group.html web page that includes a list of all storage users. Scripts that query the database folder to build a list of available databases generate two additional web pages. A master script that is run monthly as part of a cron job, after the accounting report has completed, manages all of these individual scripts. All scripts are written in the PERL programming language. Whenever possible data manipulation scripts are written as filters. All scripts are written to be single source, which means they will function properly on both the open and closed networks at LLNL. The master script handles the command line inputs for all scripts, file transfers to the web server and records run information in a log file. The rest of the scripts manipulate the accounting data or use the files created to generate HTML pages. Each script will be described in detail herein. The following is a brief description of HPSS taken directly from an HPSS web site. ''HPSS is a major development project, which began in 1993 as a Cooperative Research and Development Agreement (CRADA) between government and industry. The primary objective of HPSS is to move very large data objects between high performance computers, workstation clusters, and storage libraries at speeds many times faster than is possible with today's software systems. For example, HPSS can manage parallel data transfers from multiple network-connected disk arrays at rates greater than 1 Gbyte per second, making it possible to access high definition digitized video in real time.'' The HPSS accounting report is a canned report whose format is controlled by the HPSS developers.« less
GI-conf: A configuration tool for the GI-cat distributed catalog

NASA Astrophysics Data System (ADS)

Papeschi, F.; Boldrini, E.; Bigagli, L.; Mazzetti, P.

2009-04-01

In this work we present a configuration tool for the GI-cat. In an Service-Oriented Architecture (SOA) framework, GI-cat implements a distributed catalog service providing advanced capabilities, such as: caching, brokering and mediation functionalities. GI-cat applies a distributed approach, being able to distribute queries to the remote service providers of interest in an asynchronous style, and notifies the status of the queries to the caller implementing an incremental feedback mechanism. Today, GI-cat functionalities are made available through two standard catalog interfaces: the OGC CSW ISO and CSW Core Application Profiles. However, two other interfaces are under testing: the CIM and the EO Extension Packages of the CSW ebRIM Application Profile. GI-cat is able to interface a multiplicity of discovery and access services serving heterogeneous Earth and Space Sciences resources. They include international standards like the OGC Web Services -i.e. OGC CSW, WCS, WFS and WMS, as well as interoperability arrangements (i.e. community standards) such as: UNIDATA THREDDS/OPeNDAP, SeaDataNet CDI (Common Data Index), GBIF (Global Biodiversity Information Facility) services, and SibESS-C infrastructure services. GI-conf implements user-friendly configuration tool for GI-cat. This is a GUI application that employs a visual and very simple approach to configure both the GI-cat publishing and distribution capabilities, in a dynamic way. The tool allows to set one or more GI-cat configurations. Each configuration consists of: a) the catalog standards interfaces published by GI-cat; b) the resources (i.e. services/servers) to be accessed and mediated -i.e. federated. Simple icons are used for interfaces and resources, implementing a user-friendly visual approach. The main GI-conf functionalities are: • Interfaces and federated resources management: user can set which interfaces must be published; besides, she/he can add a new resource, update or remove an already federated resource. • Multiple configuration management: multiple GI-cat configurations can be defined; every configuration identifies a set of published interfaces and a set of federated resources. Configurations can be edited, added, removed, exported, and even imported. • HTML report creation: an HTML report can be created, showing the current active GI-cat configuration, including the resources that are being federated and the published interface endpoints. The configuration tool is shipped with GI-cat and can be used to configure the service after its installation is completed.
Building a Smart Portal for Astronomy

NASA Astrophysics Data System (ADS)

Derriere, S.; Boch, T.

2011-07-01

The development of a portal for accessing astronomical resources is not an easy task. The ever-increasing complexity of the data products can result in very complex user interfaces, requiring a lot of effort and learning from the user in order to perform searches. This is often a design choice, where the user must explicitly set many constraints, while the portal search logic remains simple. We investigated a different approach, where the query interface is kept as simple as possible (ideally, a simple text field, like for Google search), and the search logic is made much more complex to interpret the query in a relevant manner. We will present the implications of this approach in terms of interpretation and categorization of the query parameters (related to astronomical vocabularies), translation (mapping) of these concepts into the portal components metadata, identification of query schemes and use cases matching the input parameters, and delivery of query results to the user.
Integrating a local database into the StarView distributed user interface

NASA Technical Reports Server (NTRS)

Silberberg, D. P.

1992-01-01

A distributed user interface to the Space Telescope Data Archive and Distribution Service (DADS) known as StarView is being developed. The DADS architecture consists of the data archive as well as a relational database catalog describing the archive. StarView is a client/server system in which the user interface is the front-end client to the DADS catalog and archive servers. Users query the DADS catalog from the StarView interface. Query commands are transmitted via a network and evaluated by the database. The results are returned via the network and are displayed on StarView forms. Based on the results, users decide which data sets to retrieve from the DADS archive. Archive requests are packaged by StarView and sent to DADS, which returns the requested data sets to the users. The advantages of distributed client/server user interfaces over traditional one-machine systems are well known. Since users run software on machines separate from the database, the overall client response time is much faster. Also, since the server is free to process only database requests, the database response time is much faster. Disadvantages inherent in this architecture are slow overall database access time due to the network delays, lack of a 'get previous row' command, and that refinements of a previously issued query must be submitted to the database server, even though the domain of values have already been returned by the previous query. This architecture also does not allow users to cross correlate DADS catalog data with other catalogs. Clearly, a distributed user interface would be more powerful if it overcame these disadvantages. A local database is being integrated into StarView to overcome these disadvantages. When a query is made through a StarView form, which is often composed of fields from multiple tables, it is translated to an SQL query and issued to the DADS catalog. At the same time, a local database table is created to contain the resulting rows of the query. The returned rows are displayed on the form as well as inserted into the local database table. Identical results are produced by reissuing the query to either the DADS catalog or to the local table. Relational databases do not provide a 'get previous row' function because of the inherent complexity of retrieving previous rows of multiple-table joins. However, since this function is easily implemented on a single table, StarView uses the local table to retrieve the previous row. Also, StarView issues subsequent query refinements to the local table instead of the DADS catalog, eliminating the network transmission overhead. Finally, other catalogs can be imported into the local database for cross correlation with local tables. Overall, it is believe that this is a more powerful architecture for distributed, database user interfaces.
LandEx - Fast, FOSS-Based Application for Query and Retrieval of Land Cover Patterns

NASA Astrophysics Data System (ADS)

Netzel, P.; Stepinski, T.

2012-12-01

The amount of satellite-based spatial data is continuously increasing making a development of efficient data search tools a priority. The bulk of existing research on searching satellite-gathered data concentrates on images and is based on the concept of Content-Based Image Retrieval (CBIR); however, available solutions are not efficient and robust enough to be put to use as deployable web-based search tools. Here we report on development of a practical, deployable tool that searches classified, rather than raw image. LandEx (Landscape Explorer) is a GeoWeb-based tool for Content-Based Pattern Retrieval (CBPR) contained within the National Land Cover Dataset 2006 (NLCD2006). The USGS-developed NLCD2006 is derived from Landsat multispectral images; it covers the entire conterminous U.S. with the resolution of 30 meters/pixel and it depicts 16 land cover classes. The size of NLCD2006 is about 10 Gpixels (161,000 x 100,000 pixels). LandEx is a multi-tier GeoWeb application based on Open Source Software. Main components are: GeoExt/OpenLayers (user interface), GeoServer (OGC WMS, WCS and WPS server), and GRASS (calculation engine). LandEx performs search using query-by-example approach: user selects a reference scene (exhibiting a chosen pattern of land cover classes) and the tool produces, in real time, a map indicating a degree of similarity between the reference pattern and all local patterns across the U.S. Scene pattern is encapsulated by a 2D histogram of classes and sizes of single-class clumps. Pattern similarity is based on the notion of mutual information. The resultant similarity map can be viewed and navigated in a web browser, or it can download as a GeoTiff file for more in-depth analysis. The LandEx is available at http://sil.uc.edu
Linked Data: what does it offer Earth Sciences?

NASA Astrophysics Data System (ADS)

Cox, Simon; Schade, Sven

2010-05-01

'Linked Data' is a current buzz-phrase promoting access to various forms of data on the internet. It starts from the two principles that have underpinned the architecture and scalability of the World Wide Web: 1. Universal Resource Identifiers - using the http protocol which is supported by the DNS system. 2. Hypertext - in which URIs of related resources are embedded within a document. Browsing is the key mode of interaction, with traversal of links between resources under control of the client. Linked Data also adds, or re-emphasizes: • Content negotiation - whereby the client uses http headers to tell the service what representation of a resource is acceptable, • Semantic Web principles - formal semantics for links, following the RDF data model and encoding, and • The 'mashup' effect - in which original and unexpected value may emerge from reuse of data, even if published in raw or unpolished form. Linked Data promotes typed links to all kinds of data, so is where the semantic web meets the 'deep web', i.e. resources which may be accessed using web protocols, but are in representations not indexed by search engines. Earth sciences are data rich, but with a strong legacy of specialized formats managed and processed by disconnected applications. However, most contemporary research problems require a cross-disciplinary approach, in which the heterogeneity resulting from that legacy is a significant challenge. In this context, Linked Data clearly has much to offer the earth sciences. But, there are some important questions to answer. What is a resource? Most earth science data is organized in arrays and databases. A subset useful for a particular study is usually identified by a parameterized query. The Linked Data paradigm emerged from the world of documents, and will often only resolve data-sets. It is impractical to create even nested navigation resources containing links to all potentially useful objects or subsets. From the viewpoint of human user interfaces, the browse metaphor, which has been such an important part of the success of the web, must be augmented with other interaction mechanisms, including query. What are the impacts on search and metadata? Hypertext provides links selected by the page provider. However, science should endeavor to be exhaustive in its use of data. Resource discovery through links must be supplemented by more systematic data discovery through search. Conversely, the crawlers that generate search indexes must be fed by resource providers (a) serving navigation pages with links to every dataset (b) adding enough 'metadata' (semantics) on each link to effectively populate the indexes. Linked Data makes this easier due to its integration with semantic web technologies, including structured vocabularies. What is the relation between structured data and Linked Data? Linked Data has focused on web-pages (primarily HTML) for human browsing, and RDF for semantics, assuming that other representations are opaque. However, this overlooks the wealth of XML data on the web, some of which is structured according to XML Schemas that provide semantics. Technical applications can use content-negotiation to get a structured representation, and exploit its semantics. Particularly relevant for earth sciences are data representations based on OGC Geography Markup Language (GML), such as GeoSciML, O&M and MOLES. GML was strongly influenced by RDF, and typed links are intrinsic: xlink:href plays the role that rdf:resource does in RDF representations. Services which expose GML-formatted resources (such as OGC Web Feature Service) are a prototype of Linked Data. Giving credit where it is due. Organizations investing in data collection may be reluctant to publish the raw data prior to completing an initial analysis. To encourage early data publication the system must provide suitable incentives, and citation analysis must recognize the increasing diversity of publication routes and forms. Linked Data makes it easier to include rich citation information when data is both published and used.
User centered and ontology based information retrieval system for life sciences.

PubMed

Sy, Mohameth-François; Ranwez, Sylvie; Montmain, Jacky; Regnault, Armelle; Crampes, Michel; Ranwez, Vincent

2012-01-25

Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help.
User centered and ontology based information retrieval system for life sciences

PubMed Central

2012-01-01

Background Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. Results This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. Conclusions The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help. PMID:22373375
Automation and integration of components for generalized semantic markup of electronic medical texts.

PubMed Central

Dugan, J. M.; Berrios, D. C.; Liu, X.; Kim, D. K.; Kaizer, H.; Fagan, L. M.

1999-01-01

Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models. Images Figure 1 Figure 2 Figure 4 Figure 5 PMID:10566457
A web-based, relational database for studying glaciers in the Italian Alps

NASA Astrophysics Data System (ADS)

Nigrelli, G.; Chiarle, M.; Nuzzi, A.; Perotti, L.; Torta, G.; Giardino, M.

2013-02-01

Glaciers are among the best terrestrial indicators of climate change and thus glacier inventories have attracted a growing, worldwide interest in recent years. In Italy, the first official glacier inventory was completed in 1925 and 774 glacial bodies were identified. As the amount of data continues to increase, and new techniques become available, there is a growing demand for computer tools that can efficiently manage the collected data. The Research Institute for Geo-hydrological Protection of the National Research Council, in cooperation with the Departments of Computer Science and Earth Sciences of the University of Turin, created a database that provides a modern tool for storing, processing and sharing glaciological data. The database was developed according to the need of storing heterogeneous information, which can be retrieved through a set of web search queries. The database's architecture is server-side, and was designed by means of an open source software. The website interface, simple and intuitive, was intended to meet the needs of a distributed public: through this interface, any type of glaciological data can be managed, specific queries can be performed, and the results can be exported in a standard format. The use of a relational database to store and organize a large variety of information about Italian glaciers collected over the last hundred years constitutes a significant step forward in ensuring the safety and accessibility of such data. Moreover, the same benefits also apply to the enhanced operability for handling information in the future, including new and emerging types of data formats, such as geographic and multimedia files. Future developments include the integration of cartographic data, such as base maps, satellite images and vector data. The relational database described in this paper will be the heart of a new geographic system that will merge data, data attributes and maps, leading to a complete description of Italian glacial environments.
Deep Web video

ScienceCinema

None Available

2018-02-06

To make the web work better for science, OSTI has developed state-of-the-art technologies and services including a deep web search capability. The deep web includes content in searchable databases available to web users but not accessible by popular search engines, such as Google. This video provides an introduction to the deep web search engine.
Semantator: semantic annotator for converting biomedical text to linked data.

PubMed

Tao, Cui; Song, Dezhao; Sharma, Deepak; Chute, Christopher G

2013-10-01

More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-level semantics. In this paper, we introduce Semantator (Semantic Annotator), a semantic-web-based environment for annotating data of interest in biomedical documents, browsing and querying the annotated data, and interactively refining annotation results if needed. Through Semantator, information of interest can be either annotated manually or semi-automatically using plug-in information extraction tools. The annotated results will be stored in RDF and can be queried using the SPARQL query language. In addition, semantic reasoners can be directly applied to the annotated data for consistency checking and knowledge inference. Semantator has been released online and was used by the biomedical ontology community who provided positive feedbacks. Our evaluation results indicated that (1) Semantator can perform the annotation functionalities as designed; (2) Semantator can be adopted in real applications in clinical and transactional research; and (3) the annotated results using Semantator can be easily used in Semantic-web-based reasoning tools for further inference. Copyright © 2013 Elsevier Inc. All rights reserved.
The Evolvable Advanced Multi-Mission Operations System (AMMOS): Making Systems Interoperable

NASA Technical Reports Server (NTRS)

Ko, Adans Y.; Maldague, Pierre F.; Bui, Tung; Lam, Doris T.; McKinney, John C.

2010-01-01

The Advanced Multi-Mission Operations System (AMMOS) provides a common Mission Operation System (MOS) infrastructure to NASA deep space missions. The evolution of AMMOS has been driven by two factors: increasingly challenging requirements from space missions, and the emergence of new IT technology. The work described in this paper focuses on three key tasks related to IT technology requirements: first, to eliminate duplicate functionality; second, to promote the use of loosely coupled application programming interfaces, text based file interfaces, web-based frameworks and integrated Graphical User Interfaces (GUI) to connect users, data, and core functionality; and third, to build, develop, and deploy AMMOS services that are reusable, agile, adaptive to project MOS configurations, and responsive to industrially endorsed information technology standards.
An online model composition tool for system biology models

PubMed Central

2013-01-01

Background There are multiple representation formats for Systems Biology computational models, and the Systems Biology Markup Language (SBML) is one of the most widely used. SBML is used to capture, store, and distribute computational models by Systems Biology data sources (e.g., the BioModels Database) and researchers. Therefore, there is a need for all-in-one web-based solutions that support advance SBML functionalities such as uploading, editing, composing, visualizing, simulating, querying, and browsing computational models. Results We present the design and implementation of the Model Composition Tool (Interface) within the PathCase-SB (PathCase Systems Biology) web portal. The tool helps users compose systems biology models to facilitate the complex process of merging systems biology models. We also present three tools that support the model composition tool, namely, (1) Model Simulation Interface that generates a visual plot of the simulation according to user’s input, (2) iModel Tool as a platform for users to upload their own models to compose, and (3) SimCom Tool that provides a side by side comparison of models being composed in the same pathway. Finally, we provide a web site that hosts BioModels Database models and a separate web site that hosts SBML Test Suite models. Conclusions Model composition tool (and the other three tools) can be used with little or no knowledge of the SBML document structure. For this reason, students or anyone who wants to learn about systems biology will benefit from the described functionalities. SBML Test Suite models will be a nice starting point for beginners. And, for more advanced purposes, users will able to access and employ models of the BioModels Database as well. PMID:24006914
Teaching And Learning Tectonics With Web-GIS

NASA Astrophysics Data System (ADS)

Anastasio, D. J.; Sahagian, D. L.; Bodzin, A.; Teletzke, A. L.; Rutzmoser, S.; Cirucci, L.; Bressler, D.; Burrows, J. E.

2012-12-01

Tectonics is a new curriculum enhancement consisting of six Web GIS investigations designed to augment a traditional middle school Earth science curriculum. The investigations are aligned to Disciplinary Core Ideas: Earth and Space Science from the National Research Council's (2012) Framework for K-12 Science Education and to tectonics benchmark ideas articulated in the AAAS Project 2061 (2007) Atlas of Science Literacy. The curriculum emphasizes geospatial thinking and scientific inquiry and consists of the following modules: Geohazards, which plate boundary is closest to me? How do we recognize plate boundaries? How does thermal energy move around the Earth? What happens when plates diverge? What happens when plate move sideways past each other? What happens when plates collide? The Web GIS interface uses JavaScript for simplicity, intuition, and convenience for implementation on a variety of platforms making it easier for diverse middle school learners and their teachers to conduct authentic Earth science investigations, including multidisciplinary visualization, analysis, and synthesis of data. Instructional adaptations allow students who are English language learners, have disabilities, or are reluctant readers to perform advanced desktop GIS functions including spatial analysis, map visualization and query. The Web GIS interface integrates graphics, multimedia, and animation in addition to newly developed features, which allow users to explore and discover geospatial patterns that would not be easily visible using typical classroom instructional materials. The Tectonics curriculum uses a spatial learning design model that incorporates a related set of frameworks and design principles. The framework builds on the work of other successful technology-integrated curriculum projects and includes, alignment of materials and assessments with learning goals, casting key ideas in real-world problems, engaging students in scientific practices that foster the use of key ideas, uses geospatial technology, and supports for teachers in adopting and implementing GIS and inquiry-based activities.

miRMaid: a unified programming interface for microRNA data resources

PubMed Central

2010-01-01

Background MicroRNAs (miRNAs) are endogenous small RNAs that play a key role in post-transcriptional regulation of gene expression in animals and plants. The number of known miRNAs has increased rapidly over the years. The current release (version 14.0) of miRBase, the central online repository for miRNA annotation, comprises over 10.000 miRNA precursors from 115 different species. Furthermore, a large number of decentralized online resources are now available, each contributing with important miRNA annotation and information. Results We have developed a software framework, designated here as miRMaid, with the goal of integrating miRNA data resources in a uniform web service interface that can be accessed and queried by researchers and, most importantly, by computers. miRMaid is built around data from miRBase and is designed to follow the official miRBase data releases. It exposes miRBase data as inter-connected web services. Third-party miRNA data resources can be modularly integrated as miRMaid plugins or they can loosely couple with miRMaid as individual entities in the World Wide Web. miRMaid is available as a public web service but is also easily installed as a local application. The software framework is freely available under the LGPL open source license for academic and commercial use. Conclusion miRMaid is an intuitive and modular software platform designed to unify miRBase and independent miRNA data resources. It enables miRNA researchers to computationally address complex questions involving the multitude of miRNA data resources. Furthermore, miRMaid constitutes a basic framework for further programming in which microRNA-interested bioinformaticians can readily develop their own tools and data sources. PMID:20074352
The Chandra Source Catalog 2.0: Interfaces

NASA Astrophysics Data System (ADS)

D'Abrusco, Raffaele; Zografou, Panagoula; Tibbetts, Michael; Allen, Christopher E.; Anderson, Craig S.; Budynkiewicz, Jamie A.; Burke, Douglas; Chen, Judy C.; Civano, Francesca Maria; Doe, Stephen M.; Evans, Ian N.; Evans, Janet D.; Fabbiano, Giuseppina; Gibbs, Danny G., II; Glotfelty, Kenny J.; Graessle, Dale E.; Grier, John D.; Hain, Roger; Hall, Diane M.; Harbo, Peter N.; Houck, John C.; Lauer, Jennifer L.; Laurino, Omar; Lee, Nicholas P.; Martínez-Galarza, Rafael; McCollough, Michael L.; McDowell, Jonathan C.; Miller, Joseph; McLaughlin, Warren; Morgan, Douglas L.; Mossman, Amy E.; Nguyen, Dan T.; Nichols, Joy S.; Nowak, Michael A.; Paxson, Charles; Plummer, David A.; Primini, Francis Anthony; Rots, Arnold H.; Siemiginowska, Aneta; Sundheim, Beth A.; Van Stone, David W.

2018-01-01

Easy-to-use, powerful public interfaces to access the wealth of information contained in any modern, complex astronomical catalog are fundamental to encourage its usage. In this poster,I present the public interfaces of the second Chandra Source Catalog (CSC2). CSC2 is the most comprehensive catalog of X-ray sources detected by Chandra, thanks to the inclusion of Chandra observations public through the end of 2014 and to methodological advancements. CSC2 provides measured properties for a large number of sources that sample the X-ray sky at fainter levels than the previous versions of the CSC, thanks to the stacking of single overlapping observations within 1’ before source detection. Sources from stacks are then crossmatched, if multiple stacks cover the same area of the sky, to create a list of unique, optimal CSC2 sources. The properties of sources detected in each single stack and each single observation are also measured. The layered structure of the CSC2 catalog is mirrored in the organization of the CSC2 database, consisting of three tables containing all properties for the unique stacked sources (“Master Source”), single stack sources (“Stack Source”) and sources in any single observation (“Observation Source”). These tables contain estimates of the position, flags, extent, significances, fluxes, spectral properties and variability (and associated errors) for all classes of sources. The CSC2 also includes source region and full-field data products for all master sources, stack sources and observation sources: images, photon event lists, light curves and spectra.CSCview, the main interface to the CSC2 source properties and data products, is a GUI tool that allows to build queries based on the values of all properties contained in CSC2 tables, query the catalog, inspect the returned table of source properties, browse and download the associated data products. I will also introduce the suite of command-line interfaces to CSC2 that can be used in alternative to CSCview, and will present the concept for an additional planned cone-search web-based interface.This work has been supported by NASA under contract NAS 8-03060 to the Smithsonian Astrophysical Observatory for operation of the Chandra X-ray Center.
Commercial Building Energy Saver, API

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hong, Tianzhen; Piette, Mary; Lee, Sang Hoon

2015-08-27

The CBES API provides Application Programming Interface to a suite of functions to improve energy efficiency of buildings, including building energy benchmarking, preliminary retrofit analysis using a pre-simulation database DEEP, and detailed retrofit analysis using energy modeling with the EnergyPlus simulation engine. The CBES API is used to power the LBNL CBES Web App. It can be adopted by third party developers and vendors into their software tools and platforms.
ARIANE: integration of information databases within a hospital intranet.

PubMed

Joubert, M; Aymard, S; Fieschi, D; Volot, F; Staccini, P; Robert, J J; Fieschi, M

1998-05-01

Large information systems handle massive volume of data stored in heterogeneous sources. Each server has its own model of representation of concepts with regard to its aims. One of the main problems end-users encounter when accessing different servers is to match their own viewpoint on biomedical concepts with the various representations that are made in the databases servers. The aim of the project ARIANE is to provide end-users with easy-to-use and natural means to access and query heterogeneous information databases. The objectives of this research work consist in building a conceptual interface by means of the Internet technology inside an enterprise Intranet and to propose a method to realize it. This method is based on the knowledge sources provided by the Unified Medical Language System (UMLS) project of the US National Library of Medicine. Experiments concern queries to three different information servers: PubMed, a Medline server of the NLM; Thériaque, a French database on drugs implemented in the Hospital Intranet; and a Web site dedicated to Internet resources in gastroenterology and nutrition, located at the Faculty of Medicine of Nice (France). Accessing to each of these servers is different according to the kind of information delivered and according to the technology used to query it. Dealing with health care professional workstation, the authors introduced in the ARIANE project quality criteria in order to attempt a homogeneous and efficient way to build a query system able to be integrated in existing information systems and to integrate existing and new information sources.
Python Winding Itself Around Datacubes: How to Access Massive Multi-Dimensional Arrays in a Pythonic Way

NASA Astrophysics Data System (ADS)

Merticariu, Vlad; Misev, Dimitar; Baumann, Peter

2017-04-01

While python has developed into the lingua franca in Data Science there is often a paradigm break when accessing specialized tools. In particular for one of the core data categories in science and engineering, massive multi-dimensional arrays, out-of-memory solutions typically employ their own, different models. We discuss this situation on the example of the scalable open-source array engine, rasdaman ("raster data manager") which offers access to and processing of Petascale multi-dimensional arrays through an SQL-style array query language, rasql. Such queries are executed in the server on a storage engine utilizing adaptive array partitioning and based on a processing engine implementing a "tile streaming" paradigm to allow processing of arrays massively larger than server RAM. The rasdaman QL has acted as blueprint for forthcoming ISO Array SQL and the Open Geospatial Consortium (OGC) geo analytics language, Web Coverage Processing Service, adopted in 2008. Not surprisingly, rasdaman is OGC and INSPIRE Reference Implementation for their "Big Earth Data" standards suite. Recently, rasdaman has been augmented with a python interface which allows to transparently interact with the database (credits go to Siddharth Shukla's Master Thesis at Jacobs University). Programmers do not need to know the rasdaman query language, as the operators are silently transformed, through lazy evaluation, into queries. Arrays delivered are likewise automatically transformed into their python representation. In the talk, the rasdaman concept will be illustrated with the help of large-scale real-life examples of operational satellite image and weather data services, and sample python code.
Semantic Enhancement for Enterprise Data Management

NASA Astrophysics Data System (ADS)

Ma, Li; Sun, Xingzhi; Cao, Feng; Wang, Chen; Wang, Xiaoyuan; Kanellos, Nick; Wolfson, Dan; Pan, Yue

Taking customer data as an example, the paper presents an approach to enhance the management of enterprise data by using Semantic Web technologies. Customer data is the most important kind of core business entity a company uses repeatedly across many business processes and systems, and customer data management (CDM) is becoming critical for enterprises because it keeps a single, complete and accurate record of customers across the enterprise. Existing CDM systems focus on integrating customer data from all customer-facing channels and front and back office systems through multiple interfaces, as well as publishing customer data to different applications. To make the effective use of the CDM system, this paper investigates semantic query and analysis over the integrated and centralized customer data, enabling automatic classification and relationship discovery. We have implemented these features over IBM Websphere Customer Center, and shown the prototype to our clients. We believe that our study and experiences are valuable for both Semantic Web community and data management community.
LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis.

PubMed

Nagraj, V P; Magee, Neal E; Sheffield, Nathan C

2018-06-06

The past few years have seen an explosion of interest in understanding the role of regulatory DNA. This interest has driven large-scale production of functional genomics data and analytical methods. One popular analysis is to test for enrichment of overlaps between a query set of genomic regions and a database of region sets. In this way, new genomic data can be easily connected to annotations from external data sources. Here, we present an interactive interface for enrichment analysis of genomic locus overlaps using a web server called LOLAweb. LOLAweb accepts a set of genomic ranges from the user and tests it for enrichment against a database of region sets. LOLAweb renders results in an R Shiny application to provide interactive visualization features, enabling users to filter, sort, and explore enrichment results dynamically. LOLAweb is built and deployed in a Linux container, making it scalable to many concurrent users on our servers and also enabling users to download and run LOLAweb locally.
CRF: detection of CRISPR arrays using random forest.

PubMed

Wang, Kai; Liang, Chun

2017-01-01

CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in wide range of bacteria and archaea genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Different from other CRISPR detection tools, a random forest classifier was used in CRF to filter out invalid CRISPR arrays from all putative candidates and accordingly enhanced detection accuracy. In CRF, particularly, triplet elements that combine both sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available among other CRISPR detection tools. After detection, the query sequence, CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
SeqDepot: streamlined database of biological sequences and precomputed features.

PubMed

Ulrich, Luke E; Zhulin, Igor B

2014-01-15

Assembling and/or producing integrated knowledge of sequence features continues to be an onerous and redundant task despite a large number of existing resources. We have developed SeqDepot-a novel database that focuses solely on two primary goals: (i) assimilating known primary sequences with predicted feature data and (ii) providing the most simple and straightforward means to procure and readily use this information. Access to >28.5 million sequences and 300 million features is provided through a well-documented and flexible RESTful interface that supports fetching specific data subsets, bulk queries, visualization and searching by MD5 digests or external database identifiers. We have also developed an HTML5/JavaScript web application exemplifying how to interact with SeqDepot and Perl/Python scripts for use with local processing pipelines. Freely available on the web at http://seqdepot.net/. RESTaccess via http://seqdepot.net/api/v1. Database files and scripts maybe downloaded from http://seqdepot.net/download.
JAX Colony Management System (JCMS): an extensible colony and phenotype data management system.

PubMed

Donnelly, Chuck J; McFarland, Mike; Ames, Abigail; Sundberg, Beth; Springer, Dave; Blauth, Peter; Bult, Carol J

2010-04-01

The Jackson Laboratory Colony Management System (JCMS) is a software application for managing data and information related to research mouse colonies, associated biospecimens, and experimental protocols. JCMS runs directly on computers that run one of the PC Windows operating systems, but can be accessed via web browser interfaces from any computer running a Windows, Macintosh, or Linux operating system. JCMS can be configured for a single user or multiple users in small- to medium-size work groups. The target audience for JCMS includes laboratory technicians, animal colony managers, and principal investigators. The application provides operational support for colony management and experimental workflows, sample and data tracking through transaction-based data entry forms, and date-driven work reports. Flexible query forms allow researchers to retrieve database records based on user-defined criteria. Recent advances in handheld computers with integrated barcode readers, middleware technologies, web browsers, and wireless networks add to the utility of JCMS by allowing real-time access to the database from any networked computer.
A Flexible Monitoring Infrastructure for the Simulation Requests

NASA Astrophysics Data System (ADS)

Spinoso, V.; Missiato, M.

2014-06-01

Running and monitoring simulations usually involves several different aspects of the entire workflow: the configuration of the job, the site issues, the software deployment at the site, the file catalogue, the transfers of the simulated data. In addition, the final product of the simulation is often the result of several sequential steps. This project tries a different approach to monitoring the simulation requests. All the necessary data are collected from the central services which lead the submission of the requests and the data management, and stored by a backend into a NoSQL-based data cache; those data can be queried through a Web Service interface, which returns JSON responses, and allows users, sites, physics groups to easily create their own web frontend, aggregating only the needed information. As an example, it will be shown how it is possible to monitor the CMS services (ReqMgr, DAS/DBS, PhEDEx) using a central backend and multiple customized cross-language frontends.
galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

PubMed

Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

2004-06-12

The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
AncestrySNPminer: A bioinformatics tool to retrieve and develop ancestry informative SNP panels

PubMed Central

Amirisetty, Sushil; Khurana Hershey, Gurjit K.; Baye, Tesfaye M.

2012-01-01

A wealth of genomic information is available in public and private databases. However, this information is underutilized for uncovering population specific and functionally relevant markers underlying complex human traits. Given the huge amount of SNP data available from the annotation of human genetic variation, data mining is a faster and cost effective approach for investigating the number of SNPs that are informative for ancestry. In this study, we present AncestrySNPminer, the first web-based bioinformatics tool specifically designed to retrieve Ancestry Informative Markers (AIMs) from genomic data sets and link these informative markers to genes and ontological annotation classes. The tool includes an automated and simple “scripting at the click of a button” functionality that enables researchers to perform various population genomics statistical analyses methods with user friendly querying and filtering of data sets across various populations through a single web interface. AncestrySNPminer can be freely accessed at https://research.cchmc.org/mershalab/AncestrySNPminer/login.php. PMID:22584067
EVALLER: a web server for in silico assessment of potential protein allergenicity

PubMed Central

Barrio, Alvaro Martinez; Soeria-Atmadja, Daniel; Nistér, Anders; Gustafsson, Mats G.; Hammerling, Ulf; Bongcam-Rudloff, Erik

2007-01-01

Bioinformatics testing approaches for protein allergenicity, involving amino acid sequence comparisons, have evolved appreciably over the last several years to increased sophistication and performance. EVALLER, the web server presented in this article is based on our recently published ‘Detection based on Filtered Length-adjusted Allergen Peptides’ (DFLAP) algorithm, which affords in silico determination of potential protein allergenicity of high sensitivity and excellent specificity. To strengthen bioinformatics risk assessment in allergology EVALLER provides a comprehensive outline of its judgment on a query protein's potential allergenicity. Each such textual output incorporates a scoring figure, a confidence numeral of the assignment and information on high- or low-scoring matches to identified allergen-related motifs, including their respective location in accordingly derived allergens. The interface, built on a modified Perl Open Source package, enables dynamic and color-coded graphic representation of key parts of the output. Moreover, pertinent details can be examined in great detail through zoomed views. The server can be accessed at http://bioinformatics.bmc.uu.se/evaller.html. PMID:17537818
Data Integration Using SOAP in the VSO

NASA Astrophysics Data System (ADS)

Tian, K. Q.; Bogart, R. S.; Davey, A.; Dimitoglou, G.; Gurman, J. B.; Hill, F.; Martens, P. C.; Wampler, S.

2003-05-01

The Virtual Solar Observatory (VSO) project has implemented a time interval search for all four participating data archives. The back-end query services are implemented as web services, and are accessible via SOAP. SOAP (Simple Object Access Protocol) defines an RPC (Remote Procedure Call) mechanism that employs HTTP as its transport and encodes the client-server interactions (request and response messages) in XML (eXtensible Markup Language) documents. In addition to its core function of identifying relevant datasets in the local archive, the SOAP server at each data provider acts as a "wrapper" that maps descriptions in an abstract data model to those in the provider-specific data model, and vice versa. It is in this way that VSO integrates heterogeneous data services and allows access to them using a common interface. Our experience with SOAP has been fruitful. It has proven to be a better alternative to traditional web access methods, namely POST and GET, because of its flexibility and interoperability.
Key design elements of a data utility for national biosurveillance: event-driven architecture, caching, and Web service model.

PubMed

Tsui, Fu-Chiang; Espino, Jeremy U; Weng, Yan; Choudary, Arvinder; Su, Hoah-Der; Wagner, Michael M

2005-01-01

The National Retail Data Monitor (NRDM) has monitored over-the-counter (OTC) medication sales in the United States since December 2002. The NRDM collects data from over 18,600 retail stores and processes over 0.6 million sales records per day. This paper describes key architectural features that we have found necessary for a data utility component in a national biosurveillance system. These elements include event-driven architecture to provide analyses of data in near real time, multiple levels of caching to improve query response time, high availability through the use of clustered servers, scalable data storage through the use of storage area networks and a web-service function for interoperation with affiliated systems. The methods and architectural principles are relevant to the design of any production data utility for public health surveillance-systems that collect data from multiple sources in near real time for use by analytic programs and user interfaces that have substantial requirements for time-series data aggregated in multiple dimensions.
Persistent Identifiers for Improved Accessibility for Linked Data Querying

NASA Astrophysics Data System (ADS)

Shepherd, A.; Chandler, C. L.; Arko, R. A.; Fils, D.; Jones, M. B.; Krisnadhi, A.; Mecum, B.

2016-12-01

The adoption of linked open data principles within the geosciences has increased the amount of accessible information available on the Web. However, this data is difficult to consume for those who are unfamiliar with Semantic Web technologies such as Web Ontology Language (OWL), Resource Description Framework (RDF) and SPARQL - the RDF query language. Consumers would need to understand the structure of the data and how to efficiently query it. Furthermore, understanding how to query doesn't solve problems of poor precision and recall in search results. For consumers unfamiliar with the data, full-text searches are most accessible, but not ideal as they arrest the advantages of data disambiguation and co-reference resolution efforts. Conversely, URI searches across linked data can deliver improved search results, but knowledge of these exact URIs may remain difficult to obtain. The increased adoption of Persistent Identifiers (PIDs) can lead to improved linked data querying by a wide variety of consumers. Because PIDs resolve to a single entity, they are an excellent data point for disambiguating content. At the same time, PIDs are more accessible and prominent than a single data provider's linked data URI. When present in linked open datasets, PIDs provide balance between the technical and social hurdles of linked data querying as evidenced by the NSF EarthCube GeoLink project. The GeoLink project, funded by NSF's EarthCube initiative, have brought together data repositories include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.
Ultrabroadband photonic internet: safety aspects

NASA Astrophysics Data System (ADS)

Kalicki, Arkadiusz; Romaniuk, Ryszard

2008-11-01

Web applications became most popular medium in the Internet. Popularity, easiness of web application frameworks together with careless development results in high number of vulnerabilities and attacks. There are several types of attacks possible because of improper input validation. SQL injection is ability to execute arbitrary SQL queries in a database through an existing application. Cross-site scripting is the vulnerability which allows malicious web users to inject code into the web pages viewed by other users. Cross-Site Request Forgery (CSRF) is an attack that tricks the victim into loading a page that contains malicious request. Web spam in blogs. There are several techniques to mitigate attacks. Most important are web application strong design, correct input validation, defined data types for each field and parameterized statements in SQL queries. Server hardening with firewall, modern security policies systems and safe web framework interpreter configuration are essential. It is advised to keep proper security level on client side, keep updated software and install personal web firewalls or IDS/IPS systems. Good habits are logging out from services just after finishing work and using even separate web browser for most important sites, like e-banking.
Arabidopsis Gene Family Profiler (aGFP)--user-oriented transcriptomic database with easy-to-use graphic interface.

PubMed

Dupl'áková, Nikoleta; Renák, David; Hovanec, Patrik; Honysová, Barbora; Twell, David; Honys, David

2007-07-23

Microarray technologies now belong to the standard functional genomics toolbox and have undergone massive development leading to increased genome coverage, accuracy and reliability. The number of experiments exploiting microarray technology has markedly increased in recent years. In parallel with the rapid accumulation of transcriptomic data, on-line analysis tools are being introduced to simplify their use. Global statistical data analysis methods contribute to the development of overall concepts about gene expression patterns and to query and compose working hypotheses. More recently, these applications are being supplemented with more specialized products offering visualization and specific data mining tools. We present a curated gene family-oriented gene expression database, Arabidopsis Gene Family Profiler (aGFP; http://agfp.ueb.cas.cz), which gives the user access to a large collection of normalised Affymetrix ATH1 microarray datasets. The database currently contains NASC Array and AtGenExpress transcriptomic datasets for various tissues at different developmental stages of wild type plants gathered from nearly 350 gene chips. The Arabidopsis GFP database has been designed as an easy-to-use tool for users needing an easily accessible resource for expression data of single genes, pre-defined gene families or custom gene sets, with the further possibility of keyword search. Arabidopsis Gene Family Profiler presents a user-friendly web interface using both graphic and text output. Data are stored at the MySQL server and individual queries are created in PHP script. The most distinguishable features of Arabidopsis Gene Family Profiler database are: 1) the presentation of normalized datasets (Affymetrix MAS algorithm and calculation of model-based gene-expression values based on the Perfect Match-only model); 2) the choice between two different normalization algorithms (Affymetrix MAS4 or MAS5 algorithms); 3) an intuitive interface; 4) an interactive "virtual plant" visualizing the spatial and developmental expression profiles of both gene families and individual genes. Arabidopsis GFP gives users the possibility to analyze current Arabidopsis developmental transcriptomic data starting with simple global queries that can be expanded and further refined to visualize comparative and highly selective gene expression profiles.
CruiseViewer: SIOExplorer Graphical Interface to Metadata and Archives.

NASA Astrophysics Data System (ADS)

Sutton, D. W.; Helly, J. J.; Miller, S. P.; Chase, A.; Clark, D.

2002-12-01

We are introducing "CruiseViewer" as a prototype graphical interface for the SIOExplorer digital library project, part of the overall NSF National Science Digital Library (NSDL) effort. When complete, CruiseViewer will provide access to nearly 800 cruises, as well as 100 years of documents and images from the archives of the Scripps Institution of Oceanography (SIO). The project emphasizes data object accessibility, a rich metadata format, efficient uploading methods and interoperability with other digital libraries. The primary function of CruiseViewer is to provide a human interface to the metadata database and to storage systems filled with archival data. The system schema is based on the concept of an "arbitrary digital object" (ADO). Arbitrary in that if the object can be stored on a computer system then SIOExplore can manage it. Common examples are a multibeam swath bathymetry file, a .pdf cruise report, or a tar file containing all the processing scripts used on a cruise. We require a metadata file for every ADO in an ascii "metadata interchange format" (MIF), which has proven to be highly useful for operability and extensibility. Bulk ADO storage is managed using the Storage Resource Broker, SRB, data handling middleware developed at the San Diego Supercomputer Center that centralizes management and access to distributed storage devices. MIF metadata are harvested from several sources and housed in a relational (Oracle) database. For CruiseViewer, cgi scripts resident on an Apache server are the primary communication and service request handling tools. Along with the CruiseViewer java application, users can query, access and download objects via a separate method that operates through standard web browsers, http://sioexplorer.ucsd.edu. Both provide the functionability to query and view object metadata, and select and download ADOs. For the CruiseViewer application Java 2D is used to add a geo-referencing feature that allows users to select basemap images and have vector shapes representing query results mapped over the basemap in the image panel. The two methods together address a wide range of user access needs and will allow for widespread use of SIOExplorer.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.