Managing Complex Change in Clinical Study Metadata
Brandt, Cynthia A.; Gadagkar, Rohit; Rodriguez, Cesar; Nadkarni, Prakash M.
2004-01-01
In highly functional metadata-driven software, the interrelationships within the metadata become complex, and maintenance becomes challenging. We describe an approach to metadata management that uses a knowledge-base subschema to store centralized information about metadata dependencies and use cases involving specific types of metadata modification. Our system borrows ideas from production-rule systems in that some of this information is a high-level specification that is interpreted and executed dynamically by a middleware engine. Our approach is implemented in TrialDB, a generic clinical study data management system. We review approaches that have been used for metadata management in other contexts and describe the features, capabilities, and limitations of our system. PMID:15187070
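The production-rule flavor of this approach can be illustrated in miniature: use cases for metadata modification are stored as high-level rules that a middleware engine interprets at run time, rather than being hard-coded. The rule names, dependency map, and actions below are hypothetical, not TrialDB's actual knowledge-base schema.

```python
# A minimal, hypothetical rule engine: each rule pairs a metadata
# operation with a condition over the dependency map and an action.
RULES = [
    # (operation, condition on the dependency map, action)
    ("delete_question", lambda deps, q: bool(deps.get(q)),
     "block: question is referenced by one or more forms"),
    ("delete_question", lambda deps, q: not deps.get(q),
     "allow"),
]

def apply_rules(operation, deps, target):
    """Return the first matching action for a requested metadata change."""
    for op, condition, action in RULES:
        if op == operation and condition(deps, target):
            return action
    return "allow"  # default: operations with no rule are unconstrained
```

The point of the design is that adding a new constraint means adding a rule row, not changing engine code.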
Log-less metadata management on metadata server for parallel file systems.
Liao, Jianwei; Xiao, Guoqiang; Peng, Xiaoning
2014-01-01
This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the metadata requests it has sent and that the metadata server has already handled, so the MDS need not log metadata changes to nonvolatile storage to achieve a highly available metadata service, and metadata processing performance improves as well. Because the client file system backs up certain sent metadata requests in its own memory, the overhead of handling these backup requests is much smaller than the overhead the metadata server incurs when it adopts logging or journaling to provide a highly available metadata service. The experimental results show that the proposed mechanism significantly improves the speed of metadata processing and yields better I/O data throughput than conventional metadata management schemes, that is, logging or journaling on the MDS. Moreover, when the metadata server crashes or otherwise becomes nonoperational, complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients. PMID:24892093
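The recovery idea can be sketched concretely: each client keeps an in-memory backup of metadata requests the MDS has acknowledged, and after a crash a fresh MDS rebuilds its state by replaying those backups in global sequence order instead of reading a journal from nonvolatile storage. All class and field names below are illustrative assumptions, not the paper's implementation.

```python
# Log-less metadata sketch: the MDS keeps metadata only in volatile
# memory; clients retain acknowledged requests for replay on recovery.
class MetadataServer:
    def __init__(self):
        self.table = {}       # path -> metadata dict (volatile only)
        self.next_seq = 0

    def handle(self, request):
        """Apply a request and return the sequence number assigned to it."""
        seq = self.next_seq
        self.next_seq += 1
        op, path, attrs = request
        if op in ("create", "update"):
            self.table.setdefault(path, {}).update(attrs)
        elif op == "remove":
            self.table.pop(path, None)
        return seq

class ClientFS:
    def __init__(self, mds):
        self.mds = mds
        self.backup = []      # (seq, request) pairs kept in client memory

    def send(self, request):
        seq = self.mds.handle(request)
        self.backup.append((seq, request))   # back up only after the MDS ack

def recover(clients):
    """Rebuild a fresh MDS by replaying all client backups in seq order."""
    fresh = MetadataServer()
    for seq, request in sorted(r for c in clients for r in c.backup):
        fresh.handle(request)
    return fresh
```

Because sequence numbers impose a total order, replaying the union of per-client backups reproduces the pre-crash metadata table.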
NASA Astrophysics Data System (ADS)
San Gil, Inigo; White, Marshall; Melendez, Eda; Vanderbilt, Kristin
The thirty-year-old United States Long Term Ecological Research Network has developed extensive metadata to document its scientific data. Standard and interoperable metadata are a core component of the data-driven analytical solutions developed by this research network. Content management systems offer an affordable solution for rapid deployment of metadata-centered information management systems. We developed a customized, integrative metadata management system based on the Drupal content management system technology. Building on knowledge and experience with the Sevilleta and Luquillo Long Term Ecological Research sites, we successfully deployed the first two medium-scale customized prototypes. In this paper, we describe the vision behind our Drupal-based information management instances and list the features offered through these Drupal-based systems. We also outline plans to expand the information services offered through these metadata-centered management systems. We conclude with the growing list of participants deploying similar instances.
The PDS4 Metadata Management System
NASA Astrophysics Data System (ADS)
Raugh, A. C.; Hughes, J. S.
2018-04-01
We present the key features of the Planetary Data System (PDS) PDS4 Information Model as an extendable metadata management system for planetary metadata related to data structure, analysis/interpretation, and provenance.
GraphMeta: Managing HPC Rich Metadata in Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Dong; Chen, Yong; Carns, Philip
High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This ‘rich’ metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution for uniformly managing HPC rich metadata due to its flexibility and generality. At the same time, however, graph-based HPC rich metadata management also introduces significant challenges for the underlying infrastructure. In this study, we first identify the challenges the underlying infrastructure must meet to support scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graph-based engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.
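The graph model described above can be sketched simply: entities (files, jobs, users) become vertices carrying arbitrary key-value attributes, relationships such as provenance become directed, labeled edges, and vertices are assigned to partitions, here by naive hashing, the simplest stand-in for the partitioning a system like GraphMeta must do at scale. Names and labels below are illustrative assumptions.

```python
# Toy graph store for rich metadata: vertices hold POSIX plus
# user-defined attributes; edges record labeled relationships.
class MetadataGraph:
    def __init__(self, n_partitions=4):
        self.partitions = [dict() for _ in range(n_partitions)]
        self.edges = []                       # (src, label, dst)

    def _shard(self, vid):
        return self.partitions[hash(vid) % len(self.partitions)]

    def add_vertex(self, vid, **attrs):
        self._shard(vid)[vid] = attrs         # POSIX + user-defined attrs

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

    def provenance(self, vid):
        """Collect all labeled edges reachable from vid (a provenance chain)."""
        out, frontier = [], [vid]
        while frontier:
            v = frontier.pop()
            for src, label, dst in self.edges:
                if src == v:
                    out.append((src, label, dst))
                    frontier.append(dst)
        return out
```

A real engine would index edges per vertex and handle cycles; the sketch only shows why one uniform graph can answer both POSIX-style lookups and provenance queries.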
Park, Yu Rang; Kim, Ju Han
2006-01-01
Standardized management of data elements (DEs) for Case Report Forms (CRFs) is crucial in a Clinical Trials Information System (CTIS). Traditional CTISs utilize organization-specific definitions and storage methods for DEs and CRFs. We developed a metadata-based DE management system for clinical trials, the Clinical and Histopathological Metadata Registry (CHMR), using the international standard for metadata registries (ISO 11179), for the management of cancer clinical trials information. CHMR was evaluated in cancer clinical trials with 1625 DEs extracted from the College of American Pathologists Cancer Protocols for 20 major cancers. PMID:17238675
Syntactic and Semantic Validation without a Metadata Management System
NASA Technical Reports Server (NTRS)
Pollack, Janine; Gokey, Christopher D.; Kendig, David; Olsen, Lola; Wharton, Stephen W. (Technical Monitor)
2001-01-01
The ability to maintain quality information is essential to securing confidence in any system for which the information serves as a data source. NASA's Global Change Master Directory (GCMD), an online Earth science data locator, holds over 9000 data set descriptions and is in a constant state of flux as metadata are created and updated on a daily basis. In such a system, maintaining the consistency and integrity of these metadata is crucial. The GCMD has developed a metadata management system utilizing XML, controlled vocabulary, and Java technologies to ensure the metadata not only adhere to valid syntax, but also exhibit proper semantics.
Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute
2011-01-01
Background Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI). The traditional file system struggles to handle these increasing amounts of sequence data. A good data management system therefore needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables good management of the IT infrastructure of the sequencing pipeline and allows biologists to track their data. Results We have chosen a data grid system, iRODS (the integrated Rule-Oriented Data System), to act as the data management system for the WTSI. iRODS provides a rule-based system management approach which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the metadata system of iRODS is comprehensive and allows users to customize their own application-level metadata. Users and IT experts in the WTSI can then query the metadata to find and track data. The aim of this paper is to describe how we designed and used (from both system and user viewpoints) iRODS as a data management system. Details are given about the problems faced and the solutions found when iRODS was implemented. A simple use case describing how users within the WTSI use iRODS is also introduced. Conclusions iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which could not previously be achieved. This novel approach allows biologists to define their own metadata and query the genomic data using those metadata. PMID:21906284
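The user-defined metadata iRODS offers takes the form of attribute-value-unit (AVU) triples attached to data objects, which users can then query. The toy catalog below is not the iRODS API, just an illustration of the idea of tagging and finding sequencing runs by custom attributes; paths and attribute names are invented.

```python
# Minimal AVU-style catalog: attach attribute-value-unit triples to
# object paths, then answer conjunctive attribute queries.
class AVUCatalog:
    def __init__(self):
        self.avus = []                        # (object_path, attr, value, unit)

    def add_avu(self, path, attr, value, unit=""):
        self.avus.append((path, attr, value, unit))

    def query(self, **criteria):
        """Return paths whose AVUs match every attr=value criterion."""
        hits = None
        for attr, value in criteria.items():
            matches = {p for p, a, v, _ in self.avus if a == attr and v == value}
            hits = matches if hits is None else hits & matches
        return sorted(hits or [])
```

The intersection of per-attribute match sets is what makes multi-attribute queries ("all WGS runs for human samples") possible without any fixed schema.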
Forum Guide to Metadata: The Meaning behind Education Data. NFES 2009-805
ERIC Educational Resources Information Center
National Forum on Education Statistics, 2009
2009-01-01
The purpose of this guide is to empower people to more effectively use data as information. To accomplish this, the publication explains what metadata are; why metadata are critical to the development of sound education data systems; what components comprise a metadata system; what value metadata bring to data management and use; and how to…
High-performance metadata indexing and search in petascale data storage systems
NASA Astrophysics Data System (ADS)
Leung, A. W.; Shao, M.; Bisson, T.; Pasupathy, S.; Miller, E. L.
2008-07-01
Large-scale storage systems used for scientific applications can store petabytes of data and billions of files, making the organization and management of data in these systems a difficult, time-consuming task. The ability to search file metadata in a storage system can address this problem by allowing scientists to quickly navigate experiment data and code while allowing storage administrators to gather the information they need to properly manage the system. In this paper, we present Spyglass, a file metadata search system that achieves scalability by exploiting storage system properties, providing the scalability that existing file metadata search tools lack. In doing so, Spyglass can achieve search performance up to several thousand times faster than existing database solutions. We show that Spyglass enables important functionality that can aid data management for scientists and storage administrators.
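The storage-system property Spyglass exploits is namespace locality: the index is split into subtree partitions so a search scoped to one part of the namespace touches only the partitions for that subtree. The sketch below is a hypothetical reduction of that idea, not Spyglass's actual index structure.

```python
from collections import defaultdict

# Partition the metadata index by top-level namespace prefix so scoped
# searches can prune whole partitions instead of scanning everything.
class SubtreeIndex:
    def __init__(self, depth=1):
        self.depth = depth
        self.partitions = defaultdict(list)    # subtree prefix -> [(path, meta)]

    def _prefix(self, path):
        parts = [p for p in path.split("/") if p]
        return "/" + "/".join(parts[:self.depth])

    def insert(self, path, **meta):
        self.partitions[self._prefix(path)].append((path, meta))

    def search(self, scope, **criteria):
        """Search only partitions under 'scope' instead of the whole index."""
        hits = []
        for prefix, entries in self.partitions.items():
            if not prefix.startswith(self._prefix(scope)):
                continue                        # prune unrelated subtrees
            for path, meta in entries:
                if path.startswith(scope) and all(
                        meta.get(k) == v for k, v in criteria.items()):
                    hits.append(path)
        return sorted(hits)
```

Pruning by subtree is why such a design can beat a general-purpose database that must consult a global index for every query.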
Managing biomedical image metadata for search and retrieval of similar images.
Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris
2011-08-01
Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM) to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standard-based metadata files using Web service and parses and stores the metadata in a relational database allowing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics like disease prevalences. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.
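The "match observations" function can be sketched as ranking images by overlap of their semantic imaging observation characteristics (IOCs). The feature vocabulary and similarity measure below (Jaccard over IOC term sets) are illustrative assumptions, not BIMM's actual algorithm.

```python
# Rank catalog images by IOC-set similarity to a query ROI.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def match_observations(query_iocs, catalog):
    """Return image ids ordered from most to least IOC-similar to the query."""
    ranked = sorted(catalog.items(),
                    key=lambda kv: jaccard(query_iocs, kv[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked]
```

Because the comparison runs on stored semantic metadata rather than pixels, retrieval stays cheap and the matches are explainable in radiologists' own terms.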
Simplified Metadata Curation via the Metadata Management Tool
NASA Astrophysics Data System (ADS)
Shum, D.; Pilone, D.
2015-12-01
The Metadata Management Tool (MMT) is the newest capability developed as part of NASA Earth Observing System Data and Information System's (EOSDIS) efforts to simplify metadata creation and improve metadata quality. The MMT was developed via an agile methodology, taking into account inputs from GCMD's science coordinators and other end-users. In its initial release, the MMT uses the Unified Metadata Model for Collections (UMM-C) to allow metadata providers to easily create and update collection records in the ISO-19115 format. Through a simplified UI experience, metadata curators can create and edit collections without full knowledge of the NASA Best Practices implementation of ISO-19115 format, while still generating compliant metadata. More experienced users are also able to access raw metadata to build more complex records as needed. In future releases, the MMT will build upon recent work done in the community to assess metadata quality and compliance with a variety of standards through application of metadata rubrics. The tool will provide users with clear guidance as to how to easily change their metadata in order to improve their quality and compliance. Through these features, the MMT allows data providers to create and maintain compliant and high quality metadata in a short amount of time.
Metabolonote: A Wiki-Based Database for Managing Hierarchical Metadata of Metabolome Analyses
Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu
2015-01-01
Metabolomics – technology for comprehensive detection of small molecules in an organism – lags behind the other “omics” in terms of publication and dissemination of experimental data. Among the reasons for this are the difficulty of precisely recording information about complicated analytical experiments (metadata), the existence of various databases with their own metadata descriptions, and the low reusability of published data, with the result that submitters (the researchers who generate the data) are insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called “Togo Metabolome Data” (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata, such as study purpose, sample, analytical method, and data analysis. Separating the management of metadata from that of data, and permitting related information to be attached to the metadata, provides advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers’ understanding and use of the data but also submitters’ motivation to publish it. The metadata are computationally shared with other systems via APIs, which facilitates the construction of novel databases by database developers. A permission system that allows publication of immature metadata, together with feedback from readers, also helps submitters improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata records for analyzed data obtained from 35 biological species are currently published.
Metabolonote and related tools are available free of cost at http://metabolonote.kazusa.or.jp/. PMID:25905099
NASA Astrophysics Data System (ADS)
Benedict, K. K.; Scott, S.
2013-12-01
While there has been a convergence towards a limited number of standards for representing knowledge (metadata) about geospatial (and other) data objects and collections, there exist a variety of community conventions around the specific use of those standards and within specific data discovery and access systems. This combination of limited (but multiple) standards and conventions creates a challenge for system developers that aspire to participate in multiple data infrastructures, each of which may use a different combination of standards and conventions. While Extensible Markup Language (XML) is a shared standard for encoding most metadata, traditional direct XML transformations (XSLT) from one standard to another often result in an imperfect transfer of information due to incomplete mapping from one standard's content model to another. This paper presents the work at the University of New Mexico's Earth Data Analysis Center (EDAC), in which a unified data and metadata management system has been developed in support of the storage, discovery, and access of heterogeneous data products. This system, the Geographic Storage, Transformation and Retrieval Engine (GSTORE) platform, has adopted a polyglot database model in which a combination of relational and document-based databases is used to store both data and metadata, with some metadata stored in a custom XML schema designed as a superset of the requirements for multiple target metadata standards: ISO 19115-2/19139/19110/19119, FGDC CSDGM (both with and without remote sensing extensions), and Dublin Core. Metadata stored within this schema are complemented by additional service, format, and publisher information that is dynamically "injected" into produced metadata documents when they are requested from the system.
While mapping from the underlying common metadata schema is relatively straightforward, the generation of valid metadata within each target standard is necessary but not sufficient for integration into multiple data infrastructures, as has been demonstrated through EDAC's testing and deployment of metadata into multiple external systems: Data.Gov, the GEOSS Registry, the DataONE network, the DSpace based institutional repository at UNM and semantic mediation systems developed as part of the NASA ACCESS ELSeWEB project. Each of these systems requires valid metadata as a first step, but to make most effective use of the delivered metadata each also has a set of conventions that are specific to the system. This presentation will provide an overview of the underlying metadata management model, the processes and web services that have been developed to automatically generate metadata in a variety of standard formats and highlight some of the specific modifications made to the output metadata content to support the different conventions used by the multiple metadata integration endpoints.
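The "injection" step described above can be sketched as follows: the stored core record in the common schema is augmented on the fly with service, format, and publisher information at the moment a document in a given target standard is requested. The field names and URLs below are hypothetical placeholders, not GSTORE's schema.

```python
# Request-time metadata injection: one stored record, many rendered documents.
SERVICE_INFO = {"wms": "https://example.org/wms", "download": "https://example.org/get"}

def render_metadata(core_record, target_standard):
    """Produce a per-standard document from one stored record plus injected info."""
    doc = dict(core_record)                   # start from the common superset schema
    doc["distribution"] = SERVICE_INFO        # injected dynamically at request time
    doc["standard"] = target_standard         # e.g. "ISO 19115-2" or "FGDC CSDGM"
    return doc
```

Keeping service endpoints out of the stored record means they can change without re-editing every metadata document that mentions them.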
Improving Access to NASA Earth Science Data through Collaborative Metadata Curation
NASA Astrophysics Data System (ADS)
Sisco, A. W.; Bugbee, K.; Shum, D.; Baynes, K.; Dixon, V.; Ramachandran, R.
2017-12-01
The NASA-developed Common Metadata Repository (CMR) is a high-performance metadata system that currently catalogs over 375 million Earth science metadata records. It serves as the authoritative metadata management system of NASA's Earth Observing System Data and Information System (EOSDIS), enabling NASA Earth science data to be discovered and accessed by a worldwide user community. The size of the EOSDIS data archive is steadily increasing, and the ability to manage and query this archive depends on the input of high quality metadata to the CMR. Metadata that does not provide adequate descriptive information diminishes the CMR's ability to effectively find and serve data to users. To address this issue, an innovative and collaborative review process is underway to systematically improve the completeness, consistency, and accuracy of metadata for approximately 7,000 data sets archived by NASA's twelve EOSDIS data centers, or Distributed Active Archive Centers (DAACs). The process involves automated and manual metadata assessment of both collection and granule records by a team of Earth science data specialists at NASA Marshall Space Flight Center. The team communicates results to DAAC personnel, who then make revisions and reingest improved metadata into the CMR. Implementation of this process relies on a network of interdisciplinary collaborators leveraging a variety of communication platforms and long-range planning strategies. Curating metadata at this scale and resolving metadata issues through community consensus improves the CMR's ability to serve current and future users and also introduces best practices for stewarding the next generation of Earth Observing System data. This presentation will detail the metadata curation process, its outcomes thus far, and also share the status of ongoing curation activities.
Studies of Big Data metadata segmentation between relational and non-relational databases
NASA Astrophysics Data System (ADS)
Golosova, M. V.; Grigorieva, M. A.; Klimentov, A. A.; Ryabinkin, E. A.; Dimitrov, G.; Potekhin, M.
2015-12-01
In recent years the concept of Big Data has become well established in IT. Systems managing large data volumes produce metadata that describe the data and the workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time, the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
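The hybrid segmentation can be sketched as routing the stable, heavily queried part of each metadata record (fixed schema) to a relational table and the variable, schema-less part to a document store, joined by the record's key. Table and field names below are illustrative, not the architecture studied in the paper.

```python
import sqlite3

# Hybrid store: fixed columns in SQL, free-form attributes in a
# document-style side store keyed by the same id.
class HybridStore:
    def __init__(self):
        self.sql = sqlite3.connect(":memory:")    # relational side
        self.sql.execute(
            "CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, owner TEXT)")
        self.docs = {}                             # NoSQL side: id -> free-form doc

    def put(self, record):
        core = {k: record[k] for k in ("id", "status", "owner")}
        extra = {k: v for k, v in record.items() if k not in core}
        self.sql.execute("INSERT INTO tasks VALUES (?, ?, ?)",
                         (core["id"], core["status"], core["owner"]))
        self.docs[core["id"]] = extra

    def find_by_status(self, status):
        rows = self.sql.execute("SELECT id FROM tasks WHERE status=?", (status,))
        return [{"id": r[0], **self.docs[r[0]]} for r in rows]
```

The relational side keeps indexed queries fast; the document side absorbs attributes that vary per record and would bloat a fixed schema.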
Li, Zuofeng; Wen, Jingran; Zhang, Xiaoyan; Wu, Chunxiao; Li, Zuogao; Liu, Lei
2012-01-01
Aiming to ease the secondary use of clinical data in clinical research, we introduce a metadata-driven, web-based clinical data management system named ClinData Express. ClinData Express is made up of two parts: 1) m-designer, standalone software for metadata definition; and 2) a web-based data warehouse system for data management. With ClinData Express, researchers need only define the metadata and data model in m-designer; the web interface for data collection and the specific database for data storage are then generated automatically. The standards used in the system and the data export module ensure data reuse. The system has been tested on seven disease data collections in Chinese and one form from dbGaP. The system's flexibility gives it great potential for use in clinical research. The system is available at http://code.google.com/p/clindataexpress. PMID:23304327
Making Interoperability Easier with the NASA Metadata Management Tool
NASA Astrophysics Data System (ADS)
Shum, D.; Reese, M.; Pilone, D.; Mitchell, A. E.
2016-12-01
ISO 19115 has enabled interoperability amongst tools, yet many users find it hard to build ISO metadata for their collections because the standard can be large and overly flexible for their needs. The Metadata Management Tool (MMT), part of NASA's Earth Observing System Data and Information System (EOSDIS), offers users a modern, easy-to-use, browser-based tool to develop ISO-compliant metadata. Through a simplified UI experience, metadata curators can create and edit collections without any understanding of the complex ISO 19115 format, while still generating compliant metadata. The MMT is also able to assess the completeness of collection-level metadata by evaluating it against a variety of metadata standards. The tool provides users with clear guidance on how to change their metadata in order to improve quality and compliance. It is based on NASA's Unified Metadata Model for Collections (UMM-C), a simpler metadata model that maps cleanly to ISO 19115. This allows metadata authors and curators to meet ISO compliance requirements faster and more accurately. The MMT and UMM-C have been developed in an agile fashion, with recurring end-user tests and reviews to continually refine the tool, the model, and the ISO mappings. This process allows for continual improvement and evolution to meet the community's needs.
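The UMM-C-to-ISO pattern amounts to this: authors fill in a small, flat model and the tool emits the verbose standard representation. The element names below are simplified placeholders, not the real ISO 19115 XML schema, which is far larger.

```python
# Render a flat UMM-C-like record as nested ISO-style XML (element
# names are illustrative stand-ins for the actual ISO 19115 schema).
def umm_to_iso(umm):
    return (
        "<MD_Metadata>"
        f"<identificationInfo><citation><title>{umm['ShortName']}</title></citation>"
        f"<abstract>{umm['Abstract']}</abstract></identificationInfo>"
        f"<metadataStandardName>{umm.get('Standard', 'ISO 19115')}</metadataStandardName>"
        "</MD_Metadata>"
    )
```

The asymmetry is the point: curators touch only the flat keys, and compliance with the nested standard falls out of the mapping.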
NASA Astrophysics Data System (ADS)
Lugmayr, Artur R.; Mailaparampil, Anurag; Tico, Florina; Kalli, Seppo; Creutzburg, Reiner
2003-01-01
Digital television (digiTV) is an additional multimedia environment, where metadata is one key element for the description of arbitrary content. This implies adequate structures for content description, which is provided by XML metadata schemes (e.g. MPEG-7, MPEG-21). Content and metadata management is the task of a multimedia repository, from which digiTV clients - equipped with an Internet connection - can access rich additional multimedia types over an "All-HTTP" protocol layer. Within this research work, we focus on conceptual design issues of a metadata repository for the storage of metadata, accessible from the feedback channel of a local set-top box. Our concept describes the whole heterogeneous life-cycle chain of XML metadata from the service provider to the digiTV equipment, device independent representation of content, accessing and querying the metadata repository, management of metadata related to digiTV, and interconnection of basic system components (http front-end, relational database system, and servlet container). We present our conceptual test configuration of a metadata repository that is aimed at a real-world deployment, done within the scope of the future interaction (fiTV) project at the Digital Media Institute (DMI) Tampere (www.futureinteraction.tv).
Scalable PGAS Metadata Management on Extreme Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Agarwal, Khushbu; Straatsma, TP
Programming models intended to run on exascale systems have a number of challenges to overcome, especially the sheer size of the system as measured by the number of concurrent software entities created and managed by the underlying runtime. It is clear from the size of these systems that any state maintained by the programming model has to be strictly sub-linear in size, in order not to overwhelm memory usage with pure overhead. A principal feature of Partitioned Global Address Space (PGAS) models is providing easy access to global-view distributed data structures. In order to provide efficient access to these distributed data structures, PGAS models must keep track of metadata such as where array sections are located with respect to the processes/threads running on the HPC system. As PGAS models and applications become ubiquitous on very large trans-petascale systems, a key component of their performance and scalability will be efficient and judicious use of memory for model overhead (metadata) compared to application data. We present an evaluation of several strategies to manage PGAS metadata that exhibit different space/time tradeoffs. We use two real-world PGAS applications to capture metadata usage patterns and gain insight into their communication behavior.
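One concrete instance of the space/time tradeoff: for a regularly blocked distributed array, the owner of any element is computable from an O(1) closed form, while an irregular distribution needs an explicit directory whose size grows with the number of sections. The layouts and names below are a toy illustration under assumed conventions, not any specific PGAS runtime's metadata.

```python
# O(1) metadata: owner of global index i for a block distribution.
def block_owner(i, n, nprocs):
    block = -(-n // nprocs)           # ceil(n / nprocs) elements per process
    return i // block

# O(sections) metadata: explicit per-section directory for irregular layouts.
def build_irregular_directory(boundaries):
    return list(boundaries)           # (start, end, owner) per section

def irregular_owner(i, directory):
    for start, end, owner in directory:
        if start <= i < end:
            return owner
    raise IndexError(i)
```

Strictly sub-linear metadata, as the abstract demands, favors the closed-form end of this spectrum whenever the distribution allows it.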
Metadata Creation, Management and Search System for your Scientific Data
NASA Astrophysics Data System (ADS)
Devarakonda, R.; Palanisamy, G.
2012-12-01
Mercury Search Systems is a set of tools for creating, searching, and retrieving biogeochemical metadata. The Mercury toolset provides orders-of-magnitude improvements in search speed, support for any metadata format, integration with Google Maps for spatial queries, multi-faceted search, search suggestions, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. Mercury's metadata editor provides an easy way to create metadata, and Mercury's search interface provides a single portal to search for data and information contained in disparate data management systems, each of which may use any metadata format, including FGDC, ISO-19115, Dublin-Core, Darwin-Core, DIF, ECHO, and EML. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury is being used by more than 14 different projects across 4 federal agencies. It was originally developed for NASA, with continuing development funded by NASA, USGS, and DOE for a consortium of projects. Mercury search won NASA's Earth Science Data Systems Software Reuse Award in 2008. References: R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010. R. Devarakonda, G. Palanisamy, J.M. Green, B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics, DOI: 10.1007/s12145-010-0073-0, 2010.
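Mercury's pattern of harvesting distributed metadata into one centralized index can be sketched with a per-field inverted index; field names and records below are illustrative of FGDC/Dublin-Core-style metadata, not Mercury's internal format.

```python
from collections import defaultdict

# Centralized index over harvested records: (field, token) -> record ids,
# which is what makes fielded searches fast regardless of where data lives.
class CentralIndex:
    def __init__(self):
        self.index = defaultdict(set)
        self.records = {}

    def harvest(self, provider_records):
        """Ingest metadata records from a contributing project server."""
        for rid, record in provider_records.items():
            self.records[rid] = record
            for field, text in record.items():
                for token in str(text).lower().split():
                    self.index[(field, token)].add(rid)

    def fielded_search(self, field, term):
        return sorted(self.index.get((field, term.lower()), set()))
```

Providers keep their data; only the harvested metadata is centralized, which is exactly the ownership model the abstract describes.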
Federal Data Repository Research: Recent Developments in Mercury Search System Architecture
NASA Astrophysics Data System (ADS)
Devarakonda, R.
2015-12-01
New data-intensive project initiatives need a new generation of data system architecture. This presentation will discuss recent developments in the Mercury System [1], including adoption, challenges, and future efforts to handle such data-intensive projects. Mercury is a combination of three main tools: (i) Data/Metadata Registration Tool (Online Metadata Editor): the new Online Metadata Editor (OME) is a web-based tool to help document scientific data in well-structured, popular scientific metadata formats. (ii) Search and Visualization Tool: provides a single portal to information contained in disparate data management systems. It facilitates distributed metadata management, data discovery, and various visualization capabilities. (iii) Data Citation Tool: in collaboration with the Department of Energy's Oak Ridge National Laboratory (ORNL), the Mercury Consortium (funded by NASA, USGS, and DOE) established a Digital Object Identifier (DOI) service. Mercury is an open-source system, developed and managed at Oak Ridge National Laboratory, and is currently funded by three federal agencies: NASA, USGS, and DOE. It provides access to millions of biogeochemical and ecological data records; 30,000 scientists use it each month. Some recent data-intensive projects using Mercury: USGS Science Data Catalog (http://data.usgs.gov/), Next-Generation Ecosystem Experiments (http://ngee-arctic.ornl.gov/), Carbon Dioxide Information Analysis Center (http://cdiac.ornl.gov/), Oak Ridge National Laboratory - Distributed Active Archive Center (http://daac.ornl.gov), SoilSCAPE (http://mercury.ornl.gov/soilscape). References: [1] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94.
Evaluating non-relational storage technology for HEP metadata and meta-data catalog
NASA Astrophysics Data System (ADS)
Grigorieva, M. A.; Golosova, M. V.; Gubin, M. Y.; Klimentov, A. A.; Osipova, V. V.; Ryabinkin, E. A.
2016-10-01
Large-scale scientific experiments produce vast volumes of data. These data are stored, processed, and analyzed in a distributed computing environment. The life cycle of an experiment is managed by specialized software such as Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each data processing step. Metadata describe scientific data and represent scientific objects or results of scientific experiments, allowing them to be shared by various applications, recorded in databases, or published via the Web. Processing and analyzing the constantly growing volume of auxiliary metadata is a challenging task, no simpler than the management and processing of the experimental data itself. Furthermore, metadata sources are often loosely coupled and may lead to end-user-visible inconsistency in combined information queries. To aggregate and synthesize a range of primary metadata sources, and to enhance them with flexible, schema-less addition of aggregated data, we are developing the Data Knowledge Base architecture, which serves as the intelligence behind GUIs and APIs.
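The aggregation problem described above, merging loosely coupled metadata sources while surfacing inconsistencies, can be illustrated with a small Python sketch. The source names and fields are invented; this is not the Data Knowledge Base implementation.

```python
# Hypothetical sketch: merge per-source metadata records for the same
# dataset and flag the fields on which sources disagree, so combined
# queries can expose rather than hide inconsistencies.

def aggregate(sources):
    merged = {}
    for source in sources:
        for dataset_id, fields in source.items():
            entry = merged.setdefault(dataset_id,
                                      {"fields": {}, "conflicts": set()})
            for key, value in fields.items():
                if key in entry["fields"] and entry["fields"][key] != value:
                    entry["conflicts"].add(key)  # sources disagree here
                else:
                    entry["fields"][key] = value
    return merged

# Two loosely coupled sources describing the same processing task.
prod = {"task42": {"status": "done", "events": 10_000}}
workload = {"task42": {"status": "running", "owner": "atlas"}}
kb = aggregate([prod, workload])
```

A real system would add provenance and resolution policies per field; the point here is only that conflicts are detected at aggregation time rather than at query time.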
Metadata Dictionary Database: A Proposed Tool for Academic Library Metadata Management
ERIC Educational Resources Information Center
Southwick, Silvia B.; Lampert, Cory
2011-01-01
This article proposes a metadata dictionary (MDD) be used as a tool for metadata management. The MDD is a repository of critical data necessary for managing metadata to create "shareable" digital collections. An operational definition of metadata management is provided. The authors explore activities involved in metadata management in…
EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal.
Baker, Ed
2013-01-01
Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions, such as Adobe Bridge, and online tools, such as Flickr, also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated fields in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads, making it possible to upload hundreds of images easily with automatic metadata (EXIF, XMP and IPTC) extraction and mapping.
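The mapping step the module performs can be sketched as follows. This is not the module's Drupal code but a hedged Python illustration; the tag and field names are hypothetical.

```python
# Sketch only: apply a user-customised mapping from embedded image
# metadata (EXIF/XMP/IPTC tags) to form fields, in the spirit of the
# module described above.

def apply_mapping(embedded, mapping):
    """Map extracted tag values onto form fields; unmapped tags are ignored."""
    return {field: embedded[tag] for tag, field in mapping.items()
            if tag in embedded}

embedded = {"EXIF:Artist": "E. Baker",
            "XMP:UsageTerms": "CC BY 4.0",
            "IPTC:Keywords": "Coleoptera"}
mapping = {"EXIF:Artist": "photographer",
           "XMP:UsageTerms": "licence",
           "IPTC:Keywords": "taxon_keywords"}
form_fields = apply_mapping(embedded, mapping)
```

In a bulk upload, the same mapping would simply be applied once per image, which is what makes hundreds of uploads cheap.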
Operational Interoperability Challenges on the Example of GEOSS and WIS
NASA Astrophysics Data System (ADS)
Heene, M.; Buesselberg, T.; Schroeder, D.; Brotzer, A.; Nativi, S.
2015-12-01
The following poster highlights operational interoperability challenges using the example of the Global Earth Observation System of Systems (GEOSS) and the World Meteorological Organization Information System (WIS). At the heart of both systems is a catalogue of earth observation data, products and services, but with different metadata management concepts. While WIS has strong governance, with its own metadata profile for its hundreds of thousands of metadata records, GEOSS adopted a more open approach for its ten million records. Furthermore, the development of WIS, as an operational system, follows a roadmap with committed downwards compatibility, while the GEOSS development process is more agile. The poster discusses how interoperability can be achieved across these different metadata management concepts, and how a proxy concept helps couple two systems that follow different development methodologies. Furthermore, the poster highlights the importance of monitoring and backup concepts as a verification method for operational interoperability.
Scalable Metadata Management for a Large Multi-Source Seismic Data Repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaylord, J. M.; Dodge, D. A.; Magana-Zook, S. A.
In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system, and to position it for anticipated growth in volume and complexity.
Metadata Means Communication: The Challenges of Producing Useful Metadata
NASA Astrophysics Data System (ADS)
Edwards, P. N.; Batcheller, A. L.
2010-12-01
Metadata are increasingly perceived as an important component of data sharing systems. For instance, metadata accompanying atmospheric model output may indicate the grid size, grid type, and parameter settings used in the model configuration. We conducted a case study of a data portal in the atmospheric sciences using in-depth interviews, document review, and observation. Our analysis revealed a number of challenges in producing useful metadata. First, creating and managing metadata required considerable effort and expertise, yet responsibility for these tasks was ill-defined and diffused among many individuals, leading to errors, failure to capture metadata, and uncertainty about the quality of the primary data. Second, metadata ended up stored in many different forms and software tools, making it hard to manage versions and transfer between formats. Third, the exact meanings of metadata categories remained unsettled and misunderstood even among a small community of domain experts -- an effect we expect to be exacerbated when scientists from other disciplines wish to use these data. In practice, we found that metadata problems due to these obstacles are often overcome through informal, personal communication, such as conversations or email. We conclude that metadata serve to communicate the context of data production from the people who produce data to those who wish to use it. Thus, while formal metadata systems are often public, critical elements of metadata (those embodied in informal communication) may never be recorded. Therefore, efforts to increase data sharing should include ways to facilitate inter-investigator communication. Instead of tackling metadata challenges only on the formal level, we can improve data usability for broader communities by better supporting metadata communication.
NASA Astrophysics Data System (ADS)
Chan, S.; Lehnert, K. A.; Coleman, R. J.
2011-12-01
SESAR, the System for Earth Sample Registration, is an online registry for physical samples collected for Earth and environmental studies. SESAR generates and administers the International Geo Sample Number (IGSN), a unique identifier for samples that is dramatically advancing interoperability amongst information systems for sample-based data. SESAR was developed to provide the complete range of registry services, including definition of IGSN syntax and metadata profiles, registration and validation of name spaces requested by users, tools for users to submit and manage sample metadata, validation of submitted metadata, generation and validation of the unique identifiers, archiving of sample metadata, and public or private access to the sample metadata catalog. With the development of SESAR v3, we placed particular emphasis on creating enhanced tools that make metadata submission easier and more efficient for users, and that provide superior functionality for users to manage metadata of their samples in their private workspace, MySESAR. For example, SESAR v3 includes a module where users can generate custom spreadsheet templates to enter metadata for their samples, then upload these templates online for sample registration. Once the content of the template is uploaded, it is displayed online in an editable grid format. Validation rules are executed in real time on the grid data to ensure data integrity. Other new features of SESAR v3 include the capability to transfer ownership of samples to other SESAR users, the ability to upload and store images and other files in a sample metadata profile, and the tracking of changes to sample metadata profiles. In the next version of SESAR (v3.5), we will further improve the discovery, sharing, and registration of samples. For example, we are developing a more comprehensive suite of web services that will allow discovery and registration access to SESAR from external systems.
Both batch and individual registrations will be possible through web services. Based on valuable feedback from the user community, we will introduce enhancements that add greater flexibility to the system to accommodate the vast diversity of metadata that users want to store. Users will be able to create custom metadata fields and use these for the samples they register. Users will also be able to group samples into 'collections' to make retrieval for research projects or publications easier. An improved interface design will allow for better workflow transition and navigation throughout the application. In keeping up with the demands of a growing community, SESAR has also made process changes to ensure efficiency in system development. For example, we have implemented a release cycle to better track enhancements and fixes to the system, and an API library that facilitates reusability of code. Usage tracking, metrics and surveys capture information to guide the direction of future developments. A new set of administrative tools allows greater control of system management.
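The real-time validation of grid data mentioned above can be illustrated with a minimal Python sketch. This is not SESAR's code; the field names and rules are invented assumptions.

```python
# Minimal sketch of running validation rules over uploaded
# sample-metadata rows, as an editable grid might, returning the
# failing fields per row.

RULES = {
    "sample_name": lambda v: bool(v),                     # required
    "latitude":    lambda v: v is None or -90 <= v <= 90,
    "longitude":   lambda v: v is None or -180 <= v <= 180,
}

def validate_rows(rows):
    """Return {row_index: [bad_field, ...]} for rows that fail any rule."""
    errors = {}
    for i, row in enumerate(rows):
        bad = [f for f, ok in RULES.items() if not ok(row.get(f))]
        if bad:
            errors[i] = bad
    return errors

rows = [{"sample_name": "IGSN-DEMO-1", "latitude": 31.4, "longitude": -81.3},
        {"sample_name": "", "latitude": 123.0, "longitude": 20.0}]
problems = validate_rows(rows)
```

Running the rules on every edit, rather than at submission time, is what lets a grid UI flag bad cells immediately.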
Omics Metadata Management Software v. 1 (OMMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks, to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible and extensible, and is easily installed and run by operators with general system administration and scripting language literacy.
Using RDF and Git to Realize a Collaborative Metadata Repository.
Stöhr, Mark R; Majeed, Raphael W; Günther, Andreas
2018-01-01
The German Center for Lung Research (DZL) is a research network with the aim of researching respiratory diseases. The participating study sites' register data differs in terms of software and coding system as well as data field coverage. To perform meaningful consortium-wide queries through one single interface, a uniform conceptual structure is required covering the DZL common data elements. No single existing terminology includes all our concepts. Potential candidates such as LOINC and SNOMED only cover specific subject areas or are not granular enough for our needs. To achieve a broadly accepted and complete ontology, we developed a platform for collaborative metadata management. The DZL data management group formulated detailed requirements regarding the metadata repository and the user interfaces for metadata editing. Our solution builds upon existing standard technologies allowing us to meet those requirements. Its key parts are RDF and the distributed version control system Git. We developed a software system to publish updated metadata automatically and immediately after performing validation tests for completeness and consistency.
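The completeness and consistency tests mentioned above can be sketched in Python. This is a hedged illustration, not the DZL pipeline; the required keys and concept records are invented.

```python
# Sketch of pre-publish validation for a collaboratively edited
# metadata repository: every concept must be complete, and no two
# concepts may share an identifier.

REQUIRED = {"id", "label", "definition"}

def check_completeness(concepts):
    """Return ids of concepts missing any required key."""
    return [c.get("id", "?") for c in concepts if not REQUIRED <= c.keys()]

def check_consistency(concepts):
    """Return ids that appear more than once."""
    seen, dupes = set(), []
    for c in concepts:
        if c.get("id") in seen:
            dupes.append(c["id"])
        seen.add(c.get("id"))
    return dupes

concepts = [{"id": "dzl:fev1", "label": "FEV1",
             "definition": "Forced expiratory volume in 1 s"},
            {"id": "dzl:fev1", "label": "FEV1 duplicate"}]
incomplete = check_completeness(concepts)
duplicates = check_consistency(concepts)
```

In a Git-backed workflow such checks would run on each commit, and publication would proceed only when both lists are empty.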
ERIC Educational Resources Information Center
White, Hollie C.
2012-01-01
Background: According to Salo (2010), the metadata entered into repositories are "disorganized" and metadata schemes underlying repositories are "arcane". This creates a challenging repository environment in regards to personal information management (PIM) and knowledge organization systems (KOSs). This dissertation research is…
The New Online Metadata Editor for Generating Structured Metadata
NASA Astrophysics Data System (ADS)
Devarakonda, R.; Shrestha, B.; Palanisamy, G.; Hook, L.; Killeffer, T.; Boden, T.; Cook, R. B.; Zolly, L.; Hutchison, V.; Frame, M. T.; Cialella, A. T.; Lazer, K.
2014-12-01
Nobody is better suited to "describe" data than the scientist who created it. This "description" of the data is called metadata. In general terms, metadata represent the who, what, when, where, why, and how of a dataset. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes the metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a web-based tool that allows users to create and maintain XML files containing key information, or metadata, about their research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [1][2]. Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a web-based form. How is OME helping big data centers like the ORNL DAAC? The ORNL DAAC is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers managed by the ESDIS Project. The ORNL DAAC archives data produced by NASA's Terrestrial Ecology Program. The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological components of the Earth's environment. The data produced, archived, and analyzed are typically at a scale of multiple petabytes, which makes discoverability very challenging. Without proper metadata associated with the data, it is difficult to find the data you are looking for, and equally difficult to use and understand the data.
OME will allow data centers like the ORNL DAAC to produce meaningful, high-quality, standards-based, descriptive information about their data products, in turn helping with data discoverability and interoperability. References: [1] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94. [2] Wilson, Bruce E., et al. "Mercury Toolset for Spatiotemporal Metadata." NASA Technical Reports Server (NTRS) (2010).
A statistical metadata model for clinical trials' data management.
Vardaki, Maria; Papageorgiou, Haralambos; Pentaris, Fragkiskos
2009-08-01
We introduce a statistical, process-oriented metadata model to describe the process of medical research data collection, management, results analysis, and dissemination. Our approach explicitly provides a structure for the pieces of information used in Clinical Study Data Management Systems, enabling a more active role for any associated metadata. Using the object-oriented paradigm, we describe the classes of our model that participate during the design of a clinical trial and the subsequent collection and management of the relevant data. The advantage of our approach is that we focus on presenting the structural interrelation of these classes during dataset manipulation, by proposing transformations that model the simultaneous processing of both data and metadata. Our solution reduces the possibility of human error and allows all changes made during a dataset's lifecycle to be tracked. The explicit modeling of processing steps improves data quality and assists in the problem of handling data collected in different clinical trials. The case study illustrates the applicability of the proposed framework, conceptually demonstrating the simultaneous handling of datasets collected during two randomized clinical studies. Finally, we provide the main considerations for implementing the proposed framework in a modern metadata-enabled information system.
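The idea of transformations that process data and metadata simultaneously can be sketched conceptually in Python. This is not the authors' model; the dataset structure and the example rule are invented.

```python
# Conceptual sketch: a transformation applied to a dataset also appends
# a processing-history entry to its metadata in the same step, so data
# and metadata cannot drift apart.

def transform(dataset, func, description):
    """Apply func to the data and record the step in the metadata."""
    new_data = [func(x) for x in dataset["data"]]
    new_meta = dict(dataset["metadata"])
    new_meta["history"] = new_meta.get("history", []) + [description]
    return {"data": new_data, "metadata": new_meta}

trial = {"data": [120, 135, 150],
         "metadata": {"variable": "systolic_bp", "unit": "mmHg"}}
cleaned = transform(trial, lambda v: min(v, 140),
                    "cap outliers at 140 mmHg")
```

Because each transformation returns a new dataset with an extended history, every change in the dataset's lifecycle remains traceable.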
Metadata Repository for Improved Data Sharing and Reuse Based on HL7 FHIR.
Ulrich, Hannes; Kock, Ann-Kristin; Duhm-Harbeck, Petra; Habermann, Jens K; Ingenerf, Josef
2016-01-01
Unreconciled data structures and formats are a common obstacle to the urgently required sharing and reuse of data within healthcare and medical research. Within the North German Tumor Bank of Colorectal Cancer, clinical and sample data, based on a harmonized data set, are collected and can be pooled by using a hospital-integrated Research Data Management System supporting biobank and study management. Adding further partners who do not use the core data set requires manual adaptation and mapping of data elements. To avoid this manual intervention, and with a focus on the reuse of heterogeneous healthcare instance data (value level) and data elements (metadata level), a metadata repository has been developed. The metadata repository is an ISO 11179-3 conformant server application built for annotating and mediating data elements. The implemented architecture includes the translation of metadata information about data elements into the FHIR standard, using the FHIR DataElement resource with the ISO 11179 Data Element Extensions. The FHIR-based processing allows exchange of data elements with clinical and research IT systems as well as with other metadata systems. With increasingly annotated and harmonized data elements, data quality and integration can be improved, successfully enabling data analytics and decision support.
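The translation step from an ISO 11179-style data element into a FHIR resource can be illustrated with a small sketch. The dict below is a simplification for illustration, not the full FHIR DataElement schema, and the element values are invented.

```python
# Illustrative only: translate a minimal ISO 11179-style data element
# into a FHIR-like DataElement structure (plain dict, simplified fields).

def to_fhir_data_element(element):
    return {
        "resourceType": "DataElement",
        "identifier": [{"value": element["id"]}],
        "element": [{
            "path": element["name"],
            "definition": element["definition"],
            "type": [{"code": element["datatype"]}],
        }],
    }

iso_element = {"id": "DE-0001", "name": "tumor_stage",
               "definition": "UICC tumor stage at diagnosis",
               "datatype": "string"}
resource = to_fhir_data_element(iso_element)
```

Serializing such a structure as JSON is what would let the repository exchange data elements with FHIR-speaking clinical systems.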
Design and Implementation of a Metadata-rich File System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ames, S; Gokhale, M B; Maltzahn, C
2010-01-19
Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations, and superior performance on normal file metadata operations.
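The graph data model QFS describes, files with user-defined attributes plus typed relationships between files, can be sketched as follows. The class, function, and relationship names are invented for illustration; this is not QFS or Quasar code.

```python
# Sketch of a graph model where files, user-defined attributes, and
# file relationships are all first-class objects.

class FileNode:
    def __init__(self, name, **attrs):
        self.name = name
        self.attrs = attrs      # user-defined attributes
        self.links = []         # typed edges to other files

def link(src, rel, dst):
    """Add a typed relationship edge from src to dst."""
    src.links.append((rel, dst))

def related(node, rel):
    """Files reachable from node over edges of the given type."""
    return [dst for r, dst in node.links if r == rel]

raw = FileNode("run_042.dat", experiment="shock-tube", kind="raw")
plot = FileNode("run_042.png", kind="derived")
link(raw, "derived_to", plot)
```

A Quasar-style query would then be a path expression over such edges and attributes rather than a relational join against a separate database.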
Master Metadata Repository and Metadata-Management System
NASA Technical Reports Server (NTRS)
Armstrong, Edward; Reed, Nate; Zhang, Wen
2007-01-01
A master metadata repository (MMR) software system manages the storage and searching of metadata pertaining to data from national and international satellite sources of the Global Ocean Data Assimilation Experiment (GODAE) High Resolution Sea Surface Temperature Pilot Project [GHRSSTPP]. These sources produce a total of hundreds of data files daily, each file classified as one of more than ten data products representing global sea-surface temperatures. The MMR is a relational database wherein the metadata are divided into granule-level records [denoted file records (FRs)] for individual satellite files and collection-level records [denoted data set descriptions (DSDs)] that describe metadata common to all the files from a specific data product. FRs and DSDs adhere to the NASA Directory Interchange Format (DIF). The FRs and DSDs are contained in separate subdatabases linked by a common field. The MMR is configured in MySQL database software with custom Practical Extraction and Reporting Language (PERL) programs to validate and ingest the metadata records. The database contents are converted into the Federal Geographic Data Committee (FGDC) standard format by use of the Extensible Markup Language (XML). A Web interface enables users to search for availability of data from all sources.
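The two linked sub-databases described above can be sketched with two tables joined on a shared data-product field. The column names are illustrative, and SQLite stands in for MySQL here; this is not the MMR's actual schema.

```python
# Sketch: collection-level records (DSDs) and granule-level file
# records (FRs) in separate tables, linked by a common product field.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dsd (product TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE fr  (filename TEXT, product TEXT REFERENCES dsd(product));
""")
con.execute("INSERT INTO dsd VALUES ('GHRSST-L2P', 'Global SST, level 2P')")
con.executemany("INSERT INTO fr VALUES (?, ?)",
                [("sst_20070101a.nc", "GHRSST-L2P"),
                 ("sst_20070101b.nc", "GHRSST-L2P")])

# A join recovers the collection-level metadata for every file.
rows = con.execute("""
    SELECT fr.filename, dsd.description
    FROM fr JOIN dsd ON fr.product = dsd.product
""").fetchall()
```

Keeping product-wide metadata in one DSD row avoids repeating it across the hundreds of daily file records.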
Obuch, Raymond C.; Carlino, Jennifer; Zhang, Lin; Blythe, Jonathan; Dietrich, Christopher; Hawkinson, Christine
2018-04-12
The Department of the Interior (DOI) is a Federal agency with over 90,000 employees across 10 bureaus and 8 agency offices. Its primary mission is to protect and manage the Nation's natural resources and cultural heritage; provide scientific and other information about those resources; and honor its trust responsibilities or special commitments to American Indians, Alaska Natives, and affiliated island communities. Data and information are critical in day-to-day operational decision making and scientific research. DOI is committed to creating, documenting, managing, and sharing high-quality data and metadata in and across its various programs that support its mission. Documenting data through metadata is essential in realizing the value of data as an enterprise asset. The completeness, consistency, and timeliness of metadata affect users' ability to search for and discover the most relevant data for the intended purpose, and facilitate the interoperability and usability of these data among DOI bureaus and offices. Fully documented metadata describe data usability, quality, accuracy, provenance, and meaning. Across DOI, there are different maturity levels and phases of information and metadata management implementations. The Department has organized a committee consisting of bureau-level points of contact to collaborate on the development of more consistent, standardized, and more effective metadata management practices and guidance to support this shared mission and the information needs of the Department. DOI's metadata implementation plans establish key roles and responsibilities associated with metadata management processes and procedures, and a series of actions defined in three major metadata implementation phases: (1) the Getting Started (Planning) phase, (2) the Implementing and Maintaining Operational Metadata Management phase, and (3) the Next Steps towards Improving Metadata Management phase.
DOI’s phased approach for metadata management addresses some of the major data and metadata management challenges that exist across the diverse missions of the bureaus and offices. All employees who create, modify, or use data are involved with data and metadata management. Identifying, establishing, and formalizing the roles and responsibilities associated with metadata management are key to institutionalizing a framework of best practices, methodologies, processes, and common approaches throughout all levels of the organization; these are the foundation for effective data resource management. For executives and managers, metadata management strengthens their overarching views of data assets, holdings, and data interoperability; and clarifies how metadata management can help accelerate the compliance of multiple policy mandates. For employees, data stewards, and data professionals, formalized metadata management will help with the consistency of definitions, and approaches addressing data discoverability, data quality, and data lineage. In addition to data professionals and others associated with information technology; data stewards and program subject matter experts take on important metadata management roles and responsibilities as data flow through their respective business and science-related workflows. The responsibilities of establishing, practicing, and governing the actions associated with their specific metadata management roles are critical to successful metadata implementation.
[Radiological dose and metadata management].
Walz, M; Kolodziej, M; Madsack, B
2016-12-01
This article describes the features of management systems currently available in Germany for extraction, registration and evaluation of metadata from radiological examinations, particularly in the digital imaging and communications in medicine (DICOM) environment. In addition, the probable relevant developments in this area concerning radiation protection legislation, terminology, standardization and information technology are presented.
The role of metadata in managing large environmental science datasets. Proceedings
DOE Office of Scientific and Technical Information (OSTI.GOV)
Melton, R.B.; DeVaney, D.M.; French, J. C.
1995-06-01
The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.
Dataworks for GNSS: Software for Supporting Data Sharing and Federation of Geodetic Networks
NASA Astrophysics Data System (ADS)
Boler, F. M.; Meertens, C. M.; Miller, M. M.; Wier, S.; Rost, M.; Matykiewicz, J.
2015-12-01
Continuously-operating Global Navigation Satellite System (GNSS) networks are increasingly being installed globally for a wide variety of science and societal applications. GNSS enables Earth science research in areas including tectonic plate interactions, crustal deformation in response to loading by tectonics, magmatism, water and ice, and the dynamics of water - and thereby energy transfer - in the atmosphere at regional scale. The many individual scientists and organizations that set up GNSS stations globally are often open to sharing data, but lack the resources or expertise to deploy systems and software to manage and curate data and metadata and provide user tools that would support data sharing. UNAVCO previously gained experience in facilitating data sharing through the NASA-supported development of the Geodesy Seamless Archive Centers (GSAC) open source software. GSAC provides web interfaces and simple web services for data and metadata discovery and access, supports federation of multiple data centers, and simplifies transfer of data and metadata to long-term archives. The NSF supported the dissemination of GSAC to multiple European data centers forming the European Plate Observing System. To expand upon GSAC to provide end-to-end, instrument-to-distribution capability, UNAVCO developed Dataworks for GNSS with NSF funding to the COCONet project, and deployed this software on systems that are now operating as Regional GNSS Data Centers as part of the NSF-funded TLALOCNet and COCONet projects. Dataworks consists of software modules written in Python and Java for data acquisition, management and sharing. There are modules for GNSS receiver control and data download, a database schema for metadata, tools for metadata handling, ingest software to manage file metadata, data file management scripts, GSAC, scripts for mirroring station data and metadata from partner GSACs, and extensive software and operator documentation. 
UNAVCO plans to provide a cloud VM image of Dataworks that would allow standing up a Dataworks-enabled GNSS data center without requiring upfront investment in server hardware. By enabling data creators to organize their data and metadata for sharing, Dataworks helps scientists expand their data curation awareness and responsibility, and enhances data access for all.
Distributed metadata in a high performance computing environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bent, John M.; Faibish, Sorin; Zhang, Zhenhua
A computer-executable method, system, and computer program product for managing metadata in a distributed storage system, wherein the distributed storage system includes one or more burst buffers enabled to operate with a distributed key-value store. The computer-executable method, system, and computer program product comprise: receiving a request for metadata associated with a block of data stored in a first burst buffer of the one or more burst buffers in the distributed storage system, wherein the metadata is associated with a key-value; determining which of the one or more burst buffers stores the requested metadata; and, upon determining that a first burst buffer of the one or more burst buffers stores the requested metadata, locating the key-value in a portion of the distributed key-value store accessible from the first burst buffer.
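The "determining which burst buffer stores the requested metadata" step can be illustrated with a sketch. The patent text does not specify the determination strategy; hash partitioning below is one common assumption, and the buffer and key names are invented.

```python
# Hypothetical sketch: route a metadata key-value to one burst buffer
# via hash partitioning, so lookups can be sent straight to the owner.
import hashlib

def owning_buffer(key, buffers):
    """Map a metadata key deterministically onto one burst buffer."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return buffers[digest % len(buffers)]

buffers = [{"id": 0, "store": {}}, {"id": 1, "store": {}}]

def put(key, value):
    owning_buffer(key, buffers)["store"][key] = value

def get(key):
    # Route the request to the buffer that stores the key-value pair.
    return owning_buffer(key, buffers)["store"].get(key)

put("block:/ckpt/0007", {"size": 4096, "node": "bb-3"})
```

Because routing is deterministic, no central metadata log is consulted on the lookup path.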
A multi-service data management platform for scientific oceanographic products
NASA Astrophysics Data System (ADS)
D'Anca, Alessandro; Conte, Laura; Nassisi, Paola; Palazzo, Cosimo; Lecci, Rita; Cretì, Sergio; Mancini, Marco; Nuzzo, Alessandra; Mirto, Maria; Mannarini, Gianandrea; Coppini, Giovanni; Fiore, Sandro; Aloisio, Giovanni
2017-02-01
An efficient, secure, and interoperable data platform solution has been developed in the TESSA project to provide fast navigation and access to the data stored in the data archive, as well as standards-based metadata management support. The platform mainly targets scientific users and high-level situational sea awareness services such as decision support systems (DSS). These datasets are accessible through three main components: the Data Access Service (DAS), the Metadata Service and the Complex Data Analysis Module (CDAM). The DAS allows access to data stored in the archive by providing interfaces for different protocols and services for downloading, variable selection, data subsetting and map generation. The Metadata Service is the heart of the information system for TESSA products and completes the overall infrastructure for data and metadata management. This component enables data search and discovery and addresses interoperability by exploiting widely adopted standards for geospatial data. Finally, the CDAM represents the back-end of the TESSA DSS, performing on-demand complex data analysis tasks.
NASA Astrophysics Data System (ADS)
Sheldon, W.
2013-12-01
Managing data for a large, multidisciplinary research program such as a Long Term Ecological Research (LTER) site is a significant challenge, but it also presents unique opportunities for data stewardship. LTER research is conducted within multiple organizational frameworks (i.e. a specific LTER site as well as the broader LTER network), and addresses both specific goals defined in an NSF proposal and broader goals of the network; therefore, every LTER dataset can be linked to rich contextual information to guide interpretation and comparison. The challenge is how to link the data to this wealth of contextual metadata. At the Georgia Coastal Ecosystems LTER we developed an integrated information management system (GCE-IMS) to manage, archive and distribute data, metadata and other research products, as well as to manage project logistics, administration and governance (figure 1). This system allows us to store all project information in one place and provide dynamic links through web applications and services to ensure content is always up to date on the web as well as in data set metadata. The database model supports tracking changes over time in personnel roles, projects and governance decisions, allowing these databases to serve as canonical sources of project history. Storing project information in a central database has also allowed us to standardize both the formatting and content of critical project information, including personnel names, roles, keywords, place names, attribute names, units, and instrumentation, providing consistency and improving data and metadata comparability. Lookup services for these standard terms also simplify data entry in web and database interfaces. We have also coupled the GCE-IMS to our MATLAB- and Python-based data processing tools (i.e. through database connections) to automate metadata generation and the packaging of tabular and GIS data products for distribution.
Data processing history is automatically tracked throughout the data lifecycle, from initial import through quality control, revision and integration by our data processing system (GCE Data Toolbox for MATLAB), and is included in the metadata for versioned data products. This high level of automation and system integration has proven very effective in managing the scale and complexity of our information management program.
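Automatic processing-history tracking of the kind described above can be sketched as an append-only log attached to a data product, with each lifecycle stage recording an entry that is later emitted into the product's metadata. This is a generic illustration, not the GCE Data Toolbox implementation; the class and method names are hypothetical.

```python
from datetime import datetime, timezone

class ProcessingHistory:
    """Append-only processing history for a versioned data product."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.entries = []

    def record(self, stage, detail):
        # Each lifecycle step (import, QC, revision, integration) appends
        # a timestamped entry rather than editing earlier ones.
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "stage": stage,
            "detail": detail,
        })

    def as_metadata(self):
        # Render the history as lines suitable for a metadata record.
        return [f"{e['at']} [{e['stage']}] {e['detail']}" for e in self.entries]
```

Because entries are only appended, the rendered history doubles as a provenance trail for each data product version.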
Life+ EnvEurope DEIMS - improving access to long-term ecosystem monitoring data in Europe
NASA Astrophysics Data System (ADS)
Kliment, Tomas; Peterseil, Johannes; Oggioni, Alessandro; Pugnetti, Alessandra; Blankman, David
2013-04-01
Long-term ecological research (LTER) studies aim at detecting environmental changes and analysing their related drivers. In this respect, LTER Europe provides a network of about 450 sites and platforms. However, data on various types of ecosystems and at a broad geographical scale are still not easily available. Managing data resulting from long-term observations is therefore one of the important tasks, not only for an LTER site itself but also at the network level. Exchanging and sharing information within a wider community is a crucial objective for the coming years. Due to the fragmented nature of long-term ecological research and monitoring in Europe - and also at the global scale - information management has to face several challenges: distributed data sources, heterogeneous data models, heterogeneous data management solutions, and the complex domain of ecosystem monitoring with regard to the resulting data. The Life+ EnvEurope project (2010-2013) provides a case study for a workflow using data from the distributed network of LTER-Europe sites. In order to enhance discovery, evaluation and access to data, the EnvEurope Drupal Ecological Information Management System (DEIMS) has been developed. It is based on the first official release of the Drupal metadata editor developed by US LTER. EnvEurope DEIMS consists of three main components: 1) Metadata editor: a web-based client interface to manage metadata for three information resource types - datasets, persons and research sites. A metadata model describing datasets, based on the Ecological Metadata Language (EML), was developed in the initial phase of the project. A crosswalk to the INSPIRE metadata model was implemented to align with ongoing European activities. The person and research site metadata models defined within LTER Europe were adapted to the project's needs.
The three metadata models are interconnected within the system in order to provide an easy way for users to navigate among related resources. 2) Discovery client: provides several search profiles for datasets, persons, research sites and external resources commonly used in the domain (e.g. the Catalogue of Life), based on several search patterns ranging from simple full-text search and glossary browsing to categorized faceted search. 3) Geo-Viewer: a map client that portrays boundaries and centroids of the research sites as Web Map Service (WMS) layers. Each layer provides a link to both the Metadata editor and the Discovery client in order to create or discover metadata describing the data collected within the individual research site. Sharing of the dataset metadata from DEIMS is ensured in two ways: XML export of individual metadata records according to the EML schema for inclusion in the international DataONE network, and periodic harvesting of metadata into a GeoNetwork catalogue, thus providing a catalogue service for the web (CSW) that can be invoked by remote clients. The final version of DEIMS will be a pilot implementation of the information system for LTER-Europe, which should establish a common information management framework within the European ecosystem research domain and provide valuable environmental information to other European information infrastructures such as SEIS, Copernicus and INSPIRE.
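A metadata crosswalk of the kind DEIMS implements between EML and INSPIRE can be sketched as a field mapping over a flattened record. The field names on both sides below are hypothetical simplifications; the real EML and INSPIRE schemas are deeper XML structures with different element names.

```python
# Hypothetical flat crosswalk table; illustrative only.
EML_TO_INSPIRE = {
    "title": "resourceTitle",
    "abstract": "resourceAbstract",
    "creator": "responsibleParty",
    "pubDate": "dateOfPublication",
    "geographicCoverage": "geographicBoundingBox",
}

def crosswalk(eml_record, mapping=EML_TO_INSPIRE):
    """Translate fields of a flat EML-like record; unmapped fields are dropped."""
    return {mapping[k]: v for k, v in eml_record.items() if k in mapping}
```

A production crosswalk would also handle structural differences (nested elements, controlled vocabularies, mandatory INSPIRE fields with no EML counterpart), which a flat mapping like this cannot express.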
NASA Astrophysics Data System (ADS)
Oggioni, Alessandro; Tagliolato, Paolo; Fugazza, Cristiano; Bastianini, Mauro; Pavesi, Fabio; Pepe, Monica; Menegon, Stefano; Basoni, Anna; Carrara, Paola
2015-04-01
Sensor observation systems for environmental data have become increasingly important in recent years. The EGU's Informatics in Oceanography and Ocean Science track stressed the importance of management tools and solutions for marine infrastructures. We think that full interoperability among sensor systems is still an open issue and that its solution involves providing appropriate metadata. Several open source applications implement the SWE specification and, particularly, the Sensor Observation Service (SOS) standard. These applications allow for the exchange of data and metadata in XML format between computer systems. However, there is a lack of metadata editing tools supporting end users in this activity. Generally speaking, it is hard for users to provide sensor metadata in the SensorML format without dedicated tools. In particular, such a tool should ease metadata editing by providing, for standard sensors, all the invariant information to be included in sensor metadata, thus allowing the user to concentrate on the metadata items that are related to the specific deployment. RITMARE, the Italian flagship project on marine research, envisages a subproject, SP7, for the set-up of the project's spatial data infrastructure. SP7 developed EDI, a general-purpose, template-driven metadata editor that is composed of a backend web service and an HTML5/JavaScript client. EDI can be customized to manage the creation of generic metadata encoded as XML. Once tailored to a specific metadata format, EDI presents users with a web form with advanced auto-completion and validation capabilities. In the case of sensor metadata (SensorML versions 1.0.1 and 2.0), the EDI client is instructed to send an InsertSensor request to an SOS endpoint in order to save the metadata in an SOS server. In the first phase of the RITMARE project, EDI has been used to simplify the creation from scratch of SensorML metadata by the researchers and data managers involved.
An interesting by-product of this ongoing work is a growing archive of predefined sensor descriptions. This information is being collected in order to further ease metadata creation in the next phase of the project. Users will be able to choose among a number of sensor and sensor platform prototypes; these will be specific instances on which it will be possible to define, in a bottom-up approach, "sensor profiles". We report on the outcome of this activity.
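The InsertSensor request that the EDI client sends to an SOS endpoint can be sketched as a small XML envelope wrapping the SensorML description. The structure below is a minimal illustration of the SOS 2.0 swes:InsertSensor shape; a real request carries additional mandatory elements (observable properties, insertion metadata) that are omitted here.

```python
import xml.etree.ElementTree as ET

SWES = "http://www.opengis.net/swes/2.0"

def insert_sensor_request(procedure_format, sensor_description_xml):
    """Wrap a sensor description in a minimal InsertSensor envelope (sketch)."""
    root = ET.Element(f"{{{SWES}}}InsertSensor",
                      {"service": "SOS", "version": "2.0.0"})
    fmt = ET.SubElement(root, f"{{{SWES}}}procedureDescriptionFormat")
    fmt.text = procedure_format
    desc = ET.SubElement(root, f"{{{SWES}}}procedureDescription")
    # Embed the (already serialized) SensorML document.
    desc.append(ET.fromstring(sensor_description_xml))
    return ET.tostring(root, encoding="unicode")
```

A template-driven editor like EDI fills in the invariant parts of such an envelope automatically, leaving only the deployment-specific SensorML content to the user.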
Mercury- Distributed Metadata Management, Data Discovery and Access System
NASA Astrophysics Data System (ADS)
Palanisamy, Giri; Wilson, Bruce E.; Devarakonda, Ranjeet; Green, James M.
2007-12-01
Mercury is a federated metadata harvesting, search and retrieval tool based on both open source and ORNL-developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports various metadata standards including XML, Z39.50, FGDC, Dublin Core, Darwin Core, EML, and ISO 19115 (under development). Mercury provides a single portal to information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury supports various projects including ORNL DAAC, NBII, DADDI, LBA, NARSTO, CDIAC, OCEAN, I3N, IAI, ESIP and ARM. The new Mercury system is based on a Service Oriented Architecture and supports various services such as a Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. The system also provides various search services, including RSS, GeoRSS, OpenSearch, Web Services and Portlets. Other features include filtering and dynamic sorting of search results, bookmarkable search results, and the ability to save, retrieve, and modify search criteria.
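Mercury's harvest-then-search pattern (collect metadata from distributed providers into one central index, then run fielded queries locally) can be sketched in a few lines. The record layout and provider interface below are illustrative assumptions, not the Mercury code.

```python
def harvest(providers):
    """Merge metadata records from distributed providers into a central index.

    `providers` maps a provider name to a callable returning its records.
    """
    index = []
    for name, fetch in providers.items():
        for rec in fetch():
            index.append(dict(rec, provider=name))
    return index

def fielded_search(index, **criteria):
    """Simple fielded search: every given field must contain its value."""
    return [r for r in index
            if all(str(v).lower() in str(r.get(f, "")).lower()
                   for f, v in criteria.items())]
```

Because queries run against the harvested copy rather than the remote servers, results are fast regardless of provider availability, while the providers keep ownership of the underlying data.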
ISO 19115 Experiences in NASA's Earth Observing System (EOS) ClearingHOuse (ECHO)
NASA Astrophysics Data System (ADS)
Cechini, M. F.; Mitchell, A.
2011-12-01
Metadata is an important entity in the process of cataloging, discovering, and describing earth science data. As science research and the data gathered increase in complexity, so do the complexity and importance of descriptive metadata. To meet these growing needs, the required metadata models utilize richer and more mature metadata attributes. Categorizing, standardizing, and promulgating these metadata models to a politically, geographically, and scientifically diverse community is a difficult process. An integral component of metadata management within NASA's Earth Observing System Data and Information System (EOSDIS) is the Earth Observing System (EOS) ClearingHOuse (ECHO). ECHO is the core metadata repository for the EOSDIS data centers, providing a centralized mechanism for metadata and data discovery and retrieval. ECHO has undertaken an internal restructuring to meet the changing needs of scientists, the consistent advancement of technology, and the advent of new standards such as ISO 19115. These improvements were based on the following tenets for data discovery and retrieval: + There exists a set of 'core' metadata fields recommended for data discovery. + There exists a set of users who will require the entire metadata record for advanced analysis. + There exists a set of users who will require only a 'core' set of metadata fields for discovery. + There will never be a cessation of new formats or a total retirement of all old formats. + Users should be presented metadata in a consistent format of their choosing. In order to address the previously listed items, ECHO's new metadata processing paradigm utilizes the following approach: + Identify a cross-format set of 'core' metadata fields necessary for discovery. + Implement format-specific indexers to extract the 'core' metadata fields into an optimized query capability. + Archive the original metadata in its entirety for presentation to users requiring the full record.
+ Provide on-demand translation of 'core' metadata to any supported result format. Lessons learned by the ECHO team while implementing its new metadata approach to support usage of the ISO 19115 standard will be presented. These lessons highlight strengths and weaknesses discovered in the ISO 19115 standard as it is introduced into an existing metadata processing system.
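The core-plus-native approach above (format-specific indexers extract a shared set of 'core' fields, while the original record is archived whole) can be sketched as follows. The field names and record layout are hypothetical; real ECHO indexers walk format-specific XML.

```python
CORE_FIELDS = ("short_name", "start_time", "end_time")

def index_record(native_record, extract):
    """Keep the full native record for advanced users, plus extracted
    core fields for fast cross-format discovery."""
    core = {field: extract(native_record, field) for field in CORE_FIELDS}
    return {"core": core, "native": native_record}

def echo_like_extract(record, field):
    # Extractor for a hypothetical flat ECHO10-style layout; each supported
    # metadata format would supply its own extractor.
    names = {"short_name": "ShortName",
             "start_time": "BeginningDateTime",
             "end_time": "EndingDateTime"}
    return record.get(names[field])
```

Discovery queries then hit only the small, uniform "core" dictionaries, while "native" preserves every field for users who need the complete record, and new formats are supported by adding an extractor rather than changing the index.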
ERIC Educational Resources Information Center
O'Neill, Edward T.; Lavoie, Brian F.; Bennett, Rick; Staples, Thornton; Wayland, Ross; Payette, Sandra; Dekkers, Makx; Weibel, Stuart; Searle, Sam; Thompson, Dave; Rudner, Lawrence M.
2003-01-01
Includes five articles that examine key trends in the development of the public Web: size and growth, internationalization, and metadata usage; Flexible Extensible Digital Object and Repository Architecture (Fedora) for use in digital libraries; developments in the Dublin Core Metadata Initiative (DCMI); the National Library of New Zealand Te Puna…
ASDC Collaborations and Processes to Ensure Quality Metadata and Consistent Data Availability
NASA Astrophysics Data System (ADS)
Trapasso, T. J.
2017-12-01
With the introduction of new tools, faster computing, and less expensive storage, increased volumes of data are expected to be managed with existing or fewer resources. Metadata management is becoming a heightened challenge with the increase in data volume, resulting in more metadata records that need to be curated for each product. To address metadata availability and completeness, NASA ESDIS has taken significant strides with the creation of the Unified Metadata Model (UMM) and Common Metadata Repository (CMR). The UMM helps address hurdles posed by the increasing number of metadata dialects, and the CMR provides a primary repository for metadata so that required metadata fields can be served through a growing number of tools and services. However, metadata quality remains an issue, as metadata is not always intuitive to the end user. In response to these challenges, the NASA Atmospheric Science Data Center (ASDC) created the Collaboratory for quAlity Metadata Preservation (CAMP) and defined the Product Lifecycle Process (PLP) to work congruently with it. CAMP is unique in that it provides science team members a UI to directly supply metadata that is complete, compliant, and accurate for their data products. This replaces back-and-forth communication that often results in misinterpreted metadata. Upon review by ASDC staff, metadata is submitted to CMR for broader distribution through Earthdata. Further, approval of science team metadata in CAMP automatically triggers the ASDC PLP workflow to ensure appropriate services are applied throughout the product lifecycle. This presentation will review the design elements of CAMP and PLP, demonstrate interfaces to each, and show the benefits that CAMP and PLP provide to the ASDC, which could also benefit other NASA Earth Science Data and Information System (ESDIS) Distributed Active Archive Centers (DAACs).
Earth Science Data Grid System
NASA Astrophysics Data System (ADS)
Chi, Y.; Yang, R.; Kafatos, M.
2004-12-01
The Earth Science Data Grid System (ESDGS) is software supporting earth science data storage and access. It is built upon the Storage Resource Broker (SRB) data grid technology. We have developed a complete data grid system consisting of an SRB server, providing users uniform access to diverse storage resources in a heterogeneous computing environment, and a metadata catalog server (MCAT), managing the metadata associated with datasets, users, and resources. We are also developing additional services for 1) metadata management, 2) geospatial, temporal, and content-based indexing, and 3) near/on-site data processing, in response to the unique needs of Earth science applications. In this paper, we describe the software architecture and components of the system, and use a practical example, in support of storage and access of rainfall data from the Tropical Rainfall Measuring Mission (TRMM), to illustrate its functionality and features.
Transformation of HDF-EOS metadata from the ECS model to ISO 19115-based XML
NASA Astrophysics Data System (ADS)
Wei, Yaxing; Di, Liping; Zhao, Baohua; Liao, Guangxuan; Chen, Aijun
2007-02-01
Nowadays, geographic data, such as NASA's Earth Observing System (EOS) data, are playing an increasing role in many areas, including academic research, government decisions and even people's everyday lives. As the quantity of geographic data becomes increasingly large, a major problem is how to make full use of such data in a distributed, heterogeneous network environment. In order for a user to effectively discover and retrieve the specific information that is useful, geographic metadata should be described and managed properly. Fortunately, the emergence of XML and Web Services technologies greatly promotes information distribution across the Internet. The research effort discussed in this paper presents a method and its implementation for transforming Hierarchical Data Format (HDF)-EOS metadata from the NASA ECS model to ISO 19115-based XML, which is then managed by the Open Geospatial Consortium (OGC) Catalogue Services - Web Profile (CSW). Using XML and international standards rather than domain-specific models to describe the metadata of HDF-EOS data, and further using CSW to manage the metadata, allows metadata information to be searched and interchanged more widely and easily, thus promoting the sharing of HDF-EOS data.
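A model-to-model metadata transformation of this kind can be sketched as mapping fields of the source model into a target XML tree. The element names below are simplified placeholders; real ISO 19115 XML lives in the gmd/gco namespaces with a much deeper structure, and the ECS field names here are illustrative.

```python
import xml.etree.ElementTree as ET

def ecs_to_iso(ecs_fields):
    """Transform a flat ECS-like field dictionary into simplified
    ISO 19115-style XML (illustrative element names only)."""
    md = ET.Element("MD_Metadata")
    ident = ET.SubElement(md, "identificationInfo")
    ET.SubElement(ident, "title").text = ecs_fields.get("LongName", "")
    ET.SubElement(ident, "abstract").text = ecs_fields.get("Description", "")
    return ET.tostring(md, encoding="unicode")
```

Once in a standard XML form, such records can be loaded into a CSW catalogue and queried by any standards-aware client, which is the interoperability payoff the paper describes.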
Automated Test Methods for XML Metadata
2017-12-28
Group under RCC Task TG-147. This document (Volume VI of the RCC Document 118 series) describes procedures used for evaluating XML metadata documents...including TMATS, MDL, IHAL, and DDML documents. These documents contain specifications or descriptions of artifacts and systems of importance to...the collection and management of telemetry data. The methods defined in this report provide a means of evaluating the suitability of such a metadata
NASA Astrophysics Data System (ADS)
Gebhardt, Steffen; Wehrmann, Thilo; Klinger, Verena; Schettler, Ingo; Huth, Juliane; Künzer, Claudia; Dech, Stefan
2010-10-01
The German-Vietnamese water-related information system for the Mekong Delta (WISDOM) project supports business processes in Integrated Water Resources Management in Vietnam. Multiple disciplines bring together earth- and ground-based observation themes such as environmental monitoring, water management, demographics, economy, information technology, and infrastructural systems. This paper introduces the components of the web-based WISDOM system, including its data, logic and presentation tiers. It focuses on the data models upon which the database management system is built, including techniques for tagging or linking metadata with the stored information. The model also uses ordered groupings of spatial, thematic and temporal reference objects to semantically tag datasets and enable fast data retrieval, such as finding all data in a specific administrative unit belonging to a specific theme. A spatial extension to the PostgreSQL database is employed. This object-relational database was chosen over a plain relational database to tag spatial objects to tabular data, improving the retrieval of census and observational data at regional, provincial, and local levels. While the spatial database hinders processing of raster data, a work-around was built into WISDOM to permit efficient management of both raster and vector data. The data model also incorporates styling aspects of the spatial datasets through styled layer descriptor (SLD) and web map service (WMS) layer specifications, allowing retrieval of rendered maps. Metadata elements of the spatial data are based on the ISO 19115 standard. XML-structured information for the SLDs and metadata is stored in an XML database. The data models and the data management system are robust for managing the large quantity of spatial objects, sensor observations, census and document data.
The operational WISDOM information system prototype contains modules for data management, automatic data integration, and web services for data retrieval, analysis, and distribution. The graphical user interfaces facilitate metadata cataloguing, data warehousing, web sensor data analysis and thematic mapping.
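The semantic-tagging retrieval described above (finding all data in a given administrative unit belonging to a given theme) can be sketched as filtering datasets by their spatial, thematic and temporal tag sets. The catalog layout and names below are hypothetical, not the WISDOM schema.

```python
def find_datasets(catalog, unit=None, theme=None, period=None):
    """Return names of datasets whose reference-object tags match all
    of the given criteria; None means 'any'."""
    def matches(d):
        return ((unit is None or unit in d["spatial"]) and
                (theme is None or theme in d["thematic"]) and
                (period is None or period in d["temporal"]))
    return [d["name"] for d in catalog if matches(d)]
```

In the real system these tag lookups are database joins against reference-object tables rather than in-memory set tests, but the query semantics are the same.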
Why can't I manage my digital images like MP3s? The evolution and intent of multimedia metadata
NASA Astrophysics Data System (ADS)
Goodrum, Abby; Howison, James
2005-01-01
This paper considers the deceptively simple question: why can't digital images be managed in the simple and effective manner in which digital music files are managed? We make the case that the answer lies in different treatments of metadata in different domains with different goals. A central difference between the two formats stems from the fact that digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while image metadata services do not. To understand why this difference exists, we examine the divergent evolution of metadata standards for digital music and digital images and observe that the processes differ in interesting ways according to their intent. Specifically, music metadata was developed primarily for personal file management and community resource sharing, while the focus of image metadata has largely been on information retrieval. We argue that lessons from MP3 metadata can assist individuals facing their growing personal image management challenges. Our focus, therefore, is not on metadata for cultural heritage institutions or the publishing industry; it is limited to the personal libraries growing on our hard drives. This bottom-up approach to file management, combined with p2p distribution, radically altered the music landscape. Might such an approach have a similar impact on image publishing? This paper outlines plans for improving the personal management of digital images - doing image metadata and file management the MP3 way - and considers the likelihood of success.
Metadata management for high content screening in OMERO
Li, Simon; Besson, Sébastien; Blackburn, Colin; Carroll, Mark; Ferguson, Richard K.; Flynn, Helen; Gillen, Kenneth; Leigh, Roger; Lindner, Dominik; Linkert, Melissa; Moore, William J.; Ramalingam, Balaji; Rozbicki, Emil; Rustici, Gabriella; Tarkowska, Aleksandra; Walczysko, Petr; Williams, Eleanor; Allan, Chris; Burel, Jean-Marie; Moore, Josh; Swedlow, Jason R.
2016-01-01
High content screening (HCS) experiments create a classic data management challenge—multiple, large sets of heterogeneous structured and unstructured data, that must be integrated and linked to produce a set of “final” results. These different data include images, reagents, protocols, analytic output, and phenotypes, all of which must be stored, linked and made accessible for users, scientists, collaborators and where appropriate the wider community. The OME Consortium has built several open source tools for managing, linking and sharing these different types of data. The OME Data Model is a metadata specification that supports the image data and metadata recorded in HCS experiments. Bio-Formats is a Java library that reads recorded image data and metadata and includes support for several HCS screening systems. OMERO is an enterprise data management application that integrates image data, experimental and analytic metadata and makes them accessible for visualization, mining, sharing and downstream analysis. We discuss how Bio-Formats and OMERO handle these different data types, and how they can be used to integrate, link and share HCS experiments in facilities and public data repositories. OME specifications and software are open source and are available at https://www.openmicroscopy.org. PMID:26476368
Management system for the SND experiments
NASA Astrophysics Data System (ADS)
Pugachev, K.; Korol, A.
2017-09-01
A new management system for the SND detector experiments (at the VEPP-2000 collider in Novosibirsk) has been developed. We describe here the interaction between a user and the SND databases. These databases contain experiment configuration, conditions and metadata. The new system is designed in a client-server architecture. It has several logical layers corresponding to users' roles. A new template engine has been created. A web application is implemented using the Node.js framework. At the time of writing, the application provides: showing and editing configuration; showing experiment metadata and the experiment conditions data index; and showing the SND log (prototype).
Experiment Management System for the SND Detector
NASA Astrophysics Data System (ADS)
Pugachev, K.
2017-10-01
We present a new experiment management system for the SND detector at the VEPP-2000 collider (Novosibirsk). An important part to report on is access to the experimental databases (configuration, conditions and metadata). The system is designed in a client-server architecture. User interaction takes place through a web interface. The server side includes several logical layers: user interface templates; template variable description and initialization; and implementation details. The templates are meant to require as little IT knowledge as possible. Experiment configuration, conditions and metadata are stored in a database. To implement the server side, Node.js, a modern JavaScript runtime, has been chosen. A new template engine with an interesting feature has been designed. Part of the system has been put into production. It includes templates for showing and editing the first-level trigger configuration and equipment configuration, and also for showing experiment metadata and the experiment conditions data index.
Dynamic Non-Hierarchical File Systems for Exascale Storage
DOE Office of Scientific and Technical Information (OSTI.GOV)
Long, Darrell E.; Miller, Ethan L
This constitutes the final report for "Dynamic Non-Hierarchical File Systems for Exascale Storage". The ultimate goal of this project was to improve data management in scientific computing and high-end computing (HEC) applications. To achieve this goal we proposed: to develop the first HEC-targeted file system featuring rich metadata and provenance collection, extreme scalability, and future storage hardware integration as core design goals; and to evaluate and develop a flexible non-hierarchical file system interface suitable for providing more powerful and intuitive data management interfaces to HEC and scientific computing users. Data management is swiftly becoming a serious problem in the scientific community: while copious amounts of data are good for obtaining results, finding the right data is often daunting and sometimes impossible. Scientists participating in a Department of Energy workshop noted that most of their time was spent "...finding, processing, organizing, and moving data and it's going to get much worse". Scientists should not be forced to become data mining experts in order to retrieve the data they want, nor should they be expected to remember the naming convention they used several years ago for a set of experiments they now wish to revisit. Ideally, locating the data you need would be as easy as browsing the web. Unfortunately, existing data management approaches are usually based on hierarchical naming, a 40-year-old technology designed to manage thousands of files, not exabytes of data. Today's systems do not take advantage of the rich array of metadata that current high-end computing (HEC) file systems can gather, including content-based metadata and provenance information. As a result, current metadata search approaches are typically ad hoc and often work by providing a parallel management system to the "main" file system, as is done in Linux (the locate utility), personal computers, and enterprise search appliances.
These search applications are often optimized for a single file system, making it difficult to move files and their metadata between file systems. Users have tried to solve this problem in several ways, including the use of separate databases to index file properties, the encoding of file properties into file names, and separately gathering and managing provenance data, but none of these approaches has worked well, due to limited usefulness, limited scalability, or both. Our research addressed several key issues: high-performance, real-time metadata harvesting: extracting important attributes from files dynamically and immediately updating the indexes used to improve search; transparent, automatic, and secure provenance capture: recording the data inputs and processing steps used in the production of each file in the system; scalable indexing: indexes that are optimized for integration with the file system; dynamic file system structure: our approach provides dynamic directories similar to those in semantic file systems, but these are the native organization rather than a feature grafted onto a conventional system. In addition to these goals, our research effort included evaluating the impact of new storage technologies on the file system design and performance. In particular, the indexing and metadata harvesting functions can potentially benefit from the performance improvements promised by new storage class memories.
AASG State Geothermal Data Repository for the National Geothermal Data System.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2012-01-01
This Drupal-based metadata and document capture and management system is a repository used to maintain metadata describing resources contributed to the AASG State Geothermal Data System. The repository also provides an archive for files that are not hosted by the agency contributing the resource. Data from all 50 state geological surveys are represented here, and are contributed in turn to the National Geothermal Data System.
Combined use of semantics and metadata to manage Research Data Life Cycle in Environmental Sciences
NASA Astrophysics Data System (ADS)
Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Pertinez, Esther; Palacio, Aida
2017-04-01
The use of metadata to contextualize datasets is widespread in the Earth System Sciences. There are initiatives and tools available to help data managers choose the metadata standard that best fits their use case, such as the DCC Metadata Directory (http://www.dcc.ac.uk/resources/metadata-standards). In our use case, we have been gathering physical, chemical and biological data from a water reservoir since 2010. A well-constructed metadata definition is crucial not only for contextualizing our own data but also for integrating datasets from other sources such as satellites or meteorological agencies. That is why we have chosen EML (Ecological Metadata Language), which integrates many different elements to define a dataset, including the project context, instrumentation and parameter definitions, the software used for processing and quality control, and the publication details. These metadata elements help both humans and machines to understand and process the dataset. However, metadata alone are not enough to fully support the data life cycle, from the Data Management Plan (DMP) definition to publication and re-use. To do so, we need to define not only metadata and attributes but also the relationships between them, so semantics are needed. Ontologies, as knowledge representations, can define the elements of a research data life cycle, including the DMP, datasets, software, etc. They can also define how the different elements are related to each other and how they interact. The first advantage of developing an ontology of a knowledge domain is that it provides a common vocabulary hierarchy (i.e. a conceptual schema) that can be used and standardized by all the agents interested in the domain (either humans or machines). This use of ontologies is one of the foundations of the Semantic Web, where ontologies are set to play a key role in establishing a common terminology between agents.
To develop the ontology we are using Protégé, an open-source, freely available graphical ontology-development tool that supports a rich knowledge model. To process and manage the ontology, we are using Semantic MediaWiki, an extension of MediaWiki that supports semantic queries and search and can export data in RDF. Our final goal is to integrate our data repository portal and semantic processing engine in order to have a complete system for managing the data life cycle stages and their relationships, including a machine-actionable DMP solution, dataset and software management, computing resources for processing and analysis, and publication features (DOI minting). This way we will be able to reproduce the full data life cycle chain while guaranteeing the FAIR+R principles.
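The relationship modeling the authors describe can be illustrated with a minimal, stdlib-only triple store. The entity names below are hypothetical, and a real deployment would express these relations in RDF/OWL via Protégé and query them through Semantic MediaWiki or SPARQL, as the abstract says; this sketch only shows the triple-pattern idea.

```python
# Triples: (subject, predicate, object) -- a minimal stand-in for RDF.
# Entity names are invented for illustration.
triples = {
    ("DMP-2017-01", "describes", "Dataset-ReservoirChem"),
    ("Dataset-ReservoirChem", "documentedBy", "EML-Record-42"),
    ("Dataset-ReservoirChem", "processedWith", "qc-pipeline.py"),
    ("EML-Record-42", "conformsTo", "EML-2.1.1"),
}

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern (None = wildcard),
    analogous to a single SPARQL basic graph pattern."""
    return {
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    }
```

For example, `query(subject="Dataset-ReservoirChem")` retrieves everything known about the dataset: its EML record and the software used to process it, which is exactly the kind of life-cycle traversal an ontology enables.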
The GEOSS Clearinghouse based on the GeoNetwork opensource
NASA Astrophysics Data System (ADS)
Liu, K.; Yang, C.; Wu, H.; Huang, Q.
2010-12-01
The Global Earth Observation System of Systems (GEOSS) was established to support the study of the Earth system by a global community. It provides services for social management, quick response, academic research, and education. The purpose of GEOSS is to achieve comprehensive, coordinated and sustained observations of the Earth system, improve monitoring of the state of the Earth, increase understanding of Earth processes, and enhance prediction of the behavior of the Earth system. In 2009, GEO called for a competition to select an official GEOSS clearinghouse as a means of consolidating catalogs for Earth observations. The Joint Center for Intelligent Spatial Computing at George Mason University worked with the USGS to submit a solution based on the open-source platform GeoNetwork. In the spring of 2010, this solution was selected as the product for the GEOSS clearinghouse. The GEOSS Clearinghouse is a common search facility for the intergovernmental Group on Earth Observations (GEO). Through a set of harvesting functions in its business logic, the GEOSS clearinghouse can collect metadata from distributed catalogs including other GeoNetwork native nodes, WebDAV/sitemap/WAF, Catalog Services for the Web (CSW) 2.0, the GEOSS Component and Service Registry (http://geossregistries.info/), OGC Web Services (WCS, WFS, WMS and WPS), the OAI Protocol for Metadata Harvesting 2.0, ArcSDE Server, and the local file system. Metadata in the GEOSS clearinghouse are managed in a database (MySQL, PostgreSQL, Oracle, or MckoiDB) and an index of the metadata is maintained through a Lucene engine. Thus, EO data, services, and related resources can be discovered and accessed. It supports a variety of geospatial standards, including CSW and SRU for search, FGDC and ISO metadata, and WMS-related OGC standards for data access and visualization, as linked from the metadata.
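One of the harvesting protocols listed above, OAI-PMH 2.0, is simple enough to sketch. The helper below only builds a `ListRecords` request URL; the base URL is a placeholder, and this is of course not GeoNetwork's harvester, just an illustration of the protocol's request shape.

```python
from urllib.parse import urlencode

def oai_listrecords_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH 2.0 ListRecords request URL for metadata harvesting."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # Per OAI-PMH 2.0, resumptionToken is exclusive: no other request
        # arguments may accompany it when continuing a paged harvest.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urlencode(params)
```

A harvester would fetch this URL, parse the returned XML records into its catalog, and follow the `resumptionToken` in each response until the server signals completion.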
Streamlining Metadata and Data Management for Evolving Digital Libraries
NASA Astrophysics Data System (ADS)
Clark, D.; Miller, S. P.; Peckman, U.; Smith, J.; Aerni, S.; Helly, J.; Sutton, D.; Chase, A.
2003-12-01
What began two years ago as an effort to stabilize the Scripps Institution of Oceanography (SIO) data archives from more than 700 cruises going back 50 years has now become the operational, fully searchable "SIOExplorer" digital library, complete with thousands of historic photographs, images, maps, full-text documents, binary data files, and 3D visualization experiences, totaling nearly 2 terabytes of digital content. Coping with data diversity and complexity has proven to be more challenging than dealing with large volumes of digital data. SIOExplorer has been built with scalability in mind, so that the addition of new data types and entire new collections may be accomplished with ease. It is a federated system, currently interoperating with three independent data-publishing authorities, each responsible for its own quality control, metadata specifications, and content selection. The IT architecture implemented at the San Diego Supercomputer Center (SDSC) streamlines the integration of additional projects in other disciplines with a suite of metadata management and collection-building tools for "arbitrary digital objects." Metadata are automatically harvested from data files into domain-specific metadata blocks, and mapped into various specification standards as needed. Metadata can be browsed and objects can be viewed onscreen or downloaded for further analysis, with automatic proprietary-hold request management.
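Mapping harvested, domain-specific metadata blocks into a specification standard amounts to a crosswalk between vocabularies. The sketch below maps a few invented cruise-metadata fields onto a small Dublin Core subset; the field names and target elements are assumptions for illustration, not SIOExplorer's actual schema.

```python
# Crosswalk from a hypothetical domain-specific metadata block to Dublin Core.
DC_MAP = {
    "cruise_name": "dc:title",
    "chief_scientist": "dc:creator",
    "start_date": "dc:date",
    "ship": "dc:description",
}

def to_dublin_core(block):
    """Translate recognized fields; unmapped domain fields are dropped here
    (a real crosswalk might carry them in an extension block instead)."""
    return {DC_MAP[k]: v for k, v in block.items() if k in DC_MAP}
```

The same block could be mapped into FGDC, ISO 19115, or another standard simply by swapping the crosswalk table, which is the point of keeping domain metadata and specification standards decoupled.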
Expanding Access to NCAR's Digital Assets: Towards a Unified Scientific Data Management System
NASA Astrophysics Data System (ADS)
Stott, D.
2016-12-01
In 2014 the National Center for Atmospheric Research (NCAR) Directorate created the Data Stewardship Engineering Team (DSET) to plan and implement the strategic vision of an integrated front door for data discovery and access across the organization, including all laboratories, the library, and UCAR Community Programs. The DSET is focused on improving the quality of users' experiences in finding and using NCAR's digital assets. This effort also supports new policies included in federal mandates, NSF requirements, and journal publication rules. An initial survey with 97 respondents identified 68 persons responsible for more than 3 petabytes of data. An inventory, using the Data Asset Framework produced by the UK Digital Curation Centre as a starting point, identified asset types that included files and metadata, publications, images, and software (visualization, analysis, model codes). User-story sessions with representatives from each lab identified and ranked desired features for a unified Scientific Data Management System (SDMS). A process that began with an organization-wide assessment of metadata by the HDF Group, followed by meetings with the labs to identify key documentation concepts, culminated in the development of an NCAR metadata dialect that leverages the DataCite and ISO 19115 standards. The tasks ahead are to build out an SDMS and populate it with rich, standardized metadata. Software packages have been prototyped and are currently being tested and reviewed by DSET members. Key challenges for the DSET include both technical and non-technical issues. First, the status quo with regard to how assets are managed varies widely across the organization. There are differences in file format standards, technologies, and discipline-specific vocabularies. Metadata diversity is another real challenge: the types of metadata, the standards used, and the capacity to create new metadata all vary across the organization.
Significant effort is required to develop tools to create new standard metadata across the organization, adapt and integrate current digital assets, and establish consistent data management practices going forward. To be successful, best practices must be infused into daily activities. This poster will highlight the processes, lessons learned, and current status of the DSET effort at NCAR.
NASA Technical Reports Server (NTRS)
Carnahan, Richard S., Jr.; Corey, Stephen M.; Snow, John B.
1989-01-01
Applications of rapid prototyping and Artificial Intelligence techniques to problems associated with Space Station-era information management systems are described. In particular, the work centers on issues related to: (1) intelligent man-machine interfaces applied to scientific data user support, and (2) the requirement that intelligent information management systems (IIMS) be able to efficiently process metadata updates concerning the types of data handled. The advanced IIMS represents functional capabilities driven almost entirely by the needs of potential users. The volume of scientific data projected to be generated in the Space Station era is likely to be significantly greater than the volume of data currently processed and analyzed. Information about scientific data must be presented clearly, concisely, and with support features that allow users at all levels of expertise efficient and cost-effective data access. Additionally, mechanisms for more efficient IIMS metadata update processes must be addressed. The work reported covers the following IIMS design aspects: IIMS data and metadata modeling, including the automatic updating of IIMS-contained metadata; IIMS user-system interface considerations, including significant problems associated with remote access, user profiles, and on-line tutorial capabilities; and development of an IIMS query and browse facility, including the capability to deal with spatial information. A working prototype has been developed and is being enhanced.
NASA Astrophysics Data System (ADS)
Maffei, A. R.; Chandler, C. L.; Work, T.; Allen, J.; Groman, R. C.; Fox, P. A.
2009-12-01
Content Management Systems (CMSs) provide powerful features that can be of use to oceanographic (and other geoscience) data managers. However, in many instances, geoscience data management offices have previously designed customized schemas for their metadata. The WHOI Ocean Informatics initiative and the NSF-funded Biological and Chemical Oceanography Data Management Office (BCO-DMO) have jointly sponsored a project to port an existing relational database containing oceanographic metadata, along with an existing interface coded in ColdFusion middleware, to a Drupal 6 Content Management System. The goal was to translate all the existing database tables, input forms, website reports, and other features present in the existing system to employ Drupal CMS features. The replacement features include Drupal content types, CCK node-reference fields, themes, RDB, SPARQL, workflow, and a number of other supporting modules. Strategic use of Drupal 6 CMS features enables three separate but complementary interfaces that provide access to oceanographic research metadata via the MySQL database: 1) a Drupal 6-powered front end; 2) a standard SQL port (used to provide a MapServer interface to the metadata and data); and 3) a SPARQL port (feeding a new faceted search capability being developed). Future plans include the creation of science ontologies, by scientist/technologist teams, that will drive the semantically enabled faceted search capabilities planned for the site. Incorporation of the semantic technologies included in the future Drupal 7 core release is also anticipated. Using a public-domain CMS as opposed to proprietary middleware, and taking advantage of the many features of Drupal 6 designed to support semantically enabled interfaces, will help prepare the BCO-DMO database for interoperability with other ecosystem databases.
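At its core, the faceted search capability fed by the SPARQL port reduces to counting the distinct values of each metadata field so the interface can offer them as filters. A minimal sketch, with hypothetical field names rather than BCO-DMO's actual schema:

```python
from collections import Counter

def facet_counts(records, field):
    """Count values of one metadata field to drive a faceted-search UI,
    e.g. 'instrument: CTD (12), ADCP (5)' in a filter sidebar."""
    return Counter(r[field] for r in records if field in r)
```

In the deployed system these counts would come from SPARQL aggregate queries over the RDF export rather than an in-memory list, but the facet concept is the same.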
Environmental Information Management For Data Discovery and Access System
NASA Astrophysics Data System (ADS)
Giriprakash, P.
2011-01-01
Mercury is a federated metadata harvesting, search, and retrieval tool based on both open-source software and software developed at Oak Ridge National Laboratory. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. A major new version of Mercury was developed during 2007 and released in early 2008. This new version provides orders-of-magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, support for RSS delivery of search results, and ready customization to meet the needs of the multiple projects that use Mercury. For end users, Mercury provides a single portal for very quickly searching for data and information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data.
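Mercury's model (harvest metadata centrally, leave data at the providers) can be sketched as a small in-memory index supporting fielded, temporal, and spatial queries. The record fields and the bounding-box intersection test below are illustrative assumptions, not Mercury's internals.

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    source: str    # contributing project server (data stays with the provider)
    year: int
    bbox: tuple    # (west, south, east, north), degrees

class CentralIndex:
    """Toy centralized index over harvested metadata records."""
    def __init__(self):
        self.records = []

    def harvest(self, record):
        self.records.append(record)

    def fielded(self, **criteria):
        return [r for r in self.records
                if all(getattr(r, k) == v for k, v in criteria.items())]

    def temporal(self, start, end):
        return [r for r in self.records if start <= r.year <= end]

    def spatial(self, west, south, east, north):
        # Keep records whose bounding box intersects the query box.
        return [r for r in self.records
                if not (r.bbox[2] < west or r.bbox[0] > east
                        or r.bbox[3] < south or r.bbox[1] > north)]
```

The production system backs this with a search engine rather than Python lists, but the division of labor is the point: the index answers queries fast, while each `source` retains ownership of the underlying data.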
SIOExplorer: Modern IT Methods and Tools for Digital Library Management
NASA Astrophysics Data System (ADS)
Sutton, D. W.; Helly, J.; Miller, S.; Chase, A.; Clark, D.
2003-12-01
With more geoscience disciplines becoming data-driven, it is increasingly important to utilize modern techniques for data, information, and knowledge management. SIOExplorer is a new digital library project with 2 terabytes of oceanographic data collected over the last 50 years on 700 cruises by the Scripps Institution of Oceanography. It is built using a suite of information technology tools and methods that allow for an efficient and effective digital library management system. The library consists of a number of independent collections, each with corresponding metadata formats. The system architecture allows each collection to be built and uploaded based on a collection-dependent metadata template file (MTF). This file is used to create the hierarchical structure of the collection, create metadata tables in a relational database, and populate metadata files for each object and for the collection as a whole. Collections are comprised of arbitrary digital objects stored at the San Diego Supercomputer Center (SDSC) High Performance Storage System (HPSS) and managed using the Storage Resource Broker (SRB), data-handling middleware developed at SDSC. SIOExplorer interoperates with other collections as a data provider through the Open Archives Initiative (OAI) protocol. The user services for SIOExplorer are accessed from CruiseViewer, a Java application served using Java Web Start from the SIOExplorer home page. CruiseViewer is an advanced tool for data discovery and access. It implements general keyword and interactive geospatial search methods for the collections, georeferencing search results on user-selected basemaps such as global topography or crustal age. User services include metadata viewing, opening of selected MIME-type digital objects (such as images, documents, and grid files), and downloading of objects (including the brokering of proprietary-hold restrictions).
NASA Astrophysics Data System (ADS)
Servilla, M. S.; O'Brien, M.; Costa, D.
2013-12-01
Considerable ecological research performed today occurs through the analysis of data downloaded from various repositories and archives, often resulting in derived or synthetic products generated by automated workflows. These data are only meaningful for research if they are well documented by metadata, lest semantic or data-type errors occur in interpretation or processing. The Long Term Ecological Research (LTER) Network now screens all data packages entering its long-term archive to ensure that each package contains metadata that is complete and of high quality, that the metadata accurately describe the structure of the associated data entity, and that the data are structurally congruent with the metadata. Screening occurs prior to the upload of a data package into the Provenance Aware Synthesis Tracking Architecture (PASTA) data management system through a series of quality checks, thus preventing ambiguously or incorrectly documented data packages from entering the system. The quality checks within PASTA are designed to work specifically with the Ecological Metadata Language (EML), the metadata standard adopted by the LTER Network to describe data generated by its 26 research sites. Each quality check is codified in Java as part of the ecological community-supported Data Manager Library, which is a resource of the EML specification and is used as a component of the PASTA software stack. Quality checks test for metadata quality, data integrity, or metadata-data congruence. Quality checks are further classified as either conditional or informational. Conditional checks issue a 'valid', 'warning', or 'error' response; only an 'error' response blocks the data package from upload into PASTA. Informational checks only provide descriptive content pertaining to a particular facet of the data package. Quality checks are designed by a group of LTER information managers and reviewed by the LTER community before being deployed into PASTA. A total of 32 quality checks have been deployed to date.
Quality checks can be customized through a configurable template, which includes turning checks 'on' or 'off' and setting the severity of conditional checks. This feature is important to other potential users of the Data Manager Library who wish to configure its quality checks in accordance with the standards of their community. Executing the complete set of quality checks produces a report that describes the result of each check. The report is an XML document that is stored by PASTA for future reference.
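The conditional-check machinery, including a configurable template that turns checks off or downgrades their severity, might look roughly like this Python sketch. PASTA's actual checks are written in Java, and the check names and config keys here are invented for illustration.

```python
VALID, WARNING, ERROR = "valid", "warning", "error"

def check_title_present(metadata, data):
    """Metadata-quality check: a title must be declared."""
    return VALID if metadata.get("title") else ERROR

def check_column_count(metadata, data):
    """Congruence check: declared attributes must match the data table."""
    declared = len(metadata.get("attributes", []))
    actual = len(data[0]) if data else 0
    return VALID if declared == actual else ERROR

CHECKS = {"title_present": check_title_present,
          "column_count": check_column_count}

def run_checks(metadata, data, config):
    """Run enabled conditional checks; any 'error' result blocks upload."""
    report = {}
    for name, check in CHECKS.items():
        if config.get(name, "on") == "off":
            continue
        result = check(metadata, data)
        # The configurable template may downgrade a check's severity.
        if result == ERROR and config.get(name + ".severity") == WARNING:
            result = WARNING
        report[name] = result
    blocked = ERROR in report.values()
    return report, blocked
```

The returned report corresponds to the XML quality report PASTA stores alongside each package; the boolean models the upload gate.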
Metadata Evaluation and Improvement: Evolving Analysis and Reporting
NASA Technical Reports Server (NTRS)
Habermann, Ted; Kozimor, John; Gordon, Sean
2017-01-01
ESIP Community members create and manage a large collection of environmental datasets that span multiple decades, the entire globe, and many parts of the solar system. Metadata are critical for discovering, accessing, using and understanding these data effectively and ESIP community members have successfully created large collections of metadata describing these data. As part of the White House Big Earth Data Initiative (BEDI), ESDIS has developed a suite of tools for evaluating these metadata in native dialects with respect to recommendations from many organizations. We will describe those tools and demonstrate evolving techniques for sharing results with data providers.
DAS: A Data Management System for Instrument Tests and Operations
NASA Astrophysics Data System (ADS)
Frailis, M.; Sartor, S.; Zacchei, A.; Lodi, M.; Cirami, R.; Pasian, F.; Trifoglio, M.; Bulgarelli, A.; Gianotti, F.; Franceschi, E.; Nicastro, L.; Conforti, V.; Zoli, A.; Smart, R.; Morbidelli, R.; Dadina, M.
2014-05-01
The Data Access System (DAS) is a data access and management software system providing a reusable solution for the storage of data acquired both from telescopes and from auxiliary data sources during the instrument development phases and operations. It is part of the Customizable Instrument WorkStation system (CIWS-FW), a framework for the storage, processing, and quick-look analysis of the data acquired from scientific instruments. The DAS provides a data access layer mainly targeted at software applications: quick-look displays, pre-processing pipelines, and scientific workflows. It is logically organized in three main components: an intuitive and compact Data Definition Language (DAS DDL) in XML format, aimed at user-defined data types; an Application Programming Interface (DAS API), which automatically adds classes and methods supporting the DDL data types and provides an object-oriented query language; and a data management component, which maps the metadata of the DDL data types onto a relational Database Management System (DBMS) and stores the data in a shared (network) file system. With the DAS DDL, developers define the data model for a particular project, specifying for each data type the metadata attributes, the data format and layout (if applicable), and named references to related or aggregated data types. Together with the DDL user-defined data types, the DAS API acts as the only interface to store, query, and retrieve the metadata and data in the DAS system, providing both an abstract interface and a data-model-specific one in C, C++, and Python. The mapping of metadata onto the back-end database is automatic and supports several relational DBMSs, including MySQL, Oracle, and PostgreSQL.
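The automatic DDL-to-relational mapping can be illustrated with a toy example: parse an XML data-type definition and emit the corresponding table DDL. The element names, type vocabulary, and SQL mapping below are assumptions for the sketch, not the actual DAS DDL syntax.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML data-type definition (not real DAS DDL syntax).
DDL = """
<dataType name="Image">
  <attribute name="obs_id"   type="int32"/>
  <attribute name="exposure" type="float64"/>
  <attribute name="target"   type="string"/>
</dataType>
"""

# Illustrative mapping from DDL types to SQL column types.
SQL_TYPES = {"int32": "INTEGER", "float64": "DOUBLE PRECISION", "string": "VARCHAR(255)"}

def ddl_to_sql(ddl_xml):
    """Map a data type's metadata attributes onto a relational table."""
    root = ET.fromstring(ddl_xml)
    cols = ", ".join(
        f"{a.get('name')} {SQL_TYPES[a.get('type')]}"
        for a in root.findall("attribute"))
    return f"CREATE TABLE {root.get('name')} ({cols});"
```

Supporting multiple back-end DBMSs then amounts to swapping the type-mapping table per dialect, which is presumably why the DAS can target MySQL, Oracle, and PostgreSQL from one DDL.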
A Multi-Purpose Data Dissemination Infrastructure for the Marine-Earth Observations
NASA Astrophysics Data System (ADS)
Hanafusa, Y.; Saito, H.; Kayo, M.; Suzuki, H.
2015-12-01
To open up the data from a variety of observations, the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) has developed a multi-purpose data dissemination infrastructure. Although many observations have been made in the earth sciences, not all of the data are fully open. We think data centers can provide researchers with a universal data dissemination service that handles various kinds of observation data with little effort. For this purpose the JAMSTEC Data Management Office has developed the "Information Catalog Infrastructure System (Catalog System)". This is a catalog management system which can create, renew, and delete catalogs (= databases) and has the following features: the Catalog System does not depend on data types or the granularity of data records; by registering a new metadata schema with the system, a new database can be created on the same system without system modification; as web pages are defined by cascading style sheets, each database can have its own look and feel and operability; the Catalog System provides databases with basic search tools (search by text, selection from a category tree, and selection from a timeline chart); and, for domestic users, it creates the Japanese and English pages at the same time and has a dictionary to control terminology and proper nouns. As of August 2015 JAMSTEC operates 7 databases on the Catalog System. We expect to transfer existing databases to this system, or to create new databases on it. In comparison with a dedicated database developed for a specific dataset, the Catalog System is suitable for the dissemination of small datasets at minimum cost. Metadata held in the catalogs may be transferred to other metadata schemas for exchange with global databases or portals.
Examples:
JAMSTEC Data Catalog: http://www.godac.jamstec.go.jp/catalog/data_catalog/metadataList?lang=en
JAMSTEC Document Catalog: http://www.godac.jamstec.go.jp/catalog/doc_catalog/metadataList?lang=en&tab=category
Research Information and Data Access Site of TEAMS: http://www.i-teams.jp/catalog/rias/metadataList?lang=en&tab=list
OSCAR/Surface: Metadata for the WMO Integrated Observing System WIGOS
NASA Astrophysics Data System (ADS)
Klausen, Jörg; Pröscholdt, Timo; Mannes, Jürg; Cappelletti, Lucia; Grüter, Estelle; Calpini, Bertrand; Zhang, Wenjian
2016-04-01
The World Meteorological Organization (WMO) Integrated Global Observing System (WIGOS) is a key WMO priority underpinning all WMO Programs and new initiatives such as the Global Framework for Climate Services (GFCS). It does this by better integrating WMO and co-sponsored observing systems, as well as partner networks. For this, an important aspect is the description of observational capabilities by way of structured metadata. The 17th Congress of the World Meteorological Organization (Cg-17) endorsed the semantic WIGOS metadata standard (WMDS) developed by the Task Team on WIGOS Metadata (TT-WMD). The standard comprises a set of metadata classes that are considered to be of critical importance for the interpretation of observations and the evolution of observing systems relevant to WIGOS. The WMDS serves all recognized WMO Application Areas, and its use for all internationally exchanged observational data generated by WMO Members is mandatory. The standard will be introduced in three phases between 2016 and 2020. The Observing Systems Capability Analysis and Review (OSCAR) platform operated by MeteoSwiss on behalf of WMO is the official repository of WIGOS metadata and an implementation of the WMDS. OSCAR/Surface deals with all surface-based observations from land, air, and oceans, combining metadata managed by a number of complementary, more domain-specific systems (e.g., GAWSIS for the Global Atmosphere Watch, JCOMMOPS for the marine domain, the WMO Radar database). It is a modern, web-based client-server application with extended information search, filtering, and mapping capabilities, including a fully developed management console to add and edit observational metadata. In addition, a powerful application programming interface (API) is being developed to allow machine-to-machine metadata exchange. The API is based on an ISO/OGC-compliant XML schema for the WMDS using the Observations and Measurements (ISO 19156) conceptual model.
The purpose of the presentation is to acquaint the audience with OSCAR, the WMDS and the current XML schema; and, to explore the relationship to the INSPIRE XML schema. Feedback from experts in the various disciplines of meteorology, climatology, atmospheric chemistry, hydrology on the utility of the new standard and the XML schema will be solicited and will guide WMO in further evolving the WMDS.
Evolving Metadata in NASA Earth Science Data Systems
NASA Astrophysics Data System (ADS)
Mitchell, A.; Cechini, M. F.; Walter, J.
2011-12-01
NASA's Earth Observing System (EOS) is a coordinated series of satellites for long-term global observations. NASA's Earth Observing System Data and Information System (EOSDIS) is a petabyte-scale archive of environmental data that supports global climate change research by providing end-to-end services, from EOS instrument data collection, to science data processing, to full access to EOS and other earth science data. On a daily basis, EOSDIS ingests, processes, archives, and distributes over 3 terabytes of data from NASA's Earth Science missions, representing over 3500 data products across a range of science disciplines. EOSDIS is currently comprised of 12 discipline-specific data centers that are collocated with centers of science discipline expertise. Metadata are used in all aspects of NASA's Earth Science data lifecycle, from the initial measurement gathering to the accessing of data products. Missions use metadata in their science data products to describe information such as the instrument/sensor, operational plan, and geographic region. Acting as the curators of the data products, the data centers employ metadata for the preservation, access, and manipulation of data. EOSDIS provides a centralized metadata repository called the Earth Observing System (EOS) ClearingHouse (ECHO) for data discovery and access via a service-oriented architecture (SOA) between data centers and science data users. ECHO receives inventory metadata from data centers, which generate metadata files that comply with the ECHO Metadata Model. NASA's Earth Science Data and Information System (ESDIS) Project established a Tiger Team to study and make recommendations regarding the adoption of the international metadata standard ISO 19115 in EOSDIS. The result was a technical report recommending an evolution of NASA data systems towards a consistent application of ISO 19115 and related standards, including the creation of a NASA-specific convention for core ISO 19115 elements.
As part of NASA's effort to continually evolve its data systems, ECHO has enhanced the method by which it receives inventory metadata from the data centers to allow for multiple metadata formats, including ISO 19115. ECHO's metadata model will also be mapped to the NASA-specific convention for ingesting science metadata into the ECHO system. As NASA's new Earth Science missions and data centers migrate to the ISO 19115 standards, EOSDIS is developing metadata management resources to assist in the reading, writing, and parsing of ISO 19115-compliant metadata. To foster interoperability with other agencies and international partners, NASA is working to ensure that a common ISO 19115 convention is developed, enhancing data sharing capabilities and other data analysis initiatives. NASA is also investigating the use of the ISO 19115 standards to encode data quality, lineage, and provenance with stored values. A common metadata standard across NASA's Earth Science data systems promotes interoperability, enhances data utilization, and removes levels of uncertainty found in data products.
Metadata Design in the New PDS4 Standards - Something for Everybody
NASA Astrophysics Data System (ADS)
Raugh, Anne C.; Hughes, John S.
2015-11-01
The Planetary Data System (PDS) archives, supports, and distributes data of diverse targets, from diverse sources, to diverse users. One of the core problems addressed by the PDS4 data standard redesign was that of metadata - how to accommodate the increasingly sophisticated demands of search interfaces, analytical software, and observational documentation in label standards without imposing limits and constraints that would impinge on the quality or quantity of metadata that any particular observer or team could supply. And yet, as an archive, PDS must have detailed documentation for the metadata in the labels it supports, or the institutional knowledge encoded into those attributes will be lost - putting the data at risk. The PDS4 metadata solution is based on a three-step approach. First, it is built on two key ISO standards: ISO 11179 "Information Technology - Metadata Registries", which provides a common framework and vocabulary for defining metadata attributes; and ISO 14721 "Space Data and Information Transfer Systems - Open Archival Information System (OAIS) Reference Model", which provides the framework for the information architecture that enforces the object-oriented paradigm for metadata modeling. Second, PDS has defined a hierarchical system that allows it to divide its metadata universe into namespaces ("data dictionaries", conceptually), and more importantly to delegate stewardship for a single namespace to a local authority. This means that a mission can develop its own data model with a high degree of autonomy and effectively extend the PDS model to accommodate its own metadata needs within the common ISO 11179 framework.
Finally, within a single namespace - even the core PDS namespace - existing metadata structures can be extended and new structures added to the model as new needs are identified. This poster illustrates the PDS4 approach to metadata management and highlights the expected return on the development investment for PDS, users and data preparers.
The STP (Solar-Terrestrial Physics) Semantic Web based on the RSS1.0 and the RDF
NASA Astrophysics Data System (ADS)
Kubo, T.; Murata, K. T.; Kimura, E.; Ishikura, S.; Shinohara, I.; Kasaba, Y.; Watari, S.; Matsuoka, D.
2006-12-01
In Solar-Terrestrial Physics (STP), it has been pointed out that the circulation and utilization of observation data among researchers are insufficient. To achieve interdisciplinary research, we need to overcome these circulation and utilization problems. Against this background, the authors' group has developed a world-wide database that manages metadata of satellite and ground-based observation data files. Until now, retrieving metadata from the observation data and registering them to the database have been carried out by hand. Our goal is to establish the STP Semantic Web. The Semantic Web provides a common framework that allows a variety of data to be shared and reused across applications, enterprises, and communities. We also expect that secondary information related to observations, such as event information and associated news, will be shared over the networks. The most fundamental issue in establishing it is who generates, manages and provides metadata in the Semantic Web. We developed an automatic metadata collection system for the observation data using RSS (RDF Site Summary) 1.0. RSS 1.0 is an XML-based markup language built on the RDF (Resource Description Framework), designed for syndicating news and the contents of news-like sites. RSS 1.0 is used to describe STP metadata, such as data file name, file server address and observation date. To describe STP metadata beyond the RSS 1.0 vocabulary, we defined original vocabularies for STP resources using RDF Schema. The RDF describes technical terms of the STP along with the Dublin Core Metadata Element Set, the standard for cross-domain information resource descriptions. Researchers' information on the STP is described with FOAF, an RDF/XML vocabulary for creating machine-readable metadata about people. Using RSS 1.0 as a metadata distribution method, the workflow from retrieving metadata to registering them into the database is automated.
This technique has been applied to several database systems, such as the DARTS database system and the NICT Space Weather Report Service. DARTS is a science database managed by ISAS/JAXA in Japan. We succeeded in automatically generating and collecting metadata for CDF (Common Data Format) data, such as Reimei satellite data, provided by DARTS. We also created an RDF service for space weather reports and real-time global MHD simulation 3D data provided by NICT. Our Semantic Web system works as follows: the RSS 1.0 documents generated on the data sites (ISAS and NICT) are automatically collected by a metadata collection agent. The RDF documents are registered, and the agent extracts metadata and stores them in Sesame, an open-source RDF database with support for RDF Schema inferencing and querying. The RDF database provides advanced retrieval that takes properties and relations into account. Finally, the STP Semantic Web provides automatic processing and high-level search not only for observation data but also for space weather news, physical events, technical terms and researcher information related to the STP.
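An RSS 1.0 item of the kind the collection agent harvests can be sketched with only the standard library. This is a minimal, hypothetical example (the file URL and element choice are illustrative, not the actual DARTS feed layout); it combines the RSS 1.0 core vocabulary with a Dublin Core date, as the abstract describes:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

def rss_item(url, title, date):
    """Build one RSS 1.0 <item> describing an observation data file."""
    item = ET.Element(f"{{{RSS}}}item", {f"{{{RDF}}}about": url})
    ET.SubElement(item, f"{{{RSS}}}title").text = title
    ET.SubElement(item, f"{{{RSS}}}link").text = url
    # Dublin Core date carries the observation date of the data file
    ET.SubElement(item, f"{{{DC}}}date").text = date
    return item

item = rss_item("http://example.org/darts/reimei_20061201.cdf",
                "Reimei satellite CDF data", "2006-12-01")
xml = ET.tostring(item, encoding="unicode")
```

A real feed would wrap such items in an rdf:RDF channel document and add the project's own RDF Schema terms for instrument and file-server metadata.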
Razick, Sabry; Močnik, Rok; Thomas, Laurent F.; Ryeng, Einar; Drabløs, Finn; Sætrom, Pål
2014-01-01
Systematic data management and controlled data sharing aim at increasing reproducibility, reducing redundancy in work, and providing a way to efficiently locate complementing or contradicting information. One method of achieving this is collecting data in a central repository or in a location that is part of a federated system and providing interfaces to the data. However, certain data, such as data from biobanks or clinical studies, may, for legal and privacy reasons, often not be stored in public repositories. Instead, we describe a metadata cataloguing system and a software suite for reporting the presence of data from the life sciences domain. The system stores three types of metadata: file information, file provenance and data lineage, and content descriptions. Our software suite includes both graphical and command line interfaces that allow users to report and tag files with these different metadata types. Importantly, the files remain in their original locations with their existing access-control mechanisms in place, while our system provides descriptions of their contents and relationships. Our system and software suite thereby provide a common framework for cataloguing and sharing both public and private data. Database URL: http://bigr.medisin.ntnu.no/data/eGenVar/ PMID:24682735
ENVIRONMENTAL INFORMATION MANAGEMENT SYSTEM (EIMS)
The Environmental Information Management System (EIMS) organizes descriptive information (metadata) for data sets, databases, documents, models, projects, and spatial data. The EIMS design provides a repository for scientific documentation that can be easily accessed with standar...
A Flexible Online Metadata Editing and Management System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aguilar, Raul; Pan, Jerry Yun; Gries, Corinna
2010-01-01
A metadata editing and management system is being developed employing state-of-the-art XML technologies. A modular and distributed design was chosen for scalability, flexibility, options for customization, and the possibility of adding more functionality at a later stage. The system consists of a desktop design tool, or schema walker, used to generate code for the actual online editor; a native XML database; and an online user access management application. The design tool is a Java Swing application that reads an XML schema and provides the designer with options to combine input fields into online forms and give the fields user-friendly tags. Based on the design decisions, the tool generates code for the online metadata editor. The code generated is an implementation of the XForms standard using the Orbeon Framework. The design tool fulfills two requirements: first, data entry forms based on one schema may be customized at design time; and second, data entry applications may be generated for any valid XML schema without relying on custom information in the schema. The customized information generated at design time is saved in a configuration file which may be re-used and changed again in the design tool. Future developments will add functionality to the design tool to integrate help text, tool tips, project-specific keyword lists, and thesaurus services. Additional styling of the finished editor is accomplished via cascading style sheets, which may be further customized, and different look-and-feels may be accumulated through the community process. The customized editor produces XML files in compliance with the original schema; however, data from the current page are saved into a native XML database whenever the user moves to the next screen or pushes the save button, regardless of validity. Currently the system uses the open-source XML database eXist for storage and management, which comes with third-party online and desktop management tools.
However, access to metadata files in the application introduced here is managed in a custom online module, using a MySQL backend accessed by a simple JavaServer Faces front end. A flexible system with three grouping options (organization, group and single editing access) is provided. Three levels were chosen to distribute administrative responsibilities and to handle the common situation of an information manager entering the bulk of the metadata while leaving specifics to the actual data provider.
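The schema-walker idea, deriving candidate form fields from any valid XML schema without custom annotations, can be sketched in miniature. The two-field schema below is hypothetical and the actual tool is a Java Swing application; this is only a sketch of the principle:

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical toy schema; a real metadata schema (e.g. EML) is far larger
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dataset">
    <xs:complexType><xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="pubDate" type="xs:date"/>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>"""

def walk_fields(schema_text):
    """Walk a schema and emit (name, type) pairs for each typed leaf element,
    the raw material a designer would map to form inputs and friendly tags."""
    root = ET.fromstring(schema_text)
    return [(e.get("name"), e.get("type"))
            for e in root.iter(f"{{{XSD}}}element") if e.get("type")]

fields = walk_fields(SCHEMA)
```

Because the walk relies only on the schema itself, the same procedure applies to any valid XML schema, which is exactly the second requirement stated above; the design-time customization would then be layered on top in a separate configuration file.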
Legacy2Drupal: Conversion of an existing relational oceanographic database to a Drupal 7 CMS
NASA Astrophysics Data System (ADS)
Work, T. T.; Maffei, A. R.; Chandler, C. L.; Groman, R. C.
2011-12-01
Content Management Systems (CMSs) such as Drupal provide powerful features that can be of use to oceanographic (and other geoscience) data managers. However, in many instances, geoscience data management offices have already designed and implemented customized schemas for their metadata. The NSF-funded Biological and Chemical Oceanography Data Management Office (BCO-DMO) has ported an existing relational database containing oceanographic metadata, along with an existing interface coded in ColdFusion middleware, to a Drupal 7 Content Management System. This is an update on an effort described as a proof-of-concept in poster IN21B-1051, presented at AGU 2009. The BCO-DMO project has translated all the existing database tables, input forms, website reports, and other features present in the existing system into Drupal CMS features. The replacement features are made possible by the use of Drupal content types, CCK node-reference fields, a custom theme, and a number of other supporting modules. This presentation describes the process used to migrate content in the original BCO-DMO metadata database to Drupal 7, some problems encountered during migration, and the modules used to migrate the content successfully. Strategic use of Drupal 7 CMS features that enable three separate but complementary interfaces to oceanographic research metadata will also be covered: 1) a Drupal 7-powered user front end; 2) RESTful JSON web services (providing a MapServer interface to the metadata and data); and 3) a SPARQL interface to a semantic representation of the repository metadata (feeding a new faceted search capability currently under development). The existing BCO-DMO ontology, developed in collaboration with Rensselaer Polytechnic Institute's Tetherless World Constellation, makes strategic use of pre-existing ontologies and will be used to drive the semantically-enabled faceted search capabilities planned for the site.
At this point, the use of semantic technologies included in the Drupal 7 core is anticipated. Using a public domain CMS as opposed to proprietary middleware, and taking advantage of the many features of Drupal 7 that are designed to support semantically-enabled interfaces will help prepare the BCO-DMO and other science data repositories for interoperability between systems that serve ecosystem research data.
Component-Based Approach in Learning Management System Development
ERIC Educational Resources Information Center
Zaitseva, Larisa; Bule, Jekaterina; Makarov, Sergey
2013-01-01
The paper describes a component-based approach (CBA) to learning management system development. Learning objects as components of e-learning courses, and their metadata, are considered. The architecture of a learning management system based on CBA being developed at Riga Technical University, namely its elements and possibilities, is…
Dhaval, Rakesh; Borlawsky, Tara; Ostrander, Michael; Santangelo, Jennifer; Kamal, Jyoti; Payne, Philip R O
2008-11-06
In order to enhance interoperability between enterprise systems, and improve data validity and reliability throughout The Ohio State University Medical Center (OSUMC), we have initiated the development of an ontology-anchored metadata architecture and knowledge collection for our enterprise data warehouse. The metadata and corresponding semantic relationships stored in the OSUMC knowledge collection are intended to promote consistency and interoperability across the heterogeneous clinical, research, business and education information managed within the data warehouse.
Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators
ERIC Educational Resources Information Center
Mayernik, Matthew Stephen
2011-01-01
As digital data creation technologies become more prevalent, data and metadata management are necessary to make data available, usable, sharable, and storable. Researchers in many scientific settings, however, have little experience or expertise in data and metadata management. In this dissertation, I explore the everyday data and metadata…
Content standards for medical image metadata
NASA Astrophysics Data System (ADS)
d'Ornellas, Marcos C.; da Rocha, Rafael P.
2003-12-01
Medical images are at the heart of healthcare diagnostic procedures. They have provided not only a noninvasive means to view anatomical cross-sections of internal organs but also a means for physicians to evaluate the patient's diagnosis and monitor the effects of the treatment. For a medical center, the emphasis may shift from the generation of images to post-processing and data management, since the medical staff may generate even more processed images and other data from the original image after various analyses and post-processing. A medical image data repository for health care information systems is becoming a critical need. This data repository would contain comprehensive patient records, including information such as clinical data, related diagnostic images, and post-processed images. Due to the large volume and complexity of the data as well as the diversified user access requirements, the implementation of the medical image archive system will be a complex and challenging task. This paper discusses content standards for medical image metadata. In addition, it also focuses on image metadata content evaluation and metadata quality management.
DIRAC File Replica and Metadata Catalog
NASA Astrophysics Data System (ADS)
Tsaregorodtsev, A.; Poss, S.
2012-12-01
File replica and metadata catalogs are essential parts of any distributed data management system, and largely determine its functionality and performance. A new File Catalog (DFC) was developed in the framework of the DIRAC Project that combines both replica and metadata catalog functionality. The DFC design is based on practical experience with the data management system of the LHCb Collaboration. It is optimized for the most common patterns of catalog usage in order to achieve maximum performance from the user perspective. The DFC supports bulk operations for replica queries and allows quick analysis of storage usage, globally and for each Storage Element separately. It supports flexible ACL rules with plug-ins for the various policies that can be adopted by a particular community. The DFC makes it possible to store various types of metadata associated with files and directories and to perform efficient queries for data based on complex metadata combinations. Definition of file ancestor-descendant relation chains is also possible. The DFC is implemented in the general DIRAC distributed computing framework following the standard grid security architecture. In this paper we describe the design of the DFC and its implementation details. Performance measurements are compared with other grid file catalog implementations. The experience of DFC usage in the CLIC detector project is discussed.
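A metadata query over complex attribute combinations, of the kind the DFC supports, can be illustrated in miniature. The catalog contents and query helper below are hypothetical and are not the DIRAC API; they only show the idea of selecting logical file names by combining equality and predicate conditions on per-file metadata:

```python
def find_files(catalog, **conditions):
    """Return logical file names whose metadata matches every condition.
    A condition value may be a scalar (equality) or a callable (predicate)."""
    matches = []
    for lfn, meta in catalog.items():
        ok = all(cond(meta.get(k)) if callable(cond) else meta.get(k) == cond
                 for k, cond in conditions.items())
        if ok:
            matches.append(lfn)
    return matches

# Hypothetical catalog: logical file name -> user-defined metadata
catalog = {
    "/lhcb/run1/raw_001.dst": {"type": "DST", "run": 1001, "energy": 7000},
    "/lhcb/run1/raw_002.dst": {"type": "DST", "run": 1002, "energy": 8000},
    "/clic/sim/evt_001.slcio": {"type": "SIM", "run": 2001, "energy": 3000},
}
hits = find_files(catalog, type="DST", energy=lambda e: e >= 8000)
```

In a production catalog such queries run against indexed database tables rather than an in-memory scan, which is what makes bulk operations efficient.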
CruiseViewer: SIOExplorer Graphical Interface to Metadata and Archives.
NASA Astrophysics Data System (ADS)
Sutton, D. W.; Helly, J. J.; Miller, S. P.; Chase, A.; Clark, D.
2002-12-01
We are introducing "CruiseViewer" as a prototype graphical interface for the SIOExplorer digital library project, part of the overall NSF National Science Digital Library (NSDL) effort. When complete, CruiseViewer will provide access to nearly 800 cruises, as well as 100 years of documents and images from the archives of the Scripps Institution of Oceanography (SIO). The project emphasizes data object accessibility, a rich metadata format, efficient uploading methods and interoperability with other digital libraries. The primary function of CruiseViewer is to provide a human interface to the metadata database and to storage systems filled with archival data. The system schema is based on the concept of an "arbitrary digital object" (ADO): arbitrary in the sense that if the object can be stored on a computer system, then SIOExplorer can manage it. Common examples are a multibeam swath bathymetry file, a .pdf cruise report, or a tar file containing all the processing scripts used on a cruise. We require a metadata file for every ADO in an ASCII "metadata interchange format" (MIF), which has proven to be highly useful for interoperability and extensibility. Bulk ADO storage is managed using the Storage Resource Broker (SRB), data handling middleware developed at the San Diego Supercomputer Center that centralizes management of, and access to, distributed storage devices. MIF metadata are harvested from several sources and housed in a relational (Oracle) database. For CruiseViewer, CGI scripts resident on an Apache server are the primary communication and service request handling tools. Along with the CruiseViewer Java application, users can query, access and download objects via a separate method that operates through standard web browsers, http://sioexplorer.ucsd.edu. Both provide the functionality to query and view object metadata, and to select and download ADOs.
For the CruiseViewer application Java 2D is used to add a geo-referencing feature that allows users to select basemap images and have vector shapes representing query results mapped over the basemap in the image panel. The two methods together address a wide range of user access needs and will allow for widespread use of SIOExplorer.
Scalable Metadata Management for a Large Multi-Source Seismic Data Repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaylord, J. M.; Dodge, D. A.; Magana-Zook, S. A.
In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system and to position it for anticipated growth in volume and complexity. We began the effort with an assessment of open-source data flow tools from the Hadoop ecosystem. We then began the construction of a layered architecture that is specifically designed to address many of the scalability and data quality issues we experience with our current pipeline. This included implementing basic functionality in each of the layers, such as establishing a data lake, designing a unified metadata schema, tracking provenance, and calculating data quality metrics. Our original intent was to test and validate the new ingestion framework with data from a large-scale field deployment in a temporary network. This delivered somewhat unsatisfying results, since the new system immediately identified fatal flaws in the data relatively early in the pipeline. Although this is a correct result, it did not allow us to sufficiently exercise the whole framework. We then widened our scope to process all available metadata from over a dozen online seismic data sources to further test the implementation and validate the design. This experiment also uncovered a higher-than-expected frequency of certain types of metadata issues, which challenged us to further tune our data management strategy to handle them. Our result from this project is a greatly improved understanding of real-world data issues, a validated design, and prototype implementations of major components of an eventual production framework. This successfully forms the basis of future development for the Geophysical Monitoring Program data pipeline, which is a critical asset supporting multiple programs.
It also positions us very well to deliver valuable metadata management expertise to our sponsors, and has already resulted in an NNSA Office of Defense Nuclear Nonproliferation commitment to a multi-year project for follow-on work.
NASA Astrophysics Data System (ADS)
Sheldon, W.; Chamblee, J.; Cary, R. H.
2013-12-01
Environmental scientists are under increasing pressure from funding agencies and journal publishers to release quality-controlled data in a timely manner, as well as to produce comprehensive metadata for submitting data to long-term archives (e.g. DataONE, Dryad and BCO-DMO). At the same time, the volume of digital data that researchers collect and manage is increasing rapidly due to advances in high frequency electronic data collection from flux towers, instrumented moorings and sensor networks. However, few pre-built software tools are available to meet these data management needs, and those tools that do exist typically focus on part of the data management lifecycle or one class of data. The GCE Data Toolbox has proven to be both a generalized and effective software solution for environmental data management in the Long Term Ecological Research Network (LTER). This open source MATLAB software library, developed by the Georgia Coastal Ecosystems LTER program, integrates metadata capture, creation and management with data processing, quality control and analysis to support the entire data lifecycle. Raw data can be imported directly from common data logger formats (e.g. SeaBird, Campbell Scientific, YSI, Hobo), as well as delimited text files, MATLAB files and relational database queries. Basic metadata are derived from the data source itself (e.g. parsed from file headers) and by value inspection, and then augmented using editable metadata templates containing boilerplate documentation, attribute descriptors, code definitions and quality control rules. Data and metadata content, quality control rules and qualifier flags are then managed together in a robust data structure that supports database functionality and ensures data validity throughout processing. 
A growing suite of metadata-aware editing, quality control, analysis and synthesis tools is provided with the software to support managing data using graphical forms and command-line functions, as well as developing automated workflows for unattended processing. Finalized data and structured metadata can be exported in a wide variety of text and MATLAB formats or uploaded to a relational database for long-term archiving and distribution. The GCE Data Toolbox can be used as a complete, lightweight solution for environmental data and metadata management, but it can also be used in conjunction with other cyberinfrastructure to provide a more comprehensive solution. For example, newly acquired data can be retrieved from a Data Turbine or Campbell LoggerNet Database server for quality control and processing, then transformed to CUAHSI Observations Data Model format and uploaded to a HydroServer for distribution through the CUAHSI Hydrologic Information System. The GCE Data Toolbox can also be leveraged in analytical workflows developed using Kepler or other systems that support MATLAB integration or tool chaining. This software can therefore be leveraged in many ways to help researchers manage, analyze and distribute the data they collect.
Collection Metadata Solutions for Digital Library Applications
NASA Technical Reports Server (NTRS)
Hill, Linda L.; Janee, Greg; Dolin, Ron; Frew, James; Larsgaard, Mary
1999-01-01
Within a digital library, collections may range from an ad hoc set of objects that serve a temporary purpose to established library collections intended to persist through time. The objects in these collections vary widely, from library and data center holdings to pointers to real-world objects, such as geographic places, and the various metadata schemas that describe them. The key to integrated use of such a variety of collections in a digital library is collection metadata that represents the inherent and contextual characteristics of a collection. The Alexandria Digital Library (ADL) Project has designed and implemented collection metadata for several purposes: in XML form, the collection metadata "registers" the collection with the user interface client; in HTML form, it is used for user documentation; eventually, it will be used to describe the collection to network search agents; and it is used for internal collection management, including mapping the object metadata attributes to the common search parameters of the system.
Earth Science Data Grid System
NASA Astrophysics Data System (ADS)
Chi, Y.; Yang, R.; Kafatos, M.
2004-05-01
The Earth Science Data Grid System (ESDGS) is a software system supporting earth science data storage and access. It is built upon Storage Resource Broker (SRB) data grid technology. We have developed a complete data grid system consisting of an SRB server, which provides users uniform access to diverse storage resources in a heterogeneous computing environment, and a metadata catalog server (MCAT), which manages the metadata associated with data sets, users, and resources. We have also developed earth science application metadata; geospatial, temporal, and content-based indexing; and several other tools. In this paper, we describe the software architecture and components of the data grid system, and use a practical example, supporting the storage and access of rainfall data from the Tropical Rainfall Measuring Mission (TRMM), to illustrate its functionality and features.
Metadata improvements driving new tools and services at a NASA data center
NASA Astrophysics Data System (ADS)
Moroni, D. F.; Hausman, J.; Foti, G.; Armstrong, E. M.
2011-12-01
The NASA Physical Oceanography DAAC (PO.DAAC) is responsible for distributing and maintaining satellite-derived oceanographic data from a number of NASA and non-NASA missions for the physical disciplines of ocean winds, sea surface temperature, ocean topography and gravity. Currently its holdings consist of over 600 datasets with a data archive in excess of 200 terabytes. The PO.DAAC has recently embarked on a metadata quality and completeness project to migrate, update and improve metadata records for over 300 public datasets. An interactive database management tool has been developed to allow data scientists to enter, update and maintain metadata records. This tool communicates directly with PO.DAAC's Data Management and Archiving System (DMAS), which serves as the new archival and distribution backbone as well as a permanent repository of dataset- and granule-level metadata. Although we will briefly discuss the tool, the more important ramifications are the ability to now expose, propagate and leverage the metadata in a number of ways. First, the metadata are exposed through a faceted and free-text search interface directly from Drupal-based PO.DAAC web pages, allowing for quick browsing and data discovery, especially by "drilling" through the various facet levels that organize datasets by time/space resolution, processing level, sensor, measurement type, etc. Furthermore, the metadata can now be exposed through web services to produce metadata records in a number of different formats, such as FGDC and ISO 19115, or potentially propagated to visualization and subsetting tools, and other discovery interfaces. The fundamental concept is that the metadata form the essential bridge between the user and the tool or discovery mechanism for a broad range of ocean earth science data records.
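The faceted browsing described above comes down to counting datasets per metadata value for each facet field. A minimal sketch with hypothetical dataset records (not PO.DAAC's actual schema):

```python
from collections import Counter

def build_facets(datasets, fields):
    """Count dataset occurrences per value of each facet field."""
    return {f: Counter(d[f] for d in datasets if f in d) for f in fields}

# Hypothetical dataset-level metadata records
datasets = [
    {"sensor": "QuikSCAT", "level": "L2", "measurement": "ocean winds"},
    {"sensor": "AMSR-E", "level": "L3", "measurement": "sea surface temperature"},
    {"sensor": "AMSR-E", "level": "L2", "measurement": "sea surface temperature"},
]
facets = build_facets(datasets, ["sensor", "level"])
```

Displaying these counts next to each facet value, and intersecting the selections, is what gives the "drilling" behavior the abstract describes.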
Developing Cyberinfrastructure Tools and Services for Metadata Quality Evaluation
NASA Astrophysics Data System (ADS)
Mecum, B.; Gordon, S.; Habermann, T.; Jones, M. B.; Leinfelder, B.; Powers, L. A.; Slaughter, P.
2016-12-01
Metadata and data quality are at the core of reusable and reproducible science. While great progress has been made over the years, much of the metadata collected only addresses data discovery, covering concepts such as titles and keywords. Improving metadata beyond the discoverability plateau means documenting detailed concepts within the data, such as sampling protocols, instrumentation used, and variables measured. Given that metadata commonly do not describe their data at this level, how might we improve the state of things? Giving scientists and data managers easy-to-use tools to evaluate metadata quality against community-driven recommendations is the key to producing high-quality metadata. To achieve this goal, we created a set of cyberinfrastructure tools and services that integrate with existing metadata and data curation workflows and can be used to improve metadata and data quality across the sciences. These tools work across metadata dialects (e.g., ISO 19115, FGDC, EML, etc.) and can be used to assess aspects of quality beyond what is internal to the metadata, such as the congruence between the metadata and the data they describe. The system makes use of a user-friendly mechanism for expressing a suite of checks as code in popular data science programming languages such as Python and R. This reduces the burden on scientists and data managers to learn yet another language. We demonstrated these services and tools in three ways. First, we evaluated a large corpus of datasets in the DataONE federation of data repositories against a metadata recommendation modeled after existing recommendations such as the LTER best practices and the Attribute Convention for Dataset Discovery (ACDD). Second, we showed how this service can be used to display metadata and data quality information to data producers during the data submission and metadata creation process, and to data consumers through data catalog search and access tools.
Third, we showed how the centrally deployed DataONE quality service can achieve major efficiency gains by allowing member repositories to customize and use recommendations that fit their specific needs without having to create de novo infrastructure at their site.
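Expressing a suite of checks as code, as the service does for Python and R, might look roughly like this. The check names, thresholds and sample record below are illustrative only, not the DataONE suite's actual checks:

```python
def check_title(meta):
    """Discovery-level check: is the title descriptive?"""
    return len(meta.get("title", "")) >= 20, "title should be >= 20 chars"

def check_keywords(meta):
    """Discovery-level check: enough keywords for search?"""
    return len(meta.get("keywords", [])) >= 3, "at least 3 keywords expected"

def check_variables(meta):
    """Beyond discovery: are the measured variables actually documented?"""
    return bool(meta.get("variables")), "variables measured must be listed"

def evaluate(meta, checks):
    """Run every check; return (name, passed, message) rows as a report."""
    return [(check.__name__, *check(meta)) for check in checks]

# Hypothetical metadata record as a plain dict
record = {"title": "Soil moisture at 10 cm depth, plot A, 2015-2016",
          "keywords": ["soil moisture", "sensor", "LTER"],
          "variables": ["soil_moisture"]}
report = evaluate(record, [check_title, check_keywords, check_variables])
```

Because each check is an ordinary function, a repository can swap in its own recommendation, which is exactly the customization the centrally deployed service allows.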
Streamlining geospatial metadata in the Semantic Web
NASA Astrophysics Data System (ADS)
Fugazza, Cristiano; Pepe, Monica; Oggioni, Alessandro; Tagliolato, Paolo; Carrara, Paola
2016-04-01
In the geospatial realm, data annotation and discovery rely on a number of ad hoc formats and protocols. These were created to enable domain-specific use cases for which generalized search is not feasible. Metadata are at the heart of the discovery process, and nevertheless they are often neglected or encoded in formats that either are not aimed at efficient retrieval of resources or are plainly outdated. In particular, the quantum leap represented by the Linked Open Data (LOD) movement has so far not induced a consistent, interlinked baseline in the geospatial domain. In a nutshell, datasets, the scientific literature related to them, and ultimately the researchers behind these products are only loosely connected; the corresponding metadata are intelligible only to humans and duplicated on different systems, seldom consistently. Instead, our workflow for metadata management envisages i) editing via customizable web-based forms, ii) encoding of records in any XML application profile, iii) translation into RDF (involving the semantic lift of metadata records), and finally iv) storage of the metadata as RDF and back-translation into the original XML format with added semantics-aware features. Phase iii) hinges on relating resource metadata to RDF data structures that represent keywords from code lists and controlled vocabularies, toponyms, researchers, institutes, and virtually any description one can retrieve (or directly publish) in the LOD Cloud. In the context of a distributed Spatial Data Infrastructure (SDI) built on free and open-source software, we detail phases iii) and iv) of our workflow for the semantics-aware management of geospatial metadata.
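The semantic lift of phase iii) can be sketched as mapping keyword strings found in an XML record to controlled-vocabulary URIs and emitting RDF triples. The vocabulary table, record layout and URIs below are all hypothetical; the output uses the N-Triples notation:

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from free-text keywords to LOD concept URIs
VOCAB = {"precipitation": "http://example.org/vocab/precipitation"}

def lift_keywords(xml_text):
    """Semantic lift: turn <keyword> strings in an XML metadata record into
    RDF triples (N-Triples lines) pointing at controlled-vocabulary URIs."""
    root = ET.fromstring(xml_text)
    record = root.get("id")
    triples = []
    for kw in root.iter("keyword"):
        uri = VOCAB.get(kw.text.strip().lower())
        if uri:
            triples.append(
                f"<{record}> <http://purl.org/dc/terms/subject> <{uri}> .")
    return triples

xml = ('<record id="http://example.org/md/42">'
       '<keyword>Precipitation</keyword></record>')
triples = lift_keywords(xml)
```

Once keywords resolve to URIs rather than strings, the back-translation of phase iv) can re-embed both the original text and the linked concept, which is the "added semantics-aware features" the abstract mentions.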
Leveraging Metadata to Create Better Web Services
ERIC Educational Resources Information Center
Mitchell, Erik
2012-01-01
Libraries have been increasingly concerned with data creation, management, and publication. This increase is partly driven by shifting metadata standards in libraries and partly by the growth of data and metadata repositories being managed by libraries. In order to manage these data sets, libraries are looking for new preservation and discovery…
A document centric metadata registration tool constructing earth environmental data infrastructure
NASA Astrophysics Data System (ADS)
Ichino, M.; Kinutani, H.; Ono, M.; Shimizu, T.; Yoshikawa, M.; Masuda, K.; Fukuda, K.; Kawamoto, H.
2009-12-01
DIAS (Data Integration and Analysis System) is one of the GEOSS activities in Japan. It is also a leading part of the GEOSS task with the same name defined in the GEOSS Ten Year Implementation Plan. The main mission of DIAS is to construct a data infrastructure that can effectively integrate earth environmental data such as observation data, numerical model outputs, and socio-economic data provided from the fields of climate, water cycle, ecosystem, ocean, biodiversity and agriculture. Some of DIAS's data products are available at the following web site: http://www.jamstec.go.jp/e/medid/dias. Most earth environmental data commonly have spatial and temporal attributes such as the covering geographic scope or the creation date. The metadata standards including these common attributes are published by the geographic information technical committee (TC211) of ISO (the International Organization for Standardization) as specifications ISO 19115:2003 and ISO 19139:2007. Accordingly, DIAS metadata is developed based on the ISO/TC211 metadata standards. From the viewpoint of data users, metadata is useful not only for data retrieval and analysis but also for interoperability and information sharing among experts, beginners and nonprofessionals. On the other hand, from the viewpoint of data providers, two problems were pointed out after discussions. One is that data providers prefer to minimize additional tasks and the time spent creating metadata. The other is that data providers want to manage and publish documents that explain their data sets more comprehensively. To solve these problems, we have been developing a document-centric metadata registration tool. The features of our tool are that the generated documents are available instantly and there is no extra cost for data providers to generate metadata. The tool is also developed as a web application, so it requires no additional software as long as data providers have a web browser.
The interface of the tool provides the section titles of the documents and by filling out the content of each section, the documents for the data sets are automatically published in PDF and HTML format. Furthermore, the metadata XML file which is compliant with ISO19115 and ISO19139 is created at the same moment. The generated metadata are managed in the metadata database of the DIAS project, and will be used in various ISO19139 compliant metadata management tools, such as GeoNetwork.
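The document-centric idea, that one set of filled-in sections yields both the human-readable document and the ISO metadata at the same moment, can be sketched as follows. This is a hedged illustration: the section names, element names, and output shapes are simplified stand-ins, not the DIAS tool's actual schema (real ISO 19139 uses namespaced gmd elements).

```python
import xml.etree.ElementTree as ET

SECTIONS = {  # section titles offered by the (hypothetical) web form
    "Abstract": "Daily river discharge, 2000-2009.",
    "Spatial extent": "135E-140E, 34N-37N",
}

def publish(sections):
    """From one set of filled-in sections, derive both the human-readable
    document (HTML here; the real tool also emits PDF) and the metadata XML,
    so documentation and metadata never drift apart."""
    html = "".join(f"<h2>{t}</h2><p>{b}</p>" for t, b in sections.items())
    md = ET.Element("MD_Metadata")  # simplified stand-in for gmd:MD_Metadata
    ET.SubElement(md, "abstract").text = sections["Abstract"]
    ET.SubElement(md, "extent").text = sections["Spatial extent"]
    return html, ET.tostring(md, encoding="unicode")

html, iso_xml = publish(SECTIONS)
```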
A Metadata Management Framework for Collaborative Review of Science Data Products
NASA Astrophysics Data System (ADS)
Hart, A. F.; Cinquini, L.; Mattmann, C. A.; Thompson, D. R.; Wagstaff, K.; Zimdars, P. A.; Jones, D. L.; Lazio, J.; Preston, R. A.
2012-12-01
Data volumes generated by modern scientific instruments often preclude archiving the complete observational record. To compensate, science teams have developed a variety of "triage" techniques for identifying data of potential scientific interest and marking it for prioritized processing or permanent storage. This may involve multiple stages of filtering with both automated and manual components operating at different timescales. A promising approach exploits a fast, fully automated first stage followed by a more reliable offline manual review of candidate events. This hybrid approach permits a 24-hour rapid real-time response while also preserving the high accuracy of manual review. To support this type of second-level validation effort, we have developed a metadata-driven framework for the collaborative review of candidate data products. The framework consists of a metadata processing pipeline and a browser-based user interface that together provide a configurable mechanism for reviewing data products via the web, and capturing the full stack of associated metadata in a robust, searchable archive. Our system heavily leverages software from the Apache Object Oriented Data Technology (OODT) project, an open source data integration framework that facilitates the construction of scalable data systems and places a heavy emphasis on the utilization of metadata to coordinate processing activities. OODT provides a suite of core data management components for file management and metadata cataloging that form the foundation for this effort. The system has been deployed at JPL in support of the V-FASTR experiment [1], a software-based radio transient detection experiment that operates commensally at the Very Long Baseline Array (VLBA), and has a science team that is geographically distributed across several countries. Daily review of automatically flagged data is a shared responsibility for the team, and is essential to keep the project within its resource constraints. 
We describe the development of the platform using open source software, and discuss our experience deploying the system operationally. [1] R.B.Wayth,W.F.Brisken,A.T.Deller,W.A.Majid,D.R.Thompson, S. J. Tingay, and K. L. Wagstaff, "V-fastr: The vlba fast radio transients experiment," The Astrophysical Journal, vol. 735, no. 2, p. 97, 2011. Acknowledgement: This effort was supported by the Jet Propulsion Laboratory, managed by the California Institute of Technology under a contract with the National Aeronautics and Space Administration.
An asynchronous traversal engine for graph-based rich metadata management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Dong; Carns, Philip; Ross, Robert B.
Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains, but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format.
We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.
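The contrast the abstract draws between level-synchronous BFS and asynchronous traversal can be sketched in miniature. This toy in-memory version (the graph and entity names are invented; the paper's engine is distributed and out-of-core) shows the structural difference: the synchronous variant has an implicit barrier between levels, while the asynchronous variant lets any ready vertex proceed, which is why stragglers hurt it less.

```python
from collections import deque

GRAPH = {  # toy rich-metadata property graph: vertex -> neighbours
    "job1": ["fileA", "fileB"], "fileA": ["user1"],
    "fileB": ["user1", "user2"], "user1": [], "user2": [],
}

def bfs_level_sync(g, src):
    """Level-synchronous BFS: the whole frontier is processed before any
    next-level work starts, so one slow vertex (or server) in a level
    delays the entire next level."""
    visited, frontier = {src}, [src]
    while frontier:
        nxt = []
        for v in frontier:            # implicit barrier: finish this level...
            for w in g[v]:
                if w not in visited:
                    visited.add(w); nxt.append(w)
        frontier = nxt                # ...before touching the next one
    return visited

def traverse_async(g, src):
    """Asynchronous traversal: a single work queue and no level barrier,
    so fast workers keep pulling work while a straggler is still busy."""
    visited, work = {src}, deque([src])
    while work:
        v = work.popleft()            # any ready vertex proceeds immediately
        for w in g[v]:
            if w not in visited:
                visited.add(w); work.append(w)
    return visited
```

Both visit the same reachable set; the payoff of the asynchronous form only appears under imbalance and concurrency, which is exactly the regime the abstract describes for HPC rich metadata.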
An asynchronous traversal engine for graph-based rich metadata management
Dai, Dong; Carns, Philip; Ross, Robert B.; ...
2016-06-23
Large-Scale Data Collection Metadata Management at the National Computation Infrastructure
NASA Astrophysics Data System (ADS)
Wang, J.; Evans, B. J. K.; Bastrakova, I.; Ryder, G.; Martin, J.; Duursma, D.; Gohar, K.; Mackey, T.; Paget, M.; Siddeswara, G.
2014-12-01
Data Collection management has become an essential activity at the National Computational Infrastructure (NCI) in Australia. NCI's partners (CSIRO, Bureau of Meteorology, Australian National University, and Geoscience Australia), supported by the Australian Government and the Research Data Storage Infrastructure (RDSI) program, have established a national data resource that is co-located with high-performance computing. This paper addresses the metadata management of these data assets over their lifetime. NCI manages 36 data collections (10+ PB) categorised as earth system sciences, climate and weather model data assets and products, earth and marine observations and products, geosciences, terrestrial ecosystem, water management and hydrology, astronomy, social science and biosciences. The data is largely sourced from NCI partners, the custodians of many of the national scientific records, and major research community organisations. The data is made available in an HPC and data-intensive environment: a ~56,000-core supercomputer, virtual labs on a 3,000-core cloud system, and data services. By assembling these large national assets, new opportunities have arisen to harmonise the data collections, making a powerful cross-disciplinary resource. To support the overall management, a Data Management Plan (DMP) has been developed to record the workflows, procedures, key contacts and responsibilities. The DMP has fields that can be exported to the ISO 19115 schema and to the collection-level catalogue of GeoNetwork. The dataset- and file-level metadata catalogues are linked with the collection level through parent-child relationships defined using UUIDs. A number of tools have been developed that support interactive metadata management, bulk loading of data, and support for computational workflows or data pipelines. NCI creates persistent identifiers for each of the assets.
The data collection is tracked over its lifetime, and the recognition of the data providers, data owners, data generators and data aggregators is kept up to date. A Digital Object Identifier is assigned using the Australian National Data Service (ANDS). Once the data has been quality assured, the DOI is minted and the metadata record updated. NCI's data citation policy establishes the relationship between research outcomes, data providers, and the data.
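The parent-child UUID linkage between collection-level and dataset- or file-level catalogue records can be sketched as below. This is a hedged illustration: the field names and the placeholder DOI are invented for the example and are not NCI's actual record schema.

```python
import uuid

# Hypothetical record layout; field names are illustrative, not NCI's schema.
collection = {"uuid": str(uuid.uuid4()), "title": "Climate model outputs",
              "parent": None, "doi": "10.0000/example-doi"}  # placeholder DOI
dataset = {"uuid": str(uuid.uuid4()), "title": "Daily tasmax time series",
           "parent": collection["uuid"]}                     # child -> parent UUID

CATALOGUE = {r["uuid"]: r for r in (collection, dataset)}

def root_collection(rec_uuid):
    """Walk the parent-child UUID chain from a dataset/file-level record
    up to the collection-level record (the one carrying the minted DOI)."""
    rec = CATALOGUE[rec_uuid]
    while rec["parent"] is not None:
        rec = CATALOGUE[rec["parent"]]
    return rec
```

Keeping the link as a UUID (rather than a title or path) is what lets subset catalogues be regenerated or moved without breaking the hierarchy.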
NASA Astrophysics Data System (ADS)
Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.
2007-12-01
Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata produced at one level but needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon Univ. in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations.
Scientific workflows are used to orchestrate the sequences of scientific tasks and to access distributed computational facilities such as the NSF TeraGrid. Different types of metadata are produced and captured within the scientific workflows. One workflow within PetaSHA ("Earthworks") performs a linear sequence of tasks with workflow and seismological metadata preserved. Downstream scientific codes ingest these metadata produced by upstream codes. The seismological metadata uses attribute-value pairing in plain text; an identified need is to use more advanced handling methods. Another workflow system within PetaSHA ("Cybershake") involves several complex workflows in order to perform statistical analysis of ground shaking due to thousands of hypothetical but plausible earthquakes. Metadata management has been challenging due to its construction around a number of legacy scientific codes. We describe difficulties arising in the scientific workflow due to the lack of this metadata and suggest corrective steps, which in some cases include the cultural shift of domain science programmers coding for metadata.
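The plain-text attribute-value pairing mentioned for the seismological metadata, and the workflow-level vs. domain-level split, can be sketched as follows. All key names and the chosen split are invented for illustration; the abstract does not specify the actual attributes.

```python
# Hypothetical attribute-value metadata, as one code might hand to the next.
RAW = """\
origin_time = 2007-10-31T03:04:54
magnitude = 5.4
grid_spacing_m = 200
output_file = /scratch/run42/seis.grid
"""

# Assumed classification: which keys carry scientific meaning (domain level)
# versus operational bookkeeping (workflow level).
DOMAIN_KEYS = {"origin_time", "magnitude", "grid_spacing_m"}

def parse_av(text):
    """Parse simple 'key = value' plain-text metadata into a dict."""
    pairs = {}
    for line in text.splitlines():
        if "=" in line:
            k, v = line.split("=", 1)
            pairs[k.strip()] = v.strip()
    return pairs

meta = parse_av(RAW)
domain = {k: v for k, v in meta.items() if k in DOMAIN_KEYS}
workflow = {k: v for k, v in meta.items() if k not in DOMAIN_KEYS}
```

A downstream code that ingests `domain` while the workflow engine tracks `workflow` illustrates the crossover problem: `output_file` is produced at the workflow level yet needed to locate the domain data.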
Viewing and Editing Earth Science Metadata MOBE: Metadata Object Browser and Editor in Java
NASA Astrophysics Data System (ADS)
Chase, A.; Helly, J.
2002-12-01
Metadata is an important, yet often neglected, aspect of successful archival efforts. However, generating robust, useful metadata is often a time consuming and tedious task. We have been approaching this problem from two directions: first, by automating metadata creation, pulling from known sources of data; and second, as this (paper/poster?) details, by developing friendly software for human interaction with the metadata. MOBE and COBE (Metadata Object Browser and Editor, and Canonical Object Browser and Editor, respectively) are Java applications for editing and viewing metadata and digital objects. MOBE has already been designed and deployed, and is currently being integrated into other areas of the SIOExplorer project. COBE is in the design and development stage, and is being created with the same considerations in mind as those for MOBE. Metadata creation, viewing, data object creation, and data object viewing, when taken on a small scale, are all relatively simple tasks. Computer science, however, has an infamous reputation for transforming the simple into the complex. As a system scales upwards to become more robust, new features arise and additional functionality is added to the software being written to manage the system. The software that emerges from such an evolution, though powerful, is often complex and difficult to use. With MOBE the focus is on a tool that does a small number of tasks very well. The result has been an application that enables users to manipulate metadata in an intuitive and effective way. This allows for a tool that serves its purpose without introducing additional cognitive load onto the user, an end goal we continue to pursue.
NASA Astrophysics Data System (ADS)
Yatagai, A. I.; Iyemori, T.; Ritschel, B.; Koyama, Y.; Hori, T.; Abe, S.; Tanaka, Y.; Shinbori, A.; Umemura, N.; Sato, Y.; Yagi, M.; Ueno, S.; Hashiguchi, N. O.; Kaneda, N.; Belehaki, A.; Hapgood, M. A.
2013-12-01
The IUGONET is a Japanese program to build a metadata database for ground-based observations of the upper atmosphere [1]. The project began in 2009 with five Japanese institutions which archive data observed by radars, magnetometers, photometers, radio telescopes and helioscopes, and so on, at various altitudes from the Earth's surface to the Sun. Systems have been developed to allow searching of the above-described metadata. We have been updating the system and adding new and updated metadata. The IUGONET development team adopted the SPASE metadata model [2] to describe the upper atmosphere data. This model is used as the common metadata format by the virtual observatories for solar-terrestrial physics. It includes metadata referring to each data file (called a 'Granule'), which enables a search for data files as well as data sets. Further details are described in [2] and [3]. Currently, three additional Japanese institutions are being incorporated into IUGONET. Furthermore, metadata of observations of the troposphere, taken at the observatories of the middle and upper atmosphere radar at Shigaraki and the meteor radar in Indonesia, have been incorporated. These additions will contribute to efficient interdisciplinary scientific research. In the beginning of 2013, the registration of the 'Observatory' and 'Instrument' metadata was completed, which makes it easy to get an overview of the metadata database. The number of registered metadata records, as of the end of July, totalled 8.8 million, including 793 observatories and 878 instruments. It is important to promote interoperability and/or metadata exchange between the database development groups. A memorandum of agreement has been signed with the European Near-Earth Space Data Infrastructure for e-Science (ESPAS) project, which has objectives similar to IUGONET's, with regard to a framework for formal collaboration.
Furthermore, observations by satellites and the International Space Station are being incorporated with a view to making/linking metadata databases. The development of effective data systems will contribute to the progress of scientific research on solar-terrestrial physics, climate and the geophysical environment. Any kind of cooperation, metadata input and feedback, especially for linkage of the databases, is welcomed. References 1. Hayashi, H. et al., Inter-university Upper Atmosphere Global Observation Network (IUGONET), Data Sci. J., 12, WDS179-184, 2013. 2. King, T. et al., SPASE 2.0: A standard data model for space physics. Earth Sci. Inform. 3, 67-73, 2010, doi:10.1007/s12145-010-0053-4. 3. Hori, T., et al., Development of IUGONET metadata format and metadata management system. J. Space Sci. Info. Jpn., 105-111, 2012. (in Japanese)
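The value of Granule-level metadata, enabling searches that return individual data files rather than whole data sets, can be sketched as below. The record fields, resource IDs, and URLs are simplified, illustrative stand-ins for SPASE Granule elements, not actual IUGONET records.

```python
from datetime import datetime

GRANULES = [  # illustrative SPASE-like Granule records, fields simplified
    {"dataset": "spase://IUGONET/NumericalData/ExampleMagnetometer",
     "start": datetime(2013, 1, 1), "stop": datetime(2013, 1, 2),
     "url": "http://example.org/mag/20130101.dat"},
    {"dataset": "spase://IUGONET/NumericalData/ExampleMagnetometer",
     "start": datetime(2013, 1, 2), "stop": datetime(2013, 1, 3),
     "url": "http://example.org/mag/20130102.dat"},
]

def find_files(granules, t0, t1):
    """Return URLs of individual data files whose time coverage overlaps
    the query interval [t0, t1): a granule-level (per-file) search."""
    return [g["url"] for g in granules if g["start"] < t1 and g["stop"] > t0]
```

Without per-file granule records, the same query could only answer "this data set covers 2013," leaving the user to locate the right file manually.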
NASA Astrophysics Data System (ADS)
Wang, Jingbo; Bastrakova, Irina; Evans, Ben; Gohar, Kashif; Santana, Fabiana; Wyborn, Lesley
2015-04-01
National Computational Infrastructure (NCI) manages national environmental research data collections (10+ PB) as part of its specialized high-performance data node of the Research Data Storage Infrastructure (RDSI) program. We manage 40+ data collections using NCI's Data Management Plan (DMP), which is compatible with the ISO 19100 series of metadata standards. We utilize the ISO standards to make sure our metadata is transferable and interoperable for sharing and harvesting. The DMP is used along with metadata from the data itself to create a hierarchy of data collection, dataset and time series catalogues that is then exposed through GeoNetwork for standard discoverability. These hierarchical catalogues are linked using parent-child relationships. The hierarchical infrastructure of our GeoNetwork catalogue system aims to address both discoverability and in-house administrative use cases. At NCI, we are currently improving the metadata interoperability in our catalogue by linking with standardized community vocabulary services. These emerging vocabulary services are being established to help harmonise data from different national and international scientific communities. One such vocabulary service is currently being established by the Australian National Data Service (ANDS). Data citation is another important aspect of the NCI data infrastructure: it allows tracking of data usage and infrastructure investment, encourages data sharing, and increases trust in research that is reliant on these data collections. We incorporate the standard vocabularies into the data citation metadata so that the data citation becomes machine readable and semantically friendly for web-search purposes as well. By standardizing our metadata structure across our entire data corpus, we are laying the foundation to enable the application of appropriate semantic mechanisms to enhance discovery and analysis of NCI's national environmental research data.
We expect that this will further increase data discoverability and encourage data sharing and reuse within the community, increasing the value of the data well beyond its current use.
Evolution of Web Services in EOSDIS: Search and Order Metadata Registry (ECHO)
NASA Technical Reports Server (NTRS)
Mitchell, Andrew; Ramapriyan, Hampapuram; Lowe, Dawn
2009-01-01
During 2005 through 2008, NASA defined and implemented a major evolutionary change in its Earth Observing System Data and Information System (EOSDIS) to modernize its capabilities. This implementation was based on a vision for 2015 developed during 2005. The EOSDIS 2015 Vision emphasizes increased end-to-end data system efficiency and operability; increased data usability; improved support for end users; and decreased operations costs. One key feature of the evolution plan was achieving higher operational maturity (ingest, reconciliation, search and order, performance, error handling) for NASA's Earth Observing System Clearinghouse (ECHO). The ECHO system is an operational metadata registry through which the scientific community can easily discover and exchange NASA's Earth science data and services. ECHO contains metadata for 2,726 data collections comprising over 87 million individual data granules and 34 million browse images, drawn from NASA's EOSDIS Data Centers and the United States Geological Survey's Landsat Project holdings. ECHO is a middleware component based on a Service Oriented Architecture (SOA). The system is comprised of a set of infrastructure services that enable the fundamental SOA functions: publish, discover, and access Earth science resources. It also provides additional services such as user management, data access control, and order management. The ECHO system has a data registry and a services registry. The data registry enables organizations to publish EOS and other Earth-science related data holdings to a common metadata model. These holdings are described through metadata in terms of datasets (types of data) and granules (specific data items of those types). ECHO also supports browse images, which provide a visual representation of the data. The published metadata can be mapped to and from existing standards (e.g., FGDC, ISO 19115).
With ECHO, users can find the metadata stored in the data registry and then access the data either directly online or through a brokered order to the data archive organization. ECHO stores metadata from a variety of science disciplines and domains, including Climate Variability and Change, Carbon Cycle and Ecosystems, Earth Surface and Interior, Atmospheric Composition, Weather, and Water and Energy Cycle. ECHO also has a services registry for community-developed search services and data services. ECHO provides a platform for the publication, discovery, understanding and access to NASA's Earth Observation resources (data, services and clients). In their native state, these data, service and client resources are not necessarily targeted for use beyond their original mission. However, with the proper interoperability mechanisms, users of these resources can expand their value by accessing, combining and applying them in unforeseen ways.
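The key enabler above is publishing native holdings "to a common metadata model," which in essence is a field mapping. The sketch below is purely illustrative: the mapping table and target field names are hypothetical and are not ECHO's actual model or API.

```python
# Hypothetical mapping from a provider's native (FGDC-like) field names to
# a shared model, standing in for ECHO's common metadata model.
FGDC_TO_COMMON = {"title": "EntryTitle", "abstract": "Summary",
                  "bounding": "SpatialCoverage"}

def to_common(native: dict, mapping: dict) -> dict:
    """Map a provider's native metadata fields onto the shared model so
    holdings from different archives become searchable together.
    Fields with no mapping are dropped in this simplified sketch."""
    return {mapping[k]: v for k, v in native.items() if k in mapping}

rec = to_common({"title": "MODIS L2 SST", "abstract": "Sea surface temp.",
                 "platform": "Aqua"}, FGDC_TO_COMMON)
```

Because the mapping is declarative, the same machinery can run in reverse (common model back to FGDC or ISO 19115), which is what "mapped to and from existing standards" implies.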
Context based configuration management system
NASA Technical Reports Server (NTRS)
Gurram, Mohana M. (Inventor); Maluf, David A. (Inventor); Mederos, Luis A. (Inventor); Gawdiak, Yuri O. (Inventor)
2010-01-01
A computer-based system for configuring and displaying information on changes in, and present status of, a collection of events associated with a project. Classes of icons for decision events, configurations and feedback mechanisms, and time lines (sequential and/or simultaneous) for related events are displayed. Metadata for each icon in each class is displayed by choosing and activating the corresponding icon. Access control (viewing, reading, writing, editing, deleting, etc.) is optionally imposed for metadata and other displayed information.
Unified Science Information Model for SoilSCAPE using the Mercury Metadata Search System
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Lu, Kefa; Palanisamy, Giri; Cook, Robert; Santhana Vannan, Suresh; Moghaddam, Mahta; Clewley, Dan; Silva, Agnelo; Akbar, Ruzbeh
2013-12-01
SoilSCAPE (Soil moisture Sensing Controller And oPtimal Estimator) introduces a new concept for a smart wireless sensor web technology for optimal measurements of surface-to-depth profiles of soil moisture using in-situ sensors. The objective is to enable a guided and adaptive sampling strategy for the in-situ sensor network to meet the measurement validation objectives of spaceborne soil moisture sensors such as the Soil Moisture Active Passive (SMAP) mission. This work is being carried out at the University of Michigan, the Massachusetts Institute of Technology, the University of Southern California, and Oak Ridge National Laboratory. At Oak Ridge National Laboratory we are using the Mercury metadata search system [1] for building a Unified Information System for the SoilSCAPE project. This unified portal primarily comprises three key pieces: Distributed Search/Discovery; Data Collections and Integration; and Data Dissemination. Mercury, federally funded software for metadata harvesting, indexing, and searching, is used for this module. Soil moisture data sources identified as part of this activity, such as SoilSCAPE and FLUXNET (in-situ sensors), AirMOSS (airborne retrieval), and SMAP (spaceborne retrieval), are being indexed and maintained by Mercury. Mercury will be the central repository of data sources for cal/val for soil moisture studies and will provide a mechanism to identify additional data sources. Relevant metadata from existing inventories such as ORNL DAAC, USGS Clearinghouse, ARM, NASA ECHO, GCMD, etc., will be brought into this soil-moisture data search/discovery module. The SoilSCAPE [2] metadata records will also be published in broader metadata repositories such as GCMD and data.gov. Mercury can be configured to provide a single portal to soil moisture information contained in disparate data management systems located anywhere on the Internet.
Mercury is able to extract metadata systematically from HTML pages or XML files using a variety of methods, including OAI-PMH [3]. The Mercury search interface then allows users to perform simple, fielded, spatial and temporal searches across a central harmonized index of metadata. Mercury supports various metadata standards including FGDC, ISO-19115, DIF, Dublin-Core, Darwin-Core, and EML. This poster describes in detail how Mercury implements the Unified Science Information Model for soil moisture data. References: [1] Devarakonda R., et al. Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics (2010), 3(1): 87-94. [2] Devarakonda R., et al. Daymet: Single Pixel Data Extraction Tool. http://daymet.ornl.gov/singlepixel.html (2012). Last accessed 10-01-2013. [3] Devarakonda R., et al. Data sharing and retrieval using OAI-PMH. Earth Science Informatics (2011), 4(1): 1-5.
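The OAI-PMH harvesting step can be sketched as below. The ListRecords verb and metadataPrefix argument are part of the standard OAI-PMH protocol; everything else (the endpoint URL, the canned response, and the omission of XML namespaces) is a simplification for illustration, not Mercury's implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def list_records_url(base, prefix="oai_dc"):
    """Build a standard OAI-PMH ListRecords request for a provider's
    endpoint (the base URL here is hypothetical)."""
    return base + "?" + urlencode({"verb": "ListRecords",
                                   "metadataPrefix": prefix})

# Canned, simplified response so the sketch runs offline
# (real OAI-PMH responses carry XML namespaces, omitted here).
RESPONSE = """<OAI-PMH><ListRecords>
  <record><metadata><title>Soil moisture, site A</title></metadata></record>
  <record><metadata><title>Soil moisture, site B</title></metadata></record>
</ListRecords></OAI-PMH>"""

def harvest_titles(xml_text):
    """Pull one field from each harvested record, as a harvester would
    before feeding records into a central search index."""
    return [t.text for t in ET.fromstring(xml_text).iter("title")]
```

A real harvester would also handle the protocol's resumptionToken for paging through large result sets.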
A Generic Metadata Editor Supporting System Using Drupal CMS
NASA Astrophysics Data System (ADS)
Pan, J.; Banks, N. G.; Leggott, M.
2011-12-01
Metadata handling is a key factor in preserving and reusing scientific data. In recent years, standardized structural metadata has become widely used in Geoscience communities. However, many different standards exist in the Geosciences, such as the current version of the Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (FGDC CSDGM), the Ecological Metadata Language (EML), the Geography Markup Language (GML), and the emerging ISO 19115 and related standards. In addition, there are many different subsets within the Geoscience subdomain, such as the Biological Profile of the FGDC CSDGM, or for geopolitical regions, such as the European Profile or the North American Profile of the ISO standards. It is therefore desirable to have a software foundation that supports metadata creation and editing for multiple standards and profiles, without re-inventing the wheel. We have developed a generic, flexible software system to do just that: to facilitate support for multiple metadata standards and profiles. The software consists of a set of modules for the Drupal Content Management System (CMS), with minimal inter-dependencies on other Drupal modules. There are two steps in using the system's metadata functions. First, an administrator can use the system to design a user form, based on an XML schema and its instances. The form definition is named and stored in the Drupal database as an XML blob. Second, users in an editor role can then use the persisted XML definition to render an actual metadata entry form, for creating or editing a metadata record. Behind the scenes, the form definition XML is transformed into a PHP array, which is then rendered via the Drupal Form API. When the form is submitted, the posted values are used to modify a metadata record. Drupal hooks can be used to perform custom processing on a metadata record before and after submission.
It is trivial to store the metadata record as an actual XML file or in a storage/archive system. We are working on adding many features to help editor users, such as auto-completion, pre-population of forms, partial saving, and automatic schema validation. In this presentation we will demonstrate a few sample editors, including an FGDC editor and a bare-bones editor for ISO 19115/19139. We will also demonstrate the use of templates during the definition phase, with support for export and import functions. Form pre-population and input validation will also be covered. These modules are available as open-source software from the Islandora software foundation, as a component of a larger Drupal-based data archive system. They can be easily installed as a stand-alone system, or plugged into other existing metadata platforms.
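The XML-to-array transformation at the heart of the rendering step can be sketched in Python, as a minimal analogue of the XML-to-PHP-array conversion the module performs before calling the Form API (the form-definition element and attribute names below are invented for illustration, not the module's actual format):

```python
import xml.etree.ElementTree as ET

# Hypothetical form-definition XML, standing in for the blob the module
# persists in the Drupal database (element and attribute names invented).
FORM_XML = """
<form name="fgdc_editor">
  <field name="title" type="textfield" required="true"/>
  <field name="abstract" type="textarea" required="false"/>
</form>
"""

def form_definition_to_array(xml_text):
    """Transform a stored form-definition XML document into a nested dict,
    mirroring the XML-to-PHP-array step before Form API rendering."""
    root = ET.fromstring(xml_text)
    fields = {}
    for field in root.findall("field"):
        fields[field.get("name")] = {
            "#type": field.get("type"),
            "#required": field.get("required") == "true",
        }
    return {"name": root.get("name"), "fields": fields}
```

An editor-role request would then render each entry of `fields` as one form widget; keeping the definition as data rather than code is what lets one module serve many standards and profiles.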
NASA Astrophysics Data System (ADS)
Moore, J.; Serreze, M. C.; Middleton, D.; Ramamurthy, M. K.; Yarmey, L.
2013-12-01
The NSF funds the Advanced Cooperative Arctic Data and Information System (ACADIS, http://www.aoncadis.org/). It serves the growing and increasingly diverse data management needs of NSF's arctic research community. The ACADIS investigator team combines experienced data managers, curators and software engineers from NSIDC, UCAR and NCAR. ACADIS fosters scientific synthesis and discovery by providing a secure long-term data archive to NSF investigators. The system provides discovery and access to Arctic-related data from this and other archives. This paper updates the technical components of ACADIS, the implementation of best practices, the value of ACADIS to the community and the major challenges facing this archive for the future in handling the diverse data coming from NSF Arctic investigators. ACADIS provides sustainable data management, data stewardship services and leadership for the NSF Arctic research community through open data sharing, adherence to best practices and standards, capitalizing on appropriate evolving technologies, and community support and engagement. ACADIS leverages other pertinent projects, capitalizing on appropriate emerging technologies and participating in emerging cyberinfrastructure initiatives. The key elements of ACADIS user services to the NSF Arctic community include: data and metadata upload; support for datasets with special requirements; metadata and documentation generation; interoperability and initiatives with other archives; and science support to investigators and the community. Providing a self-service data publishing platform requiring minimal curation oversight while maintaining rich metadata for discovery, access and preservation is challenging. Implementing metadata standards is a first step towards consistent content. The ACADIS Gateway and ADE offer users choices for data discovery and access with the clear objective of increasing discovery and use of all Arctic data, especially for analysis activities.
Metadata is at the core of ACADIS activities, from capturing metadata at the point of data submission to ensuring interoperability, providing data citations, and supporting data discovery. ACADIS metadata efforts include: 1) evolution of the ACADIS metadata profile to increase flexibility in search; 2) documentation guidelines; and 3) metadata standardization efforts. A major activity is now underway to ensure consistency in the metadata profile across all archived datasets. ACADIS is embarking on a critical activity to create Digital Object Identifiers (DOIs) for all its holdings. The data services offered by ACADIS focus on meeting the needs of the data providers, providing dynamic search capabilities to peruse the ACADIS and related cryospheric data repositories, efficient data download and some special services including dataset reformatting and visualization. The service is built around the following key technical elements: the ACADIS Gateway, housed at NCAR, has been developed to support NSF Arctic data coming from AON and now broadly across PLR/ARC and related archives; the Arctic Data Explorer (ADE), developed at NSIDC, is an integral service of ACADIS bringing the rich archive from NSIDC together with catalogs from ACADIS and international partners in Arctic research; and Rosetta and the Digital Object Identifier (DOI) generation scheme are tools available to the community to help publish and utilize datasets in integration, synthesis and publication work.
Data Bookkeeping Service 3 - Providing Event Metadata in CMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giffels, Manuel; Guo, Y.; Riley, Daniel
The Data Bookkeeping Service 3 provides a catalog of event metadata for Monte Carlo and recorded data of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN, Geneva. It comprises all necessary information for tracking datasets, their processing history and associations between runs, files and datasets, on a large scale of about 200,000 datasets and more than 40 million files, which adds up to around 700 GB of metadata. The DBS is an essential part of the CMS Data Management and Workload Management (DMWM) systems [1]; all kinds of data processing, such as Monte Carlo production, processing of recorded event data, and physics analysis done by users, rely heavily on the information stored in DBS.
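The run/file/dataset associations DBS catalogs can be illustrated with a toy model (a sketch only: the class and field names are invented, and the real service stores these relations in a relational database at far larger scale):

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class DBSFile:
    """One stored file, identified by its logical file name (LFN)."""
    lfn: str
    runs: Set[int] = field(default_factory=set)  # runs whose events it holds

@dataclass
class DBSDataset:
    """A dataset, its processing-history link, and its member files."""
    name: str
    parent: str = ""   # the input dataset this one was produced from
    files: List[DBSFile] = field(default_factory=list)

    def runs(self):
        """All runs contributing to this dataset, derived via its files."""
        out = set()
        for f in self.files:
            out |= f.runs
        return out
```

The dataset-to-parent link is what lets workload management trace a physics analysis sample back through reconstruction to the raw data it came from.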
Geospatial resources for supporting data standards, guidance and best practice in health informatics
2011-01-01
Background The 1980s marked the occasion when Geographical Information System (GIS) technology was broadly introduced into the geo-spatial community through the establishment of a strong GIS industry. This technology quickly disseminated across many countries, and has now become established as an important research, planning and commercial tool for a wider community that includes organisations in the public and private health sectors. The broad acceptance of GIS technology and the nature of its functionality have meant that numerous datasets have been created over the past three decades. Most of these datasets have been created independently, and without any structured documentation systems in place. However, search and retrieval systems can only work if there is a mechanism for a dataset's existence to be discovered, and this is where proper metadata creation and management can greatly help. This situation must be addressed through support mechanisms such as Web-based portal technologies, metadata editor tools, automation, metadata standards and guidelines, and collaborative efforts with relevant individuals and organisations. Engagement with data developers or administrators should also include a strategy of identifying the benefits associated with metadata creation and publication. Findings The establishment of numerous Spatial Data Infrastructures (SDIs), and other Internet resources, is a testament to the recognition of the importance of supporting good data management and sharing practices across the geographic information community. These resources extend to health informatics in support of research, public services and teaching and learning. This paper identifies many of these resources available to the UK academic health informatics community. It also reveals the reluctance of many spatial data creators across the wider UK academic community to use these resources to create and publish metadata, or deposit their data in repositories for sharing. The Go-Geo!
service is introduced as an SDI developed to provide UK academia with the necessary resources to address the concerns surrounding metadata creation and data sharing. The Go-Geo! portal, Geodoc metadata editor tool, ShareGeo spatial data repository, and a range of other support resources, are described in detail. Conclusions This paper describes a variety of resources available for the health research and public health sector to use for managing and sharing their data. The Go-Geo! service is one resource which offers an SDI for the eclectic range of disciplines using GIS in UK academia, including health informatics. The benefits of data management and sharing are immense, and in these times of cost constraints, these resources can be seen as ways to find cost savings which can be reinvested in more research. PMID:21269487
NASA Astrophysics Data System (ADS)
Troyan, D.
2016-12-01
The Atmospheric Radiation Measurement (ARM) program has been collecting data from instruments in diverse climate regions for nearly twenty-five years. These data are made available to all interested parties at no cost via specially designed tools found on the ARM website (www.arm.gov). Metadata is created and applied to the various datastreams to facilitate information retrieval using the ARM website, the ARM Data Discovery Tool, and data quality reporting tools. Over the last year, the Metadata Manager, a relatively new position within the ARM program, created two documents that summarize the state of ARM metadata processes: ARM Metadata Workflow, and ARM Metadata Standards. These documents serve as guides to the creation and management of ARM metadata. With many of ARM's data functions spread around the Department of Energy national laboratory complex, and with many of the original architects of the metadata structure no longer working for ARM, these documents have become increasingly important for resolving issues ranging from data flow bottlenecks and inaccurate metadata to improving data discovery and organizing web pages. This presentation will provide some examples from the workflow and standards documents. The examples will illustrate the complexity of the ARM metadata processes and the efficiency with which the metadata team works towards achieving the goal of providing access to data collected under the auspices of the ARM program.
Software for minimalistic data management in large camera trap studies
Krishnappa, Yathin S.; Turner, Wendy C.
2014-01-01
The use of camera traps is now widespread and their importance in wildlife studies well understood. Camera trap studies can produce millions of photographs and there is a need for software to help manage photographs efficiently. In this paper, we describe a software system that was built to successfully manage a large behavioral camera trap study that produced more than a million photographs. We describe the software architecture and the design decisions that shaped the evolution of the program over the study's three-year period. The software system has the ability to automatically extract metadata from images, and add customized metadata to the images in a standardized format. The software system can be installed as a standalone application on popular operating systems. It is minimalistic, scalable and extendable so that it can be used by small teams or individual researchers for a broad variety of camera trap studies. PMID:25110471
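The extract-then-standardize step the abstract describes can be sketched as follows (a hedged illustration: the tag names, the target schema, and the `site_id` field are invented here, and a real deployment would read the raw tags from each image's EXIF block rather than from a dict):

```python
# Raw tags as they might come out of an image's EXIF block (values invented).
RAW_TAGS = {"DateTimeOriginal": "2013:06:01 14:22:05", "Make": "Reconyx"}

# Mapping from raw EXIF tag names to a standardized, study-wide schema.
STANDARD_MAP = {
    "DateTimeOriginal": "capture_time",
    "Make": "camera_make",
}

def standardize(raw_tags, site_id):
    """Map extracted raw tags to standardized field names, then attach
    customized, study-specific metadata (here, the camera-trap site)."""
    record = {STANDARD_MAP[k]: v for k, v in raw_tags.items() if k in STANDARD_MAP}
    record["site_id"] = site_id
    return record
```

Keeping the mapping as data makes the pipeline extendable: a new study adds entries to `STANDARD_MAP` rather than changing the extraction code.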
Mercury: Reusable software application for Metadata Management, Data Discovery and Access
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce E.
2009-12-01
Mercury is a federated metadata harvesting, data discovery and access tool based on both open source packages and custom-developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury is itself a reusable toolset for metadata, with current use in 12 different projects. Mercury also supports the reuse of metadata by enabling searching across a range of metadata specifications and standards including XML, Z39.50, FGDC, Dublin Core, Darwin Core, EML, and ISO 19115. Mercury provides a single portal to information contained in distributed data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial and temporal searches across these metadata sources. One of the major goals of the recent redesign of Mercury was to improve software reusability across the projects which currently fund the continuing development of Mercury. These projects span a range of land, atmosphere, and ocean ecological communities and have a number of common needs for metadata searches, but they also have a number of needs specific to one or a few projects. To balance these common and project-specific needs, Mercury's architecture includes three major reusable components: a harvester engine, an indexing system and a user interface component. The harvester engine is responsible for harvesting metadata records from various distributed servers around the USA and around the world. The harvester software was packaged in such a way that all the Mercury projects use the same harvester scripts, but each project is driven by a set of configuration files.
The harvested files are then passed to the indexing system, where each of the fields in these structured metadata records is indexed properly, so that the query engine can perform simple, keyword, spatial and temporal searches across these metadata sources. The search user interface software has two API categories: a common core API which is used by all the Mercury user interfaces for querying the index, and a customized API for project-specific user interfaces. For our work in producing a reusable, portable, robust, feature-rich application, Mercury received a 2008 NASA Earth Science Data Systems Software Reuse Working Group Peer-Recognition Software Reuse Award. The new Mercury system is based on a Service Oriented Architecture and effectively reuses components for various services such as the Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. The software also provides various search services including RSS, Geo-RSS, OpenSearch, Web Services and Portlets, an integrated shopping cart to order datasets from various data centers (ORNL DAAC, NSIDC), and integrated visualization tools. Other features include filtering and dynamic sorting of search results, book-markable search results, and the ability to save, retrieve, and modify search criteria.
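The shared-scripts-plus-per-project-configuration pattern behind the harvester can be sketched like this (the configuration keys and URLs below are hypothetical, not Mercury's actual config format):

```python
import json

# Hypothetical per-project configuration file; the shared harvester
# scripts stay identical across projects, and only this file varies.
CONFIG_JSON = """
{
  "project": "daac-example",
  "sources": ["http://serverA.example.org/waf/", "http://serverB.example.org/waf/"],
  "formats": ["FGDC", "EML"]
}
"""

def plan_harvest(config_text):
    """Expand one project's configuration into concrete
    (source, format) harvest tasks for the shared harvester engine."""
    cfg = json.loads(config_text)
    return [(src, fmt) for src in cfg["sources"] for fmt in cfg["formats"]]
```

Adding a project then means writing a new config file, not new harvester code, which is the reuse property the redesign was after.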
Metadata Sets for e-Government Resources: The Extended e-Government Metadata Schema (eGMS+)
NASA Astrophysics Data System (ADS)
Charalabidis, Yannis; Lampathaki, Fenareti; Askounis, Dimitris
In the dawn of the Semantic Web era, metadata appear as a key enabler that assists management of the e-Government resources related to the provision of personalized, efficient and proactive services oriented towards real citizens' needs. Different authorities typically use different terms to describe their resources and publish them in various e-Government registries; such registries may enhance the access to and delivery of governmental knowledge, but they also need to communicate seamlessly at a national and pan-European level, and from this the need for a unified e-Government metadata standard emerges. This paper presents the creation of an ontology-based extended metadata set for e-Government resources that embraces services, documents, XML Schemas, code lists, public bodies and information systems. Such a metadata set formalizes the exchange of information between portals and registries and assists service transformation and simplification efforts, while it can be further taken into consideration when applying Web 2.0 techniques in e-Government.
Metadata and Service at the GFZ ISDC Portal
NASA Astrophysics Data System (ADS)
Ritschel, B.
2008-05-01
The online service portal of the GFZ Potsdam Information System and Data Center (ISDC) is an access point for all manner of geoscientific geodata, its corresponding metadata, scientific documentation and software tools. At present almost 2000 national and international users and user groups have the opportunity to request Earth science data from a portfolio of 275 different product types and more than 20 million single data files with a total volume of approximately 12 TB. The majority of the data and information the portal currently offers to the public are global geomonitoring products such as satellite orbit and Earth gravity field data, as well as geomagnetic and atmospheric data for the exploration of Earth's changing system. These products are provided via state-of-the-art retrieval techniques. The data product catalog system behind these techniques is based on the extensive usage of standardized metadata, which describe the different geoscientific product types and data products in a uniform way. Whereas all ISDC product types are specified by parent XML DIF metadata files based on NASA's Directory Interchange Format (DIF) Version 9.0, the individual data files are described by extended DIF metadata documents. Depending on when the scientific project began, some data files are described by extended DIF Version 6 metadata documents, and the others are specified by child XML DIF metadata documents. Both the product-type-dependent parent DIF metadata documents and the data-file-dependent child DIF metadata documents are derived from a base-DIF.xsd XML schema file. The ISDC metadata philosophy defines a geoscientific product as a package consisting of usually one, sometimes more than one, data file plus one extended DIF metadata file.
Because NASA's DIF metadata standard was developed to specify only collections of data, the extension of the DIF standard consists of new, specific attributes that are necessary for the explicit identification of single data files and the set-up of a comprehensive Earth science data catalog. The huge ISDC data catalog is realized by product-type-dependent tables filled with data-file-related metadata, which have relations to corresponding metadata tables. The product-type-describing parent DIF XML metadata documents are stored and managed in ORACLE's XML storage structures. In order to improve the interoperability of the ISDC service portal, the existing proprietary catalog system will be extended by an ISO 19115-based web catalog service. In addition to this development, work is underway on an ISDC-related semantic network of different kinds of metadata resources, such as standardized and non-standardized metadata documents and literature, as well as Web 2.0 user-generated information derived from tagging activities and social navigation data.
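The parent/child derivation described above, a product-type record extended with file-specific attributes, can be sketched as follows (a hedged illustration: the element names are invented and do not reproduce the actual extended-DIF schema):

```python
import xml.etree.ElementTree as ET

# A toy "parent" product-type record (content invented for illustration).
PARENT_DIF = "<DIF><Entry_ID>CHAMP-ORBITS</Entry_ID><Summary>Orbit product</Summary></DIF>"

def make_child_record(parent_xml, file_name, start_time):
    """Copy the parent product description and append the file-specific
    attributes that identify one single data file in the catalog."""
    child = ET.fromstring(parent_xml)
    f = ET.SubElement(child, "Data_File")
    ET.SubElement(f, "File_Name").text = file_name
    ET.SubElement(f, "Start_Time").text = start_time
    return ET.tostring(child, encoding="unicode")
```

The point of the extension is visible here: the collection-level DIF alone cannot name a single file, so the child record carries both the inherited product description and the per-file identifiers.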
NASA Astrophysics Data System (ADS)
Keck, N. N.; Macduff, M.; Martin, T.
2017-12-01
The Atmospheric Radiation Measurement (ARM) program's Data Management Facility (DMF) plays a critical support role in processing and curating data generated by the Department of Energy's ARM Program. Data are collected in near real time from hundreds of observational instruments spread out all over the globe. Data are then ingested hourly to provide time series data in NetCDF (network Common Data Format), including standardized metadata. Based on automated processes and a variety of user reviews, the data may need to be reprocessed. Final data sets are then stored and accessed by users through the ARM Archive. Over the course of 20 years, a suite of data visualization tools has been developed to facilitate the operational processes to manage and maintain the more than 18,000 real-time events that move 1.3 TB of data each day through the various stages of the DMF's data system. This poster will present the resources and methodology used to capture metadata and the tools that assist in routine data management and discoverability.
The NCAR Digital Asset Services Hub (DASH): Implementing Unified Data Discovery and Access
NASA Astrophysics Data System (ADS)
Stott, D.; Worley, S. J.; Hou, C. Y.; Nienhouse, E.
2017-12-01
The National Center for Atmospheric Research (NCAR) Directorate created the Data Stewardship Engineering Team (DSET) to plan and implement an integrated single entry point for uniform digital asset discovery and access across the organization in order to improve the efficiency of access, reduce costs, and establish the foundation for interoperability with other federated systems. This effort supports new policies included in federal funding mandates, NSF data management requirements, and journal citation recommendations. An inventory during the early planning stage identified diverse asset types across the organization that included publications, datasets, metadata, models, images, and software tools and code. The NCAR Digital Asset Services Hub (DASH) is being developed and phased in this year to improve the quality of users' experiences in finding and using these assets. DASH serves to provide engagement, training, search, and support through the following four nodes (see figure). DASH Metadata: DASH provides resources for creating and cataloging metadata to the NCAR Dialect, a subset of ISO 19115. NMDEdit, an editor based on a European open source application, has been configured for manual entry of NCAR metadata. CKAN, an open source data portal platform, harvests these XML records (along with records output directly from databases) from a Web Accessible Folder (WAF) on GitHub for validation. DASH Search: The NCAR Dialect metadata drives cross-organization search and discovery through CKAN, which provides the display interface of search results. DASH search will establish interoperability by facilitating metadata sharing with other federated systems. DASH Consulting: The DASH Data Curation & Stewardship Coordinator assists with Data Management (DM) Plan preparation and advises on Digital Object Identifiers. The coordinator arranges training sessions on the DASH metadata tools and DM planning, and provides one-on-one assistance as requested.
DASH Repository: A repository is under development for NCAR datasets currently not in existing lab-managed archives. The DASH repository will be under NCAR governance and meet Trustworthy Repositories Audit & Certification (TRAC) requirements. This poster will highlight the processes, lessons learned, and current status of the DASH effort at NCAR.
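The validation gate between the Web Accessible Folder and the CKAN harvest can be sketched minimally (well-formedness checking only; the real pipeline would additionally validate records against the NCAR Dialect ISO 19115 schema, and the record contents below are invented):

```python
import xml.etree.ElementTree as ET

def partition_records(records):
    """Split harvested WAF records into well-formed and rejected sets,
    so that only parseable XML reaches the catalog ingest step."""
    ok, rejected = [], []
    for name, text in records.items():
        try:
            ET.fromstring(text)
            ok.append(name)
        except ET.ParseError:
            rejected.append(name)
    return sorted(ok), sorted(rejected)
```

Rejecting malformed records at harvest time keeps bad metadata out of the search index and gives curators a concrete worklist to fix.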
Information integration for a sky survey by data warehousing
NASA Astrophysics Data System (ADS)
Luo, A.; Zhang, Y.; Zhao, Y.
The virtualization service of the data system for the sky survey LAMOST is very important for astronomers. The service needs to integrate information from data collections, catalogs and references, and to support simple federation of a set of distributed files and associated metadata. Data warehousing has been in existence for several years and has demonstrated superiority over traditional relational database management systems by providing novel indexing schemes that support efficient on-line analytical processing (OLAP) of large databases. Now relational database systems such as Oracle support the warehouse capability, including extensions to the SQL language to support OLAP operations, and a number of metadata management tools have been created. The information integration of LAMOST by applying data warehousing aims to effectively provide data and knowledge on-line.
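The OLAP-style aggregation a warehouse layer adds over a survey catalog can be illustrated with a toy query (the table and column names are invented, and production systems would use dedicated warehouse features such as CUBE/ROLLUP and materialized views rather than a plain GROUP BY):

```python
import sqlite3

# Toy fact table standing in for a survey observation catalog.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE observations (target TEXT, night TEXT, n_spectra INTEGER)")
con.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [("M31", "2005-10-01", 120), ("M31", "2005-10-02", 80), ("M33", "2005-10-01", 60)],
)

def spectra_per_target():
    """Roll per-night observation rows up to one aggregate row per target,
    the kind of summary an OLAP layer precomputes for analysts."""
    rows = con.execute(
        "SELECT target, SUM(n_spectra) FROM observations GROUP BY target ORDER BY target"
    )
    return dict(rows.fetchall())
```
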
A Grid Metadata Service for Earth and Environmental Sciences
NASA Astrophysics Data System (ADS)
Fiore, Sandro; Negro, Alessandro; Aloisio, Giovanni
2010-05-01
Critical challenges for climate modeling researchers are strongly connected with increasingly complex simulation models and the huge quantities of produced datasets. Future trends in climate modeling will only increase computational and storage requirements. For this reason the ability to transparently access both computational and data resources for large-scale complex climate simulations must be considered a key requirement for Earth Science and Environmental distributed systems. From the data management perspective, (i) the quantity of data will continuously increase; (ii) data will become more and more distributed and widespread; (iii) data sharing/federation will represent a key challenging issue among different sites distributed worldwide; and (iv) the potential community of users (large and heterogeneous) will be interested in discovering experimental results, searching metadata, browsing collections of files, comparing different results, displaying output, etc. A key element to carry out data search and discovery, and to manage and access huge and distributed amounts of data, is the metadata handling framework. What we propose for the management of distributed datasets is the GRelC service (a data grid solution focusing on metadata management). Unlike classical approaches, the proposed data-grid solution is able to address scalability, transparency, security, efficiency and interoperability. The GRelC service we propose is able to provide access to metadata stored in different and widespread data sources (relational databases running on top of MySQL, Oracle, DB2, etc., leveraging SQL as query language, as well as XML databases - XIndice, eXist, and libxml2-based documents, adopting either XPath or XQuery), providing a strong data virtualization layer in a grid environment.
Such a technological solution for distributed metadata management (i) leverages well-known, widely adopted standards (W3C, OASIS, etc.); (ii) supports role-based management (based on VOMS), which increases flexibility and scalability; (iii) provides full support for the Grid Security Infrastructure, which means authorization, mutual authentication, data integrity, data confidentiality and delegation; (iv) is compatible with existing grid middleware such as gLite and Globus; and finally (v) is currently adopted at the Euro-Mediterranean Centre for Climate Change (CMCC - Italy) to manage the entire CMCC data production activity, as well as in the international Climate-G testbed.
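The data virtualization idea, one query fanned out across heterogeneous relational and XML back ends, can be sketched as follows (a toy illustration with invented schemas and record names, not the GRelC implementation):

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_relational(term):
    """A stand-in for an SQL back end (MySQL, Oracle, DB2, ...)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE metadata (name TEXT)")
    con.executemany("INSERT INTO metadata VALUES (?)", [("cmip-run-1",), ("era-interim",)])
    rows = con.execute("SELECT name FROM metadata WHERE name LIKE ?", (f"%{term}%",))
    return [r[0] for r in rows]

def query_xml(term):
    """A stand-in for an XML back end queried via XPath-style lookups."""
    doc = ET.fromstring("<records><record><name>cmip-run-2</name></record></records>")
    return [n.text for n in doc.findall(".//name") if term in n.text]

def federated_search(term):
    """One user query, answered across both back ends; the caller never
    sees which source each hit came from - the virtualization layer."""
    return sorted(query_relational(term) + query_xml(term))
```
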
CMO: Cruise Metadata Organizer for JAMSTEC Research Cruises
NASA Astrophysics Data System (ADS)
Fukuda, K.; Saito, H.; Hanafusa, Y.; Vanroosebeke, A.; Kitayama, T.
2011-12-01
JAMSTEC's Data Research Center for Marine-Earth Sciences manages and distributes a wide variety of observational data and samples obtained from JAMSTEC research vessels and deep sea submersibles. Generally, metadata are essential to identify how data and samples were obtained. In JAMSTEC, cruise metadata include cruise information such as cruise ID, name of vessel and research theme, and diving information such as dive number, name of submersible and position of diving point. They are submitted by chief scientists of research cruises in the Microsoft Excel® spreadsheet format, and registered into a data management database to confirm receipt of observational data files, cruise summaries, and cruise reports. The cruise metadata are also published via "JAMSTEC Data Site for Research Cruises" within two months after the end of a cruise. Furthermore, these metadata are distributed with observational data, images and samples via several data and sample distribution websites after a publication moratorium period. However, there are two operational issues in the metadata publishing process. One is duplicated effort and asynchronous metadata across multiple distribution websites, due to manual metadata entry into individual websites by administrators. The other is that data types and representations of metadata differ across websites. To solve those problems, we have developed a cruise metadata organizer (CMO) which allows cruise metadata to be connected from the data management database to several distribution websites. CMO is comprised of three components: an Extensible Markup Language (XML) database, Enterprise Application Integration (EAI) software, and a web-based interface. The XML database is used because of its flexibility regarding any change of metadata. Daily differential uptake of metadata from the data management database to the XML database is automatically processed via the EAI software.
Some metadata are entered into the XML database using the web-based interface by a metadata editor in CMO as needed. Then daily differential uptake of metadata from the XML database to databases in several distribution websites is automatically processed using a convertor defined by the EAI software. Currently, CMO is available for three distribution websites: "Deep Sea Floor Rock Sample Database GANSEKI", "Marine Biological Sample Database", and "JAMSTEC E-library of Deep-sea Images". CMO is planned to provide "JAMSTEC Data Site for Research Cruises" with metadata in the future.
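The daily differential uptake can be sketched as a comparison between source and target stores (a simplification: the record keys and values are invented, and the real EAI software compares database contents rather than in-memory dicts):

```python
def differential_uptake(source, target):
    """Return only the records that are new or changed in source relative
    to target - the daily delta the EAI software pushes downstream."""
    return {key: rec for key, rec in source.items() if target.get(key) != rec}

def sync(source, target):
    """Apply the delta to the downstream store and report what changed."""
    delta = differential_uptake(source, target)
    target.update(delta)
    return delta
```

Because only the delta moves each day, the distribution websites stay synchronized without administrators re-entering metadata by hand, which is exactly the duplication problem CMO was built to remove.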
Improving Scientific Metadata Interoperability And Data Discoverability using OAI-PMH
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Palanisamy, Giri; Green, James M.; Wilson, Bruce E.
2010-12-01
While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth Science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but can be slow, and their comprehensiveness can be limited by downtime in any search partner. An alternative approach to improve comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. However, there are a number of different protocols for harvesting metadata, with some challenges for ensuring that updates are propagated and for collaborations with repositories using differing metadata standards. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a standard that is seeing increased use as a means for exchanging structured metadata. OAI-PMH implementations must support Dublin Core as a metadata standard, with other metadata formats as optional. We have developed tools which enable our structured search tool (Mercury; http://mercury.ornl.gov) to consume metadata from OAI-PMH services in any of the metadata formats we support (Dublin Core, Darwin Core, FGDC CSDGM, GCMD DIF, EML, and ISO 19115/19137). We are also making ORNL DAAC metadata available through OAI-PMH for other metadata tools to utilize, such as the NASA Global Change Master Directory (GCMD).
This paper describes Mercury capabilities with multiple metadata formats, in general, and, more specifically, the results of our OAI-PMH implementations and the lessons learned. References: [1] R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010. [2] R. Devarakonda, G. Palanisamy, J.M. Green, B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics DOI: 10.1007/s12145-010-0073-0, (2010). [3] Devarakonda, R.; Palanisamy, G.; Green, J.; Wilson, B. E. "Mercury: An Example of Effective Software Reuse for Metadata Management Data Discovery and Access", Eos Trans. AGU, 89(53), Fall Meet. Suppl., IN11A-1019 (2008).
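Consuming an OAI-PMH response reduces to namespace-aware XML parsing, which can be sketched as follows (the ListRecords response below is canned and its record content invented; a real harvester would fetch pages over HTTP and follow the resumptionToken):

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A canned ListRecords response (record content invented for illustration).
RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example land-cover dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

def harvested_titles(xml_text):
    """Pull the Dublin Core titles out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//dc:title", NS)]
```

Dublin Core support being mandatory in OAI-PMH is what makes a lowest-common-denominator harvester like this possible across otherwise heterogeneous repositories.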
THE NEW ONLINE METADATA EDITOR FOR GENERATING STRUCTURED METADATA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devarakonda, Ranjeet; Shrestha, Biva; Palanisamy, Giri
Nobody is better suited to describe data than the scientist who created it. This description of data is called metadata. In general terms, metadata represents the who, what, when, where, why and how of a dataset [1]. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about their research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [2][4]. OME is part of ORNL's Mercury software fleet [2][3]. It was jointly developed to support projects funded by the United States Geological Survey (USGS), U.S. Department of Energy (DOE), National Aeronautics and Space Administration (NASA) and National Oceanic and Atmospheric Administration (NOAA). OME's architecture provides a customizable interface to support project-specific requirements. Using this new architecture, the ORNL team developed OME instances for USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L), DOE's Next Generation Ecosystem Experiments (NGEE) and Atmospheric Radiation Measurement (ARM) Program, and the international Surface Ocean Carbon Dioxide ATlas (SOCAT). Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. From the information on the form, the Metadata Editor can create an XML file on the server where the editor is installed or on the user's personal computer. Researchers can also use the ORNL Metadata Editor to modify existing XML metadata files.
As an example, an NGEE Arctic scientist uses OME to register datasets with the NGEE data archive, which allows the archive to publish these datasets via a data search portal (http://ngee.ornl.gov/data). The highly descriptive metadata created using OME allow the archive to offer advanced data search options using keyword, geospatial, temporal, and ontology filters. Similarly, the ARM OME allows scientists or principal investigators (PIs) to submit their data products to the ARM data archive. How would OME help big data centers like the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)? The ORNL DAAC is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers managed by the Earth Science Data and Information System (ESDIS) Project. The ORNL DAAC archives data produced by NASA's Terrestrial Ecology Program. The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological, geological, and chemical components of the Earth's environment. Typically, the data produced, archived, and analyzed are at the scale of multiple petabytes, which makes discoverability very challenging. Without proper metadata associated with the data, it is difficult to find the data you are looking for, and equally difficult to use and understand them. OME will allow data centers like NGEE and the ORNL DAAC to produce meaningful, high-quality, standards-based, descriptive information about their data products, in turn helping with data discoverability and interoperability. Useful links: USGS OME: http://mercury.ornl.gov/OME/ NGEE OME: http://ngee-arctic.ornl.gov/ngeemetadata/ ARM OME: http://archive2.ornl.gov/armome/ Contact: Ranjeet Devarakonda (devarakondar@ornl.gov) References: [1] Federal Geographic Data Committee. Content standard for digital geospatial metadata.
Federal Geographic Data Committee, 1998. [2] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94. [3] Wilson, B. E., Palanisamy, G., Devarakonda, R., Rhyne, B. T., Lindsley, C., & Green, J. (2010). Mercury Toolset for Spatiotemporal Metadata. [4] Pouchard, L. C., Branstetter, M. L., Cook, R. B., Devarakonda, R., Green, J., Palanisamy, G., ... & Noy, N. F. (2013). A Linked Science investigation: enhancing climate change data discovery with semantic technologies. Earth Science Informatics, 6(3), 175-185.
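The form-to-XML step that OME performs can be sketched roughly as follows; this is a minimal illustration, and the field names are hypothetical, not OME's actual schema:

```python
# Sketch: turning form entries into an XML metadata record, in the spirit of
# a tool like OME. Field names below are illustrative only.
import xml.etree.ElementTree as ET

def build_metadata_record(fields):
    """Turn a dict of form entries into an XML metadata document string."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

record = build_metadata_record({
    "title": "Soil temperature, Barrow site",
    "investigator": "J. Doe",
    "temporal_coverage": "2014-06-01/2014-09-30",
    "bounding_box": "-157.4,71.2,-156.4,71.4",
})
print(record)
```

The resulting XML file can then be saved on the server or downloaded to the researcher's machine, as described above.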
Informatics in radiology: use of CouchDB for document-based storage of DICOM objects.
Rascovsky, Simón J; Delgado, Jorge A; Sanz, Alexander; Calvo, Víctor D; Castrillón, Gabriel
2012-01-01
Picture archiving and communication systems traditionally have depended on schema-based Structured Query Language (SQL) databases for imaging data management. To optimize database size and performance, many such systems store a reduced set of Digital Imaging and Communications in Medicine (DICOM) metadata, discarding informational content that might be needed in the future. As an alternative to traditional database systems, document-based key-value stores recently have gained popularity. These systems store documents containing key-value pairs that facilitate data searches without predefined schemas. Document-based key-value stores are especially suited to archive DICOM objects because DICOM metadata are highly heterogeneous collections of tag-value pairs conveying specific information about imaging modalities, acquisition protocols, and vendor-supported postprocessing options. The authors used an open-source document-based database management system (Apache CouchDB) to create and test two such databases; CouchDB was selected for its overall ease of use, capability for managing attachments, and reliance on HTTP and Representational State Transfer standards for accessing and retrieving data. A large database was created first in which the DICOM metadata from 5880 anonymized magnetic resonance imaging studies (1,949,753 images) were loaded by using a Ruby script. To provide the usual DICOM query functionality, several predefined "views" (standard queries) were created by using JavaScript. For performance comparison, the same queries were executed in both the CouchDB database and a SQL-based DICOM archive. The capabilities of CouchDB for attachment management and database replication were separately assessed in tests of a similar, smaller database. 
Results showed that CouchDB allowed efficient storage and interrogation of all DICOM objects; with the use of information retrieval algorithms such as map-reduce, all the DICOM metadata stored in the large database were searchable with only a minimal increase in retrieval time over that with the traditional database management system. Results also indicated possible uses for document-based databases in data mining applications such as dose monitoring, quality assurance, and protocol optimization. RSNA, 2012
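The view mechanism described above can be illustrated with a small sketch. In CouchDB the map functions live in a design document and are written in JavaScript; here an equivalent Python map function is run over a few sample documents (with hypothetical DICOM tag subsets) to show the emit-and-filter logic that makes schema-free queries over heterogeneous tag-value pairs possible:

```python
# Sample "documents": heterogeneous DICOM metadata as key-value pairs.
docs = [
    {"_id": "1", "Modality": "MR", "StudyDate": "20110102", "PatientID": "A"},
    {"_id": "2", "Modality": "MR", "StudyDate": "20110315", "PatientID": "B"},
    {"_id": "3", "Modality": "CT", "StudyDate": "20110220", "PatientID": "A"},
]

def map_by_modality(doc):
    # Emit (key, value) pairs, as a CouchDB map function would via emit().
    yield doc["Modality"], doc["_id"]

def query_view(docs, map_fn, key):
    """Run the map over all docs, then filter rows by key (a 'view' query)."""
    rows = [(k, v) for d in docs for k, v in map_fn(d)]
    return [v for k, v in rows if k == key]

print(query_view(docs, map_by_modality, "MR"))  # → ['1', '2']
```

A real deployment would store the JavaScript map function in a `_design` document and query it over CouchDB's HTTP interface; the indexing idea is the same.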
Sensor metadata blueprints and computer-aided editing for disciplined SensorML
NASA Astrophysics Data System (ADS)
Tagliolato, Paolo; Oggioni, Alessandro; Fugazza, Cristiano; Pepe, Monica; Carrara, Paola
2016-04-01
The need for continuous, accurate, and comprehensive environmental knowledge has led to an increase in sensor observation systems and networks. The Sensor Web Enablement (SWE) initiative has been promoted by the Open Geospatial Consortium (OGC) to foster interoperability among sensor systems. The provision of metadata according to the prescribed SensorML schema is a key component for achieving this; nevertheless, the availability of correct and exhaustive metadata cannot be taken for granted. On the one hand, it is awkward for users to provide sensor metadata because of the lack of user-oriented, dedicated tools. On the other, the specification of invariant information for a given sensor category or model (e.g., observed properties and units of measurement, manufacturer information, etc.) can be labor- and time-consuming. Moreover, the provision of these details is error-prone and subjective, i.e., it may differ greatly across distinct descriptions of the same system. We provide a user-friendly, template-driven metadata authoring tool composed of a backend web service and an HTML5/JavaScript client. This results in a form-based user interface that conceals the high complexity of the underlying format. The tool also allows for plugging in external data sources that provide authoritative definitions for the aforementioned invariant information. Leveraging these functionalities, we compiled a set of SensorML profiles, that is, sensor metadata blueprints allowing end users to focus only on the metadata items that are related to their specific deployment. The natural extension of this scenario is the involvement of end users and sensor manufacturers in the crowd-sourced evolution of this collection of prototypes. We describe the components and workflow of our framework for computer-aided management of sensor metadata.
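The blueprint idea above can be sketched simply: invariant, model-level information is stored once in a profile, and the end user supplies only the deployment-specific items. This is a rough illustration with made-up field names, not actual SensorML elements:

```python
# Hypothetical blueprint for one sensor model: invariant fields captured once.
BLUEPRINT_SBE37 = {
    "manufacturer": "Sea-Bird Scientific",
    "observed_property": "sea_water_temperature",
    "unit_of_measure": "Cel",
}

def complete_description(blueprint, deployment):
    """Merge the invariant profile with deployment-specific user input."""
    desc = dict(blueprint)        # start from the invariant fields
    desc.update(deployment)       # user fills in only what varies
    missing = {"station_id", "position"} - desc.keys()
    if missing:
        raise ValueError(f"deployment fields missing: {missing}")
    return desc

desc = complete_description(
    BLUEPRINT_SBE37,
    {"station_id": "LTER-42", "position": "45.6N 9.1E"},
)
```

Because the manufacturer and unit definitions come from the shared profile (ideally backed by an authoritative external source), identical sensors no longer get subjective, diverging descriptions.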
Uciteli, Alexandr; Herre, Heinrich
2015-01-01
The specification of metadata in clinical and epidemiological study projects involves significant expense. The validity and quality of the collected data depend heavily on the precise and semantically correct representation of their metadata. In the various research organizations that plan and coordinate studies, the required metadata are specified differently, depending on many conditions, e.g., on the study management software used. The latter does not always meet the needs of a particular research organization, e.g., with respect to the relevant metadata attributes and structuring possibilities. The objective of the research set forth in this paper is the development of a new approach to ontology-based representation and management of metadata. The basic features of this approach are demonstrated by the software tool OntoStudyEdit (OSE). The OSE is designed and developed according to the three-ontology method. This method for developing software is based on the interactions of three different kinds of ontologies: a task ontology, a domain ontology, and a top-level ontology. The OSE can be easily adapted to different requirements, and it supports an ontologically founded representation and efficient management of metadata. The metadata specifications can be imported from various sources; they can be edited with the OSE, and they can be exported to several formats, which are used, e.g., by different study management software. Advantages of this approach are the adaptability of the OSE by integrating suitable domain ontologies, the ontological specification of mappings between the import/export formats and the domain ontology, the specification of study metadata in a uniform manner and its reuse in different research projects, and intuitive data entry for non-expert users.
Report on the Global Data Assembly Center (GDAC) to the 12th GHRSST Science Team Meeting
NASA Technical Reports Server (NTRS)
Armstrong, Edward M.; Bingham, Andrew; Vazquez, Jorge; Thompson, Charles; Huang, Thomas; Finch, Chris
2011-01-01
In 2010/2011 the Global Data Assembly Center (GDAC) at NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC) continued its role as the primary clearinghouse and access node for operational Group for High Resolution Sea Surface Temperature (GHRSST) datastreams, as well as its collaborative role with the NOAA Long Term Stewardship and Reanalysis Facility (LTSRF) for archiving. Here we report on our data management activities and infrastructure improvements since the last science team meeting in June 2010. These include the implementation of all GHRSST datastreams in the new PO.DAAC Data Management and Archive System (DMAS) for more reliable and timely data access. GHRSST dataset metadata are now stored in a new database that has made the maintenance and quality improvement of metadata fields more straightforward. A content management system for a revised suite of PO.DAAC web pages allows dynamic access to a subset of these metadata fields for enhanced dataset description as well as discovery through a faceted search mechanism from the perspective of the user. From the discovery and metadata standpoint, the GDAC has also implemented the NASA version of the OpenSearch protocol for searching for GHRSST granules and developed a web service to generate ISO 19115-2 compliant metadata records. Furthermore, the GDAC has continued to implement a new suite of tools and services for GHRSST datastreams, including a Level 2 subsetter known as Dataminer, a revised POET Level 3/4 subsetter and visualization tool, and a Google Earth interface to selected daily global Level 2 and Level 4 data, and has experimented with a THREDDS catalog of GHRSST data collections. Finally, we summarize the expanding user and data statistics, and other metrics that we have collected over the last year, demonstrating the broad user community and applications that the GHRSST project continues to serve via the GDAC distribution mechanisms.
This report also serves by extension to summarize the activities of the GHRSST Data Assembly and Systems Technical Advisory Group (DAS-TAG).
TR32DB - Management of Research Data in a Collaborative, Interdisciplinary Research Project
NASA Astrophysics Data System (ADS)
Curdt, Constanze; Hoffmeister, Dirk; Waldhoff, Guido; Lang, Ulrich; Bareth, Georg
2015-04-01
The management of research data in a well-structured and documented manner is essential in the context of collaborative, interdisciplinary research environments (e.g. across various institutions). Consequently, the set-up and use of a research data management (RDM) system such as a data repository or project database is necessary. These systems should accompany and support scientists during the entire research life cycle (e.g. data collection, documentation, storage, archiving, sharing, publishing) and operate across disciplines in interdisciplinary research projects. The challenges and problems of RDM are well known. Consequently, the set-up of a user-friendly, well-documented, sustainable RDM system is essential, as are user support and further assistance. In the framework of the Transregio Collaborative Research Centre 32 'Patterns in Soil-Vegetation-Atmosphere Systems: Monitoring, Modelling, and Data Assimilation' (CRC/TR32), funded by the German Research Foundation (DFG), an RDM system was self-designed and implemented. The CRC/TR32 project database (TR32DB, www.tr32db.de) has been operating online since early 2008. The TR32DB handles all data created by the involved project participants from several institutions (e.g. the Universities of Cologne, Bonn, and Aachen, and the Research Centre Jülich) and research fields (e.g. soil and plant sciences, hydrology, geography, geophysics, meteorology, remote sensing). Very heterogeneous research data are considered, resulting from field measurement campaigns, meteorological monitoring, remote sensing, laboratory studies, and modelling approaches. Furthermore, outcomes such as publications, conference contributions, PhD reports, and corresponding images are covered. The TR32DB project database was set up in cooperation with the Regional Computing Centre of the University of Cologne (RRZK) and is hosted in its hardware environment.
The TR32DB system architecture is composed of three main components: (i) file-based data storage including backup, (ii) database-based storage for administrative data and metadata, and (iii) a web interface for user access. The TR32DB offers the common features of RDM systems. These include data storage, entry of corresponding metadata via a user-friendly input wizard, search and download of data depending on user permission, as well as secure internal exchange of data. In addition, a Digital Object Identifier (DOI) can be allocated for specific datasets, and several web mapping components are supported (e.g. Web-GIS and map search). The centrepiece of the TR32DB is the self-developed, CRC/TR32-specific metadata schema. This enables the documentation of all involved, heterogeneous data with accurate, interoperable metadata. The TR32DB Metadata Schema is set up in a multi-level approach and supports several metadata standards and schemes (e.g. Dublin Core, ISO 19115, INSPIRE, DataCite). Furthermore, metadata properties focused on the CRC/TR32 background (e.g. CRC/TR32-specific keywords) and the supported data types are included. Mandatory, optional, and automatic metadata properties are specified. Overall, the TR32DB is designed and implemented according to the needs of the CRC/TR32 (e.g. large amounts of heterogeneous data) and the demands of the DFG (e.g. cooperation with a computing centre). The application of a self-designed, project-specific, interoperable metadata schema enables the accurate documentation of all CRC/TR32 data. The implementation of the TR32DB in the hardware environment of the RRZK ensures access to the data after the end of the CRC/TR32 funding in 2018.
An image database management system for conducting CAD research
NASA Astrophysics Data System (ADS)
Gruszauskas, Nicholas; Drukker, Karen; Giger, Maryellen L.
2007-03-01
The development of image databases for CAD research is not a trivial task. The collection and management of images and their related metadata from multiple sources is a time-consuming but necessary process. By standardizing and centralizing the methods by which these data are maintained, one can generate subsets of a larger database that match the specific criteria needed for a particular research project in a quick and efficient manner. A research-oriented management system of this type is highly desirable in a multi-modality CAD research environment. An online, web-based database system for the storage and management of research-specific medical image metadata was designed for use with four modalities of breast imaging: screen-film mammography, full-field digital mammography, breast ultrasound, and breast MRI. The system was designed to consolidate data from multiple clinical sources and provide the user with the ability to anonymize the data. Input concerning the type of data to be stored as well as the desired searchable parameters was solicited from researchers in each modality. The backbone of the database was created using MySQL. A robust and easy-to-use interface for entering, removing, modifying, and searching information in the database was created using HTML and PHP. This standardized system can be accessed using any modern web-browsing software and is fundamental for our various research projects on computer-aided detection, diagnosis, cancer risk assessment, multimodality lesion assessment, and prognosis. Our CAD database system stores large amounts of research-related metadata and successfully generates subsets of cases that match the user's desired search criteria.
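The subset-generation idea above can be sketched with a toy relational schema. SQLite stands in here for the MySQL backend the paper describes, and the table and column names are illustrative, not the system's actual schema:

```python
# Sketch: generating a research subset from a centralized image-metadata table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE image_case (
    case_id INTEGER PRIMARY KEY,
    modality TEXT,          -- e.g. 'FFDM', 'US', 'MRI', 'SFM'
    finding TEXT,           -- e.g. 'mass', 'calcification'
    biopsy_proven INTEGER   -- 1 = pathology-confirmed
)""")
con.executemany("INSERT INTO image_case VALUES (?,?,?,?)", [
    (1, "US",   "mass", 1),
    (2, "FFDM", "calcification", 1),
    (3, "US",   "mass", 0),
])

# Subset matching one study's criteria: biopsy-proven ultrasound masses.
rows = con.execute("""SELECT case_id FROM image_case
                      WHERE modality = 'US' AND finding = 'mass'
                        AND biopsy_proven = 1""").fetchall()
print(rows)  # → [(1,)]
```

Because every modality's metadata lives in one standardized store, the same kind of query yields matched case subsets for any of the research projects listed above.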
New Tools to Document and Manage Data/Metadata: Example NGEE Arctic and UrbIS
NASA Astrophysics Data System (ADS)
Crow, M. C.; Devarakonda, R.; Hook, L.; Killeffer, T.; Krassovski, M.; Boden, T.; King, A. W.; Wullschleger, S. D.
2016-12-01
Tools used for documenting, archiving, cataloging, and searching data are critical pieces of informatics. This discussion describes tools being used in two different projects at Oak Ridge National Laboratory (ORNL), at different stages of the data lifecycle. The Metadata Entry and Data Search Tool is used for the documentation, archival, and data discovery stages of the Next Generation Ecosystem Experiment - Arctic (NGEE Arctic) project, while the Urban Information Systems (UrbIS) Data Catalog is used to support indexing, cataloging, and searching. The NGEE Arctic Online Metadata Entry Tool [1] provides a method by which researchers can upload their data and provide original metadata with each upload. The tool is built upon the Java Spring framework and parses user input into, and from, XML output. Many aspects of the tool require use of a relational database, including encrypted user login, auto-fill functionality for predefined sites and plots, and file reference storage and sorting. The UrbIS Data Catalog is a data discovery tool supported by the Mercury cataloging framework [2], which aims to compile urban environmental data from around the world into one location, searchable via a user-friendly interface. Each data record conveniently displays its title, source, and date range, and features: (1) a button for a quick view of the metadata, (2) a direct link to the data and, for some data sets, (3) a button for visualizing the data. The search box incorporates autocomplete capabilities for search terms, and sorted keyword filters are available on the side of the page, including a map for searching by area. References: [1] Devarakonda, Ranjeet, et al. "Use of a metadata documentation and search tool for large data volumes: The NGEE arctic example." Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015. [2] Devarakonda, R., Palanisamy, G., Wilson, B. E., & Green, J. M. (2010).
Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics, 3(1-2), 87-94.
NASA Astrophysics Data System (ADS)
Mitchell, A. E.; Lowe, D. R.; Murphy, K. J.; Ramapriyan, H. K.
2011-12-01
Initiated in 1990, NASA's Earth Observing System Data and Information System (EOSDIS) is currently a petabyte-scale archive of data designed to receive, process, distribute and archive several terabytes of science data per day from NASA's Earth science missions. Comprising 12 discipline-specific data centers collocated with centers of science-discipline expertise, EOSDIS manages over 6800 data products from many science disciplines and sources. NASA supports global climate change research by providing scalable open application layers to the EOSDIS distributed information framework. This allows many other value-added services to access NASA's vast Earth Science Collection and allows EOSDIS to interoperate with data archives from other domestic and international organizations. EOSDIS is committed to NASA's Data Policy of full and open sharing of Earth science data. As metadata is used in all aspects of NASA's Earth science data lifecycle, EOSDIS provides a spatial and temporal metadata registry and order broker called the EOS Clearing House (ECHO) that allows efficient search and access of cross-domain data and services through the Reverb Client and Application Programmer Interfaces (APIs). Another core metadata component of EOSDIS is NASA's Global Change Master Directory (GCMD), which represents more than 25,000 Earth science data set and service descriptions from all over the world, covering subject areas within the Earth and environmental sciences. With inputs from the ECHO, GCMD and Soil Moisture Active Passive (SMAP) mission metadata models, EOSDIS is developing a NASA ISO 19115 Best Practices Convention. Adoption of an international metadata standard enables a far greater level of interoperability among national and international data products.
NASA recently concluded a 'Metadata Harmony Study' of EOSDIS metadata capabilities/processes of ECHO and NASA's Global Change Master Directory (GCMD), to evaluate opportunities for improved data access and use, reduce efforts by data providers and improve metadata integrity. The result was a recommendation for EOSDIS to develop a 'Common Metadata Repository (CMR)' to manage the evolution of NASA Earth Science metadata in a unified and consistent way by providing a central storage and access capability that streamlines current workflows while increasing overall data quality and anticipating future capabilities. For applications users interested in monitoring and analyzing a wide variety of natural and man-made phenomena, EOSDIS provides access to near real-time products from the MODIS, OMI, AIRS, and MLS instruments in less than 3 hours from observation. To enable interactive exploration of NASA's Earth imagery, EOSDIS is developing a set of standard services to deliver global, full-resolution satellite imagery in a highly responsive manner. EOSDIS is also playing a lead role in the development of the CEOS WGISS Integrated Catalog (CWIC), which provides search and access to holdings of participating international data providers. EOSDIS provides a platform to expose and share information on NASA Earth science tools and data via Earthdata.nasa.gov while offering a coherent and interoperable system for the NASA Earth Science Data System (ESDS) Program.
Creating preservation metadata from XML-metadata profiles
NASA Astrophysics Data System (ADS)
Ulbricht, Damian; Bertelmann, Roland; Gebauer, Petra; Hasler, Tim; Klump, Jens; Kirchner, Ingo; Peters-Kottig, Wolfgang; Mettig, Nora; Rusch, Beate
2014-05-01
Registration of dataset DOIs at DataCite makes research data citable and comes with the obligation to keep the data accessible in the future. In addition, many universities and research institutions measure data that are unique and not repeatable, like the data produced by an observational network, and they want to keep these data for future generations. Consequently, such data should be ingested into preservation systems that automatically take care of file format changes. Open-source preservation software developed along the definitions of the ISO OAIS reference model is available, but during ingest of data and metadata there are still problems to be solved. File format validation is difficult: format validators are not only remarkably slow; owing to the variety of file formats, different validators also return conflicting identification profiles for identical data, and these conflicts are hard to resolve. Preservation systems also have a deficit in the support of custom metadata. Furthermore, data producers are sometimes not aware that quality metadata is a key issue for the re-use of data. In the project EWIG, a university institute and a research institute work together with the Zuse Institute Berlin, which acts as an infrastructure facility, to develop exemplary workflows for ingesting research data into OAIS-compliant archives, with emphasis on the geosciences. The Institute for Meteorology provides time-series data from an urban monitoring network, whereas GFZ Potsdam delivers file-based data from research projects. To identify problems in existing preservation workflows, the technical work is complemented by interviews with data practitioners. Policies for handling data and metadata are developed. Furthermore, university teaching material is created to raise future scientists' awareness of research data management. As a testbed for ingest workflows, the digital preservation system Archivematica [1] is used.
During the ingest process, metadata are generated that are compliant with the Metadata Encoding and Transmission Standard (METS). To find datasets in future portals and to make use of these data in their own scientific work, researchers depend on a proper selection of discovery metadata and application metadata. Some XML metadata profiles are not suitable for preservation because their versions change so quickly that automating the migration is nearly impossible. For other XML metadata profiles, schema definitions are changed after publication of the profile, or the schema definitions become inaccessible, which might cause problems during validation of the metadata inside the preservation system [2]. Some metadata profiles are not used widely enough and might not even exist in the future. Finally, discovery and application metadata have to be embedded into the mdWrap subtree of the METS XML. [1] http://www.archivematica.org [2] http://dx.doi.org/10.2218/ijdc.v7i1.215
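The mdWrap embedding described above can be sketched as follows. This is a minimal skeleton only (a real METS document also carries fileSec and structMap sections, among others), wrapping a single Dublin Core element as the embedded discovery metadata:

```python
# Sketch: embedding a discovery-metadata record into the mdWrap/xmlData
# subtree of a METS document.
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)

mets = ET.Element(f"{{{METS_NS}}}mets")
dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD1")
wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="DC")
xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
title = ET.SubElement(xml_data, f"{{{DC_NS}}}title")
title.text = "Urban monitoring network time series, Berlin"

doc = ET.tostring(mets, encoding="unicode")
print(doc)
```

Because the profile-specific record sits inside `xmlData`, the preservation system can manage the package uniformly even when the wrapped profile itself evolves.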
Survey data and metadata modelling using document-oriented NoSQL
NASA Astrophysics Data System (ADS)
Rahmatuti Maghfiroh, Lutfi; Gusti Bagus Baskara Nugraha, I.
2018-03-01
Survey data collected from year to year undergo metadata changes, yet they need to be stored in an integrated way so that statistical data can be obtained faster and more easily. A data warehouse (DW) can be used to address this limitation. However, the variables change in every period, which a DW cannot accommodate: a traditional DW cannot handle variable change via Slowly Changing Dimensions (SCD). Previous research handled the change of variables in a DW by managing metadata with a multiversion DW (MVDW), designed using the relational model. Other research has found that a non-relational model in a NoSQL database offers faster reading times than the relational model. We therefore propose managing metadata changes using NoSQL. This study proposes a DW model to manage change and algorithms to retrieve data across metadata changes. Evaluation of the proposed models and algorithms shows that a database with the proposed design can retrieve data with metadata changes properly. The contribution of this paper is comprehensive analysis of data with metadata changes (especially survey data) in integrated storage.
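The document-oriented approach described above can be sketched simply: each survey wave is stored with its own metadata (variable list), so a schema change between years needs no table migration. Plain dicts stand in for the NoSQL collection here, and the variable names and alias mapping are illustrative, not the paper's actual model:

```python
# Each document carries its wave's data together with its own metadata.
collection = [
    {"year": 2016,
     "metadata": {"variables": ["income", "hh_size"]},
     "records": [{"income": 310, "hh_size": 4}]},
    {"year": 2017,  # a variable was renamed and one added between waves
     "metadata": {"variables": ["income_monthly", "hh_size", "region"]},
     "records": [{"income_monthly": 325, "hh_size": 3, "region": "West"}]},
]

# Mapping between metadata versions (hypothetical rename table).
ALIASES = {"income_monthly": "income"}

def retrieve(collection, variable):
    """Pull one logical variable across waves, following metadata changes."""
    out = {}
    for wave in collection:
        for var in wave["metadata"]["variables"]:
            if var == variable or ALIASES.get(var) == variable:
                out[wave["year"]] = [r[var] for r in wave["records"]]
    return out

print(retrieve(collection, "income"))  # → {2016: [310], 2017: [325]}
```

The retrieval algorithm consults each wave's own metadata, so queries span metadata versions without restructuring the stored documents.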
NASA Astrophysics Data System (ADS)
Klump, J. F.; Ulbricht, D.; Conze, R.
2014-12-01
The Continental Deep Drilling Programme (KTB) was a scientific drilling project from 1987 to 1995 near Windischeschenbach, Bavaria. The main super-deep borehole reached a depth of 9,101 meters into the Earth's continental crust. The project used the most current equipment for data capture and processing. After the end of the project, key data were disseminated through the web portal of the International Continental Scientific Drilling Program (ICDP). The scientific reports were published as printed volumes. As similar projects have also experienced, it becomes increasingly difficult to maintain a data portal over a long time. Changes in software and underlying hardware make a migration of the entire system inevitable. Around 2009, the data presented on the ICDP web portal were migrated to the Scientific Drilling Database (SDDB) and published through DataCite using Digital Object Identifiers (DOIs) as persistent identifiers. The SDDB portal used a relational database with a complex data model to store data and metadata. A PHP-based content management system with custom modifications made it possible to navigate and browse datasets using the metadata and then download them. The data repository software eSciDoc allows storing self-contained packages consistent with the OAIS reference model. Each package consists of binary data files and XML metadata. Using a REST API, the packages can be stored in the eSciDoc repository and searched using the XML metadata. During the last maintenance cycle of the SDDB, the data and metadata were migrated into the eSciDoc repository. Discovery metadata were generated following the GCMD-DIF, ISO 19115, and DataCite schemas. The eSciDoc repository allows storing an arbitrary number of XML metadata records with each data object. In addition to descriptive metadata, each data object may contain pointers to related materials, such as IGSN metadata to link datasets to physical specimens, or identifiers of literature interpreting the data.
Datasets are presented through XSLT stylesheet transformations of the stored metadata. The presentation covers several migration cycles of data and metadata, each driven by aging software systems. Currently the datasets reside as self-contained entities in a repository system that is ready for digital preservation.
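The package structure described above (binary payload plus XML discovery metadata, with pointers to related materials such as IGSN records) can be sketched in a few lines. This is a minimal illustration, not the actual eSciDoc schema; the element names and the DOI/IGSN values are invented for the example.

```python
# Sketch of a self-contained, OAIS-style package's XML discovery metadata,
# in the spirit of the eSciDoc objects described above. Element names,
# the DOI, and the IGSN are illustrative assumptions.
import xml.etree.ElementTree as ET

def build_package_metadata(doi, title, igsn_links):
    """Render a minimal discovery-metadata record as XML text."""
    root = ET.Element("package")
    ET.SubElement(root, "identifier", type="DOI").text = doi
    ET.SubElement(root, "title").text = title
    related = ET.SubElement(root, "relatedIdentifiers")
    for igsn in igsn_links:
        # pointer from the dataset to a physical specimen, as in the IGSN linkage
        ET.SubElement(related, "relatedIdentifier", type="IGSN").text = igsn
    return ET.tostring(root, encoding="unicode")

xml_record = build_package_metadata(
    "10.1594/GFZ.SDDB.1001", "KTB borehole logging data", ["SDDB000123"])
print(xml_record)
```

Because each package carries its metadata with it, a later migration only has to move and re-parse these self-describing objects rather than reconstruct a relational schema.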
The Challenges in Metadata Management: 20+ Years of ESO Data
NASA Astrophysics Data System (ADS)
Vera, I.; Da Rocha, C.; Dobrzycki, A.; Micol, A.; Vuong, M.
2015-09-01
The European Southern Observatory Science Archive Facility has been in operation for more than 20 years. It contains data produced by ESO telescopes as well as the metadata needed for characterizing and distributing those data. This metadata is used to build the different archive services provided by the Archive. Over these years, services have been added, modified, or even decommissioned, creating a cocktail of new, evolved, and legacy data systems. The challenge for the Archive is to harmonize the differences between those data systems to provide the community with a homogeneous experience when using ESO data. In this paper, we present ESO's experience in three particularly challenging areas: first, the problem of metadata quality over time; second, how to integrate obsolete data models into the current services; and finally, the challenges of ever-growing databases. We describe our experience dealing with these issues and the solutions adopted to mitigate them.
openPDS: protecting the privacy of metadata through SafeAnswers.
de Montjoye, Yves-Alexandre; Shmueli, Erez; Wang, Samuel S; Pentland, Alex Sandy
2014-01-01
The rise of smartphones and web services has made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big-data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is twofold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties; it has been implemented in two field studies. (2) We introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one: it allows services to ask questions whose answers are calculated against the metadata, instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be shared directly, individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research. PMID:25007320
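The SafeAnswers flow described in the abstract can be sketched as follows: the question is evaluated inside the personal data store, and only the low-dimensional answer leaves it. The class and the question shape are illustrative assumptions, not the actual openPDS API.

```python
# Illustrative SafeAnswers-style flow (not the real openPDS interface):
# a service submits a question; the answer is computed against the raw
# metadata inside the store, and only the reduced result is returned.
from collections import Counter

class PersonalDataStore:
    def __init__(self, location_log):
        self._log = location_log  # raw, high-dimensional metadata stays here

    def safe_answer(self, question):
        return question(self._log)  # only the low-dimensional answer leaves

pds = PersonalDataStore([
    ("2014-01-01T08:00", "home"), ("2014-01-01T09:00", "office"),
    ("2014-01-01T18:30", "gym"), ("2014-01-02T09:10", "office"),
])

# A service asks "where is the user most often?" instead of pulling the log.
most_common_place = pds.safe_answer(
    lambda log: Counter(place for _, place in log).most_common(1)[0][0])
print(most_common_place)  # a single label, not the full trajectory
```

The service learns one label instead of a re-identifiable trajectory, which is the dimensionality reduction the paper emphasizes.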
Sharma, Deepak K; Solbrig, Harold R; Tao, Cui; Weng, Chunhua; Chute, Christopher G; Jiang, Guoqian
2017-06-05
Detailed Clinical Models (DCMs) have been regarded as the basis for retaining computable meaning when data are exchanged between heterogeneous computer systems. To better support clinical cancer data capturing and reporting, there is an emerging need to develop informatics solutions for standards-based clinical models in cancer study domains. The objective of this study is to develop and evaluate a cancer genome study metadata management system that serves as a key infrastructure in supporting clinical information modeling in cancer genome study domains. We leveraged a Semantic Web-based metadata repository enhanced with both the ISO 11179 metadata standard and the Clinical Information Modeling Initiative (CIMI) Reference Model. We used the common data elements (CDEs) defined in The Cancer Genome Atlas (TCGA) data dictionary, and extracted the metadata of the CDEs using the NCI Cancer Data Standards Repository (caDSR) CDE dataset rendered in the Resource Description Framework (RDF). The ITEM/ITEM_GROUP pattern defined in the latest CIMI Reference Model is used to represent reusable model elements (mini-Archetypes). We produced a metadata repository covering 38 clinical cancer genome study domains, comprising a rich collection of mini-Archetype pattern instances. We performed a case study of the "clinical pharmaceutical" domain in the TCGA data dictionary and demonstrated that the enriched data elements in the metadata repository are very useful in supporting the building of detailed clinical models. Our informatics approach leveraging Semantic Web technologies provides an effective way to build a CIMI-compliant metadata repository that would facilitate detailed clinical modeling to support use cases beyond TCGA in clinical cancer study domains.
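The ITEM/ITEM_GROUP pattern mentioned above is, at its core, a named container of typed items. A plain-Python rendering gives the idea; these are not the actual CIMI Reference Model classes, and the element names are stand-ins loosely modeled on the "clinical pharmaceutical" domain.

```python
# Rough sketch of the CIMI ITEM/ITEM_GROUP pattern used for mini-Archetypes:
# an ITEM_GROUP is a named container of ITEMs. Class and field names are
# illustrative, not the real CIMI or caDSR definitions.
from dataclasses import dataclass, field

@dataclass
class Item:
    name: str
    value: object

@dataclass
class ItemGroup:
    name: str
    items: list = field(default_factory=list)

# A mini-Archetype instance assembled from CDE-like elements:
drug_exposure = ItemGroup("clinical_pharmaceutical", [
    Item("drug_name", "tamoxifen"),
    Item("dose", 20),
    Item("dose_unit", "mg"),
])
print(drug_exposure.name, len(drug_exposure.items))
```

Groups can nest (an ItemGroup can itself appear as an Item's value), which is what lets the pattern express reusable sub-models.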
Discovering Physical Samples Through Identifiers, Metadata, and Brokering
NASA Astrophysics Data System (ADS)
Arctur, D. K.; Hills, D. J.; Jenkyns, R.
2015-12-01
Physical samples, particularly in the geosciences, are key to understanding the Earth system, its history, and its evolution. Our record of the Earth as captured by physical samples is difficult to explain and mine for understanding, due to incomplete, disconnected, and evolving metadata content. This is further complicated by differing ways of classifying, cataloguing, publishing, and searching the metadata, especially when specimens do not fit neatly into a single domain—for example, fossils cross disciplinary boundaries (mineral and biological). Sometimes even the fundamental classification systems evolve, such as the geological time scale, triggering daunting processes to update existing specimen databases. Increasingly, we need to consider ways of leveraging permanent, unique identifiers, as well as advancements in metadata publishing that link digital records with physical samples in a robust, adaptive way. An NSF EarthCube Research Coordination Network (RCN) called the Internet of Samples (iSamples) is now working to bridge the metadata schemas for biological and geological domains. We are leveraging the International Geo Sample Number (IGSN) that provides a versatile system of registering physical samples, and working to harmonize this with the DataCite schema for Digital Object Identifiers (DOI). A brokering approach for linking disparate catalogues and classification systems could help scale discovery and access to the many large collections now being managed (sometimes millions of specimens per collection). This presentation is about our community building efforts, research directions, and insights to date.
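The brokering approach sketched in the abstract amounts to per-schema adapters that normalize disparate records (IGSN samples, DataCite datasets) into one common discovery shape. The field names and schemas below are assumptions for illustration, not the iSamples or DataCite specifications.

```python
# Hypothetical brokering sketch: normalize an IGSN-style sample record and
# a DataCite-style dataset record into one common discovery schema.
def from_igsn(rec):
    return {"id": f"igsn:{rec['igsn']}", "title": rec["sample_name"],
            "kind": "physical-sample"}

def from_datacite(rec):
    return {"id": f"doi:{rec['doi']}", "title": rec["titles"][0],
            "kind": "dataset"}

BROKERS = {"igsn": from_igsn, "datacite": from_datacite}

def broker(schema, record):
    """Dispatch a source record to the adapter for its catalogue's schema."""
    return BROKERS[schema](record)

print(broker("igsn", {"igsn": "AU1234", "sample_name": "basalt core"}))
print(broker("datacite", {"doi": "10.5072/xyz", "titles": ["Seafloor survey"]}))
```

Adding a new catalogue then means writing one adapter rather than changing every consumer, which is what lets a broker scale across millions of specimens.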
Mercury: An Example of Effective Software Reuse for Metadata Management, Data Discovery and Access
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce E.
2008-12-01
Mercury is a federated metadata harvesting, data discovery, and access tool based on both open-source packages and custom-developed software. Though originally developed for NASA, the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports the reuse of metadata by enabling searching across a range of metadata specifications and standards, including XML, Z39.50, FGDC, Dublin Core, Darwin Core, EML, and ISO 19115. Mercury provides a single portal to information contained in distributed data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial, and temporal searches across these metadata sources. One of the major goals of the recent redesign of Mercury was to improve software reusability across the 12 projects that currently fund its continuing development. These projects span a range of land, atmosphere, and ocean ecological communities and have a number of common needs for metadata searches, but they also have needs specific to one or a few projects. To balance these common and project-specific needs, Mercury's architecture has three major reusable components: a harvester engine, an indexing system, and a user interface component. The harvester engine is responsible for harvesting metadata records from distributed servers around the USA and around the world. The harvester software is packaged in such a way that all Mercury projects use the same harvester scripts, with each project driven by a set of project-specific configuration files. The harvested files are structured metadata records that are indexed consistently against the search library API, so that the system can offer various search capabilities: simple, fielded, spatial, and temporal.
This backend component is supported by a flexible, easy-to-use graphical user interface driven by cascading style sheets, which further simplifies reusable design implementation. The new Mercury system is based on a service-oriented architecture and effectively reuses components for various services such as a thesaurus service, a gazetteer web service, and UDDI directory services. The software also provides various search services, including RSS, Geo-RSS, OpenSearch, web services, and portlets; an integrated shopping cart to order datasets from various data centers (ORNL DAAC, NSIDC); and integrated visualization tools. Other features include filtering and dynamic sorting of search results, bookmarkable search results, and the ability to save, retrieve, and modify search criteria.
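The "same harvester scripts, project-specific configuration files" pattern described above can be sketched in a few lines. The config keys, source URLs, and fetcher are invented for illustration; the real Mercury harvester speaks protocols such as OAI-PMH over HTTP.

```python
# Sketch of the shared-harvester / per-project-config pattern: one harvest
# routine, driven by each project's configuration. All names and URLs here
# are illustrative assumptions.
PROJECT_CONFIGS = {
    "daac":  {"sources": ["http://example.org/daac/oai"],  "schedule": "daily"},
    "nsidc": {"sources": ["http://example.org/nsidc/oai"], "schedule": "daily"},
}

def harvest(fetch, config):
    """Run the shared harvest logic against one project's configured sources."""
    records = []
    for url in config["sources"]:
        records.extend(fetch(url))  # fetch() stands in for the real HTTP call
    return records

def fake_fetch(url):
    # stub network layer returning one structured metadata record per source
    return [{"source": url, "title": "sample record"}]

print(len(harvest(fake_fetch, PROJECT_CONFIGS["daac"])))
```

Keeping the harvest logic project-agnostic is what lets 12 differently scoped projects share one codebase.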
ESDORA: A Data Archive Infrastructure Using Digital Object Model and Open Source Frameworks
NASA Astrophysics Data System (ADS)
Shrestha, Biva; Pan, Jerry; Green, Jim; Palanisamy, Giriprakash; Wei, Yaxing; Lenhardt, W.; Cook, R. Bob; Wilson, B. E.; Leggott, M.
2011-12-01
There is an array of challenges associated with preserving, managing, and using contemporary scientific data. Large volumes, multiple formats and data services, and the lack of a coherent mechanism for metadata/data management are some of the common issues across data centers. It is often difficult to preserve data history and lineage information, along with other descriptive metadata, hindering the true science value of the archived data products. In this project, we use a digital object abstraction architecture as the information/knowledge framework to address these challenges. We have used the following open-source frameworks: the Fedora Commons repository, the Drupal content management system, Islandora (a Drupal module), and the Apache Solr search engine. The system is an active archive infrastructure for Earth science data resources, with ingestion, archiving, distribution, and discovery functionalities. We use an ingestion workflow to ingest the data and metadata, in which many different aspects of the data descriptions (including structured and unstructured metadata) are reviewed. The data and metadata are staged during the reviewing phase and published after multiple reviews. Each digital object is encoded in XML for long-term preservation of the content and the relations among the digital items. The software architecture provides a flexible, modularized framework for adding pluggable user-oriented functionality. Solr is used to enable keyword search as well as faceted search, and a home-grown spatial search module is plugged in to allow users to make a spatial selection in a map view. An RDF semantic store within the Fedora Commons repository is used for storing information on data lineage, dissemination services, and text-based metadata. We use the semantic notion "isViewerFor" to register internally or externally referenced URLs, which are rendered within the same web browser when possible.
With appropriate mapping of content into digital objects, many different data descriptions, including structured metadata, data history, and auditing trails, are captured and coupled with the data content. The semantic store provides a foundation for further uses, including providing a full-fledged Earth science ontology for data interpretation or lineage tracking. Datasets from the NASA-sponsored Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) as well as from the Modeling and Synthesis Thematic Data Center (MAST-DC) are used in a testing deployment of the system, which allows us to validate the features and values of the integrated system described here. Overall, we believe that the integrated system is valid, reusable data archive software that provides digital stewardship for Earth science data content, now and in the future. References: [1] Devarakonda, Ranjeet, and Harold Shanafield. "Drupal: Collaborative framework for science research." Collaboration Technologies and Systems (CTS), 2011 International Conference on. IEEE, 2011. [2] Devarakonda, Ranjeet, et al. "Semantic search integration to climate data." Collaboration Technologies and Systems (CTS), 2014 International Conference on. IEEE, 2014.
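The core idea above, a digital object as XML-encoded content plus its relations (such as "isViewerFor" links), can be sketched directly. The element names are illustrative, not actual FOXML or the repository's real schema.

```python
# Sketch of "digital object = datastreams + relations, encoded in XML".
# Element and attribute names here are assumptions for illustration.
import xml.etree.ElementTree as ET

def digital_object(pid, datastreams, relations):
    """Serialize one digital object with its content refs and relations."""
    root = ET.Element("digitalObject", pid=pid)
    for name, ref in datastreams.items():
        ET.SubElement(root, "datastream", id=name, ref=ref)
    rels = ET.SubElement(root, "relations")
    for predicate, target in relations:
        # e.g. the isViewerFor relation registering a viewer URL
        ET.SubElement(rels, "relation", predicate=predicate).text = target
    return ET.tostring(root, encoding="unicode")

print(digital_object(
    "ornl:42",
    {"DATA": "file:///archive/42.nc", "META": "file:///archive/42.xml"},
    [("isViewerFor", "http://viewer.example/42")]))
```

Because the relations travel inside the object, lineage and viewer links survive migrations along with the content itself.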
Mercury Toolset for Spatiotemporal Metadata
NASA Technical Reports Server (NTRS)
Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James
2010-01-01
Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders-of-magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, faceted search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each project.
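The reason a centralized index makes fielded and faceted search fast can be shown with a toy in-memory index. Nothing here is Mercury's actual code; it only illustrates the harvest-once, query-many pattern.

```python
# Minimal centralized-index sketch: postings map (field, value) pairs to
# record ids, so fielded queries are set intersections and facet counts
# are free. Purely illustrative, not Mercury's implementation.
from collections import defaultdict

class Index:
    def __init__(self):
        self.postings = defaultdict(set)   # (field, value) -> record ids
        self.records = {}

    def add(self, rid, record):
        self.records[rid] = record
        for field, value in record.items():
            self.postings[(field, value)].add(rid)

    def fielded_search(self, **criteria):
        sets = [self.postings[(f, v)] for f, v in criteria.items()]
        return set.intersection(*sets) if sets else set()

    def facet_counts(self, field):
        return {v: len(ids) for (f, v), ids in self.postings.items() if f == field}

idx = Index()
idx.add(1, {"project": "ARM", "year": "2009"})
idx.add(2, {"project": "ARM", "year": "2010"})
idx.add(3, {"project": "DAAC", "year": "2010"})
print(idx.fielded_search(project="ARM", year="2010"))  # {2}
print(idx.facet_counts("project"))
```

The per-project configurability Mercury describes would correspond to choosing which fields get indexed and exposed as facets for each project.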
Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center
NASA Astrophysics Data System (ADS)
Moroni, D. F.; Quach, N.; Francis-Curley, W.
2016-12-01
As the landscape of data becomes increasingly diverse in the domain of Earth science, the challenges of managing and preserving data become more onerous and complex, particularly for data centers on fixed budgets with limited staff. Many solutions already exist to ease the cost burden for the downstream component of the data lifecycle, yet most archive centers are still racing to keep up with the influx of new data that needs to find a quasi-permanent resting place. For instance, having well-defined metadata that is consistent across the entire data landscape provides for well-managed and preserved datasets throughout the latter end of the data lifecycle. Translators between different metadata dialects are already in operational use and facilitate keeping older datasets relevant in today's world of rapidly evolving metadata standards. However, very little is done to address the first phase of the lifecycle, which deals with the entry of both data and the corresponding metadata into a system that is traditionally opaque and closed off to external data producers, resulting in a significant bottleneck in the dataset submission process. The ATRAC system was NOAA NCEI's answer to this previously obfuscated barrier for scientists wishing to find a home for their climate data records, providing a web-based entry point to submit timely and accurate metadata and information about a specific dataset. A couple of NASA's Distributed Active Archive Centers (DAACs), including the ASDC and the ORNL DAAC, have implemented their own versions of a web-based dataset and metadata submission form. The Physical Oceanography DAAC (PO.DAAC) is the most recent of the NASA-operated DAACs to offer its own web-based dataset and metadata submission services to data producers.
What makes the PO.DAAC dataset and metadata submission service stand out from these pre-existing services is the option of utilizing both a web browser GUI and a RESTful API to facilitate rapid and efficient updating of dataset metadata records by external data producers. Here we present this new service and demonstrate the variety of ways in which a multitude of Earth Science datasets may be submitted in a manner that significantly reduces the time in ensuring that new, vital data reaches the public domain.
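A RESTful metadata-submission call of the kind described above might be constructed as follows. The endpoint path, dataset identifier, and payload fields are invented, since the abstract does not spell out the PO.DAAC API itself; only the general REST shape is illustrated.

```python
# Hedged sketch of a REST-style dataset-metadata submission. The base URL,
# route, and JSON field names are assumptions, not the real PO.DAAC API.
import json
import urllib.request

def build_submission_request(base_url, dataset_id, metadata):
    """Build (but do not send) an idempotent PUT updating one dataset record."""
    payload = json.dumps({"datasetId": dataset_id, "metadata": metadata}).encode()
    return urllib.request.Request(
        f"{base_url}/datasets/{dataset_id}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",  # PUT so repeated submissions of the same record are safe
    )

req = build_submission_request(
    "https://podaac.example/api", "GHRSST-L4-EXAMPLE",
    {"title": "Example SST analysis", "processingLevel": "L4"})
print(req.get_method(), req.full_url)
```

An external data producer could drive the same call from a script or CI job, which is the "rapid and efficient updating" advantage the API route has over a browser GUI.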
Transforming Dermatologic Imaging for the Digital Era: Metadata and Standards.
Caffery, Liam J; Clunie, David; Curiel-Lewandrowski, Clara; Malvehy, Josep; Soyer, H Peter; Halpern, Allan C
2018-01-17
Imaging is increasingly being used in dermatology for documentation, diagnosis, and management of cutaneous disease. The lack of standards for dermatologic imaging is an impediment to clinical uptake. Standardization can occur in image acquisition, terminology, interoperability, and metadata. This paper presents the International Skin Imaging Collaboration position on standardization of metadata for dermatologic imaging. Metadata is essential to ensure that dermatologic images are properly managed and interpreted. There are two standards-based approaches to recording and storing metadata in dermatologic imaging. The first uses standard consumer image file formats; the second is the file format and metadata model developed for the Digital Imaging and Communications in Medicine (DICOM) standard. DICOM would appear to provide an advantage over consumer image file formats for metadata, as it includes all the patient, study, and technical metadata necessary to use images clinically. Consumer image file formats, by contrast, include only technical metadata and need to be used in conjunction with another actor (for example, an electronic medical record) to supply the patient and study metadata. The use of DICOM may have some ancillary benefits in dermatologic imaging, including leveraging DICOM network and workflow services, interoperability of images and metadata, leveraging existing enterprise imaging infrastructure, greater patient safety, and better compliance with legislative requirements for image retention.
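The contrast drawn above, a self-describing DICOM-style object versus a consumer file that needs an external actor for clinical context, can be made concrete with two data shapes. The field lists are simplified stand-ins, not the real DICOM data dictionary.

```python
# Contrast sketch of the two metadata approaches in the text. Field names
# are simplified assumptions, not actual DICOM attributes.
from dataclasses import dataclass

@dataclass
class DicomStyleImage:            # one self-describing object
    pixel_data: bytes
    patient_id: str               # patient metadata travels with the image
    study_description: str        # study metadata too
    device_model: str             # plus the technical metadata

@dataclass
class ConsumerImage:              # e.g. a JPEG: technical metadata only
    pixel_data: bytes
    device_model: str

def clinical_context(image, emr_lookup=None):
    """Consumer formats need a second actor (the EMR) to supply context."""
    if isinstance(image, DicomStyleImage):
        return image.patient_id, image.study_description
    return emr_lookup(image)      # external join with the medical record

print(clinical_context(DicomStyleImage(b"", "P-001", "lesion follow-up", "X1")))
```

The `emr_lookup` join is exactly the extra dependency the paper flags: lose the link to the EMR and the consumer image loses its clinical meaning.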
ISO, FGDC, DIF and Dublin Core - Making Sense of Metadata Standards for Earth Science Data
NASA Astrophysics Data System (ADS)
Jones, P. R.; Ritchey, N. A.; Peng, G.; Toner, V. A.; Brown, H.
2014-12-01
Metadata standards provide common definitions of metadata fields for information exchange across user communities. Despite the broad adoption of metadata standards for Earth science data, there are still heterogeneous and incompatible representations of information due to differences between the many standards in use and how each standard is applied. Federal agencies are required to manage and publish metadata in different metadata standards and formats for various data catalogs. In 2014, the NOAA National Climatic Data Center (NCDC) managed metadata for its scientific datasets in ISO 19115-2 in XML, GCMD Directory Interchange Format (DIF) in XML, the DataCite schema in XML, Dublin Core in XML, and the Data Catalog Vocabulary (DCAT) in JSON, with more standards and profiles of standards planned. Of these standards, the ISO 19115 series is the most complete and feature-rich, and for this reason it is used by NCDC as the source for the other metadata standards. We will discuss the capabilities of metadata standards and how these standards are being implemented to document datasets. Successful implementations include developing translations and displays using XSLTs, creating links to related data and resources, documenting dataset lineage, and establishing best practices. Benefits, gaps, and challenges will be highlighted, with suggestions for improved approaches to metadata storage and maintenance.
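The "ISO as source of truth" idea above, one rich record from which the other serializations are derived, can be sketched with a toy translation. The element names are heavily simplified (real ISO 19115-2 uses deep gmd:/gmi: structures) and the production translations use XSLT rather than Python, so this is only the shape of the approach.

```python
# Sketch of deriving a DCAT-like JSON record from one ISO-like XML source.
# The XML shape and DCAT keys are simplified assumptions.
import json
import xml.etree.ElementTree as ET

ISO_LIKE = """<metadata>
  <title>Global surface temperature anomalies</title>
  <abstract>Monthly gridded anomalies.</abstract>
  <keyword>temperature</keyword><keyword>climate</keyword>
</metadata>"""

def iso_to_dcat(xml_text):
    """Project the rich source record down to a small DCAT-style view."""
    root = ET.fromstring(xml_text)
    return {
        "dcat:title": root.findtext("title"),
        "dcat:description": root.findtext("abstract"),
        "dcat:keyword": [k.text for k in root.findall("keyword")],
    }

print(json.dumps(iso_to_dcat(ISO_LIKE), indent=2))
```

Maintaining only the richest record and generating the rest keeps the catalogs consistent: a fix made once in the ISO source propagates to every derived dialect.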
Content-aware network storage system supporting metadata retrieval
NASA Astrophysics Data System (ADS)
Liu, Ke; Qin, Leihua; Zhou, Jingli; Nie, Xuejun
2008-12-01
Content-based network storage has become a major research focus in both academia and industry [1]. To solve the problem of hit-rate decline caused by migration, and to achieve content-based query, we developed a new content-aware storage system that supports metadata retrieval to improve query performance. First, we extend the SCSI command descriptor block so that the system can understand self-defined query requests. Second, the extracted metadata is encoded in Extensible Markup Language (XML) to improve universality. Third, according to the demands of information lifecycle management (ILM), we store data at different storage levels and use a corresponding query strategy for each level. Fourth, as the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through a friendly user interface. Finally, experiments indicate that the retrieval strategy and sort algorithm enhance retrieval efficiency and precision.
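The ILM step above, placing data at different storage levels based on its metadata and querying each level with its own strategy, can be illustrated with a tiny tiering rule. The tier names, thresholds, and XML shape are invented for the sketch.

```python
# Illustration of metadata-driven ILM placement: the XML-encoded metadata
# decides which storage level holds the content. Tier names, the age
# thresholds, and the XML elements are assumptions.
import xml.etree.ElementTree as ET

TIERS = {"hot": {}, "warm": {}, "cold": {}}

def store(content_id, xml_metadata):
    """Place content on a tier according to its XML-encoded metadata."""
    age_days = int(ET.fromstring(xml_metadata).findtext("age_days"))
    tier = "hot" if age_days < 30 else "warm" if age_days < 365 else "cold"
    TIERS[tier][content_id] = xml_metadata
    return tier

print(store("f1", "<meta><age_days>3</age_days></meta>"))
print(store("f2", "<meta><age_days>400</age_days></meta>"))
```

A query planner can then search the hot tier first and fall back to colder tiers, which is one plausible reading of the "corresponding query strategy" per level.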
Data Discovery of Big and Diverse Climate Change Datasets - Options, Practices and Challenges
NASA Astrophysics Data System (ADS)
Palanisamy, G.; Boden, T.; McCord, R. A.; Frame, M. T.
2013-12-01
Developing data search tools is a very common, but often confusing, task for most data-intensive scientific projects. These search interfaces need to be continually improved to handle the ever-increasing diversity and volume of data collections. Many aspects determine the type of search tool a project needs to provide to its user community, including the number of datasets, the amount and consistency of discovery metadata, ancillary information such as the availability of quality information and provenance, and the availability of similar datasets from other distributed sources. The Environmental Data Science and Systems (EDSS) group within the Environmental Science Division at Oak Ridge National Laboratory has a long history of successfully managing diverse and big observational datasets for various scientific programs via data centers such as DOE's Atmospheric Radiation Measurement Program (ARM), DOE's Carbon Dioxide Information Analysis Center (CDIAC), USGS's Core Science Analytics and Synthesis (CSAS) metadata Clearinghouse, and NASA's Distributed Active Archive Center (ORNL DAAC). This talk showcases some of the recent developments for improving data discovery within these centers. The DOE ARM program recently developed a data discovery tool that allows users to search and discover over 4,000 observational datasets. These datasets are key to research efforts related to global climate change. The ARM discovery tool features many new functions such as filtered and faceted search logic, multi-pass data selection, filtering data based on data quality, graphical views of data quality and availability, direct access to data quality reports, and data plots. The ARM Archive also provides discovery metadata to broader metadata clearinghouses such as ESGF, IASOA, and GOS. In addition to the new interface, ARM is also currently working on providing DOI metadata records to publishers such as Thomson Reuters and Elsevier.
The ARM program also provides a standards-based online metadata editor (OME) for PIs to submit their data to the ARM Data Archive. The USGS CSAS metadata Clearinghouse aggregates metadata records from several USGS projects and other partner organizations, allowing users to search and discover over 100,000 biological and ecological datasets from a single web portal. The Clearinghouse has also enabled new data discovery functions such as enhanced geospatial searches based on land and ocean classifications, metadata completeness rankings, data linkage via digital object identifiers (DOIs), and semantically enhanced keyword searches. The Clearinghouse is also currently working on a dashboard that allows data providers to view statistics such as the number of their records accessed via the Clearinghouse and the most popular keywords, along with a metadata quality report and a DOI creation service. The Clearinghouse also publishes metadata records to broader portals such as NSF DataONE and Data.gov. The author will also present how these capabilities are being reused by recent and upcoming data centers such as DOE's NGEE-Arctic project. References: [1] Devarakonda, R., Palanisamy, G., Wilson, B. E., & Green, J. M. (2010). Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics, 3(1-2), 87-94. [2] Devarakonda, R., Shrestha, B., Palanisamy, G., Hook, L., Killeffer, T., Krassovski, M., ... & Frame, M. (2014, October). OME: Tool for generating and managing metadata to handle BigData. In BigData Conference (pp. 8-10).
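A metadata completeness ranking of the kind the Clearinghouse offers can be sketched as a simple score over required fields. The required-field list below is an assumption for illustration; the Clearinghouse's actual rubric is not given in the abstract.

```python
# Sketch of a metadata completeness ranking: score each record by the
# fraction of required discovery fields it fills in. The REQUIRED list is
# an illustrative assumption.
REQUIRED = ("title", "abstract", "bounding_box", "temporal_extent", "doi")

def completeness(record):
    """Fraction of required fields present and non-empty."""
    return sum(1 for f in REQUIRED if record.get(f)) / len(REQUIRED)

records = [
    {"title": "Soil cores", "abstract": "...", "doi": "10.5072/a"},
    {"title": "Bird census", "abstract": "...", "bounding_box": "-110,30,-100,40",
     "temporal_extent": "2001/2010", "doi": "10.5072/b"},
]
for r in sorted(records, key=completeness, reverse=True):
    print(r["title"], round(completeness(r), 2))
```

Surfacing the score to data providers, as the dashboard described above would, gives them a concrete target for improving their own records.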
Interoperability Gap Challenges for Learning Object Repositories & Learning Management Systems
ERIC Educational Resources Information Center
Mason, Robert T.
2011-01-01
An interoperability gap exists between Learning Management Systems (LMSs) and Learning Object Repositories (LORs). Learning Objects (LOs) and the associated Learning Object Metadata (LOM) that is stored within LORs adhere to a variety of LOM standards. A common LOM standard found in LORs is the Sharable Content Object Reference Model (SCORM)…
Data System Architectures: Recent Experiences from Data Intensive Projects
NASA Astrophysics Data System (ADS)
Palanisamy, G.; Frame, M. T.; Boden, T.; Devarakonda, R.; Zolly, L.; Hutchison, V.; Latysh, N.; Krassovski, M.; Killeffer, T.; Hook, L.
2014-12-01
U.S. Federal agencies are frequently trying to address new data-intensive projects that require the next generation of data system architectures. This presentation will focus on two such new architectures: USGS's Science Data Catalog (SDC) and DOE's Next Generation Ecological Experiments (NGEE) Arctic Data System. The U.S. Geological Survey (USGS) developed the Science Data Catalog (data.usgs.gov) to include records describing datasets, data collections, and observational or remotely-sensed data. The system was built using a service-oriented architecture and allows USGS scientists and data providers to create and register their data using either a standards-based metadata creation form or simply to register their already-created metadata records with the USGS SDC Dashboard. The dashboard then compiles the harvested metadata records and sends them to the post-processing and indexing service in JSON format. The post-processing service, with the help of various ontologies and other geospatial validation services, auto-enhances these harvested metadata records and creates a Lucene index using the Solr enterprise search platform. Ultimately, metadata is made available via the SDC search interface. DOE's NGEE Arctic project deployed a data system that allows scientists to prepare, publish, archive, and distribute data from field collections, lab experiments, sensors, and simulated model outputs. This architecture includes a metadata registration form, a data uploading and sharing tool, a Digital Object Identifier (DOI) tool, a Drupal-based content management tool (http://ngee-arctic.ornl.gov), and a data search and access tool based on ORNL's Mercury software (http://mercury.ornl.gov). The team also developed web-metric tools and a data ingest service to visualize geo-spatial and temporal observations.
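The harvest-then-index step described for the SDC can be sketched as a small normalization function that turns a harvested record into a flat JSON document for a Solr/Lucene-style index. The field names here are assumptions, not the SDC schema.

```python
import json

# Hypothetical sketch of an SDC-style post-processing step: a harvested
# metadata record is normalized into a flat JSON document suitable for
# indexing in a Solr/Lucene search platform. Field names are assumptions.
def to_index_document(harvested: dict) -> str:
    doc = {
        "id": harvested["identifier"],
        "title": harvested.get("title", ""),
        # keywords flattened and de-duplicated into one multi-valued field
        "keyword": sorted({k.lower() for k in harvested.get("keywords", [])}),
    }
    return json.dumps(doc, sort_keys=True)
```

In the real pipeline this document would also carry the ontology-derived enhancements before being posted to the search platform.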
OpenFlow arbitrated programmable network channels for managing quantum metadata
Dasari, Venkat R.; Humble, Travis S.
2016-10-10
Quantum networks must classically exchange complex metadata between devices in order to carry out information for protocols such as teleportation, super-dense coding, and quantum key distribution. Demonstrating the integration of these new communication methods with existing network protocols, channels, and data forwarding mechanisms remains an open challenge. Software-defined networking (SDN) offers robust and flexible strategies for managing diverse network devices and uses. We adapt the principles of SDN to the deployment of quantum networks, which are composed from unique devices that operate according to the laws of quantum mechanics. We show how quantum metadata can be managed within a software-defined network using the OpenFlow protocol, and we describe how OpenFlow management of classical optical channels is compatible with emerging quantum communication protocols. We next give an example specification of the metadata needed to manage and control quantum physical layer (QPHY) behavior, and we extend the OpenFlow interface to accommodate this quantum metadata. We conclude by discussing near-term experimental efforts that can realize SDN's principles for quantum communication.
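The QPHY metadata specification the abstract mentions could take a shape like the following. This is a hypothetical sketch: the paper defines its own specification, and every field name here is an assumption for illustration.

```python
from dataclasses import dataclass, asdict

# Hypothetical sketch of quantum physical layer (QPHY) metadata fields;
# the paper defines its own specification, and these names are assumptions.
@dataclass
class QPHYMetadata:
    channel_id: int
    wavelength_nm: float      # optical carrier of the quantum channel
    basis: str                # e.g. measurement basis for QKD
    qber: float               # estimated quantum bit error rate

    def to_flow_fields(self) -> dict:
        """Serialize for embedding in an OpenFlow-style controller message."""
        return asdict(self)

channel = QPHYMetadata(1, 1550.0, "rectilinear", 0.02)
```

A controller could then match or act on these fields the way OpenFlow matches on classical header fields.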
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bent, John M.; Faibish, Sorin; Pedone, Jr., James M.
A cluster file system is provided having a plurality of distributed metadata servers with shared access to one or more shared low-latency persistent key-value metadata stores. A metadata server comprises an abstract storage interface comprising a software interface module that communicates with at least one shared persistent key-value metadata store providing a key-value interface for persistent storage of key-value metadata. The software interface module provides the key-value metadata to the at least one shared persistent key-value metadata store in a key-value format. The shared persistent key-value metadata store is accessed by a plurality of metadata servers. A metadata request can be processed by a given metadata server independently of other metadata servers in the cluster file system. A distributed metadata storage environment is also disclosed that comprises a plurality of metadata servers having an abstract storage interface to at least one shared persistent key-value metadata store.
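The abstract storage interface idea above can be sketched minimally: metadata servers talk to any persistent key-value store through one small interface. This is an assumption-laden sketch, with a dict-backed implementation standing in for a real low-latency persistent store.

```python
# Minimal sketch of the abstract storage interface idea: metadata servers
# access any key-value back end through one small interface. The dict-backed
# class below is a stand-in for a real low-latency persistent store.
class KeyValueMetadataStore:
    def put(self, key: str, value: bytes) -> None:
        raise NotImplementedError
    def get(self, key: str) -> bytes:
        raise NotImplementedError

class InMemoryStore(KeyValueMetadataStore):
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data[key]

# A metadata server would hold a reference to the shared store and map
# file-system metadata (here, a path's attributes) to key-value pairs.
store = InMemoryStore()
store.put("/dir/file.txt", b'{"size": 1024, "owner": "alice"}')
```

Because every server speaks only this interface, swapping the back end (or sharing one store among many servers) requires no change to the servers themselves.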
Region 7 Laboratory Information Management System
This is metadata documentation for the Region 7 Laboratory Information Management System (R7LIMS), which maintains records for the Regional Laboratory. Any laboratory analytical work performed is stored in this system, which replaces LIMS-Lite and, before that, LAST. The EPA and its contractors may use this database. The Office of Policy & Management (PLMG) Division at EPA Region 7 is the primary managing entity; contractors can access this database, but it is not accessible to the public.
Atmospheric Science Data Center
2012-11-30
Information Management System: an online user interface which provides data and metadata ... on a 24-hour basis; accepts user orders for data; provides information about future data acquisition and processing schedules; and maintains ...
NASA Astrophysics Data System (ADS)
Yen, Y. N.; Weng, K. H.; Huang, H. Y.
2013-07-01
After over 30 years of practice and development, Taiwan's architectural conservation field is moving rapidly into digitalization and its applications. Compared to modern buildings, traditional Chinese architecture has considerably more complex elements and forms. Documenting and digitizing these unique heritages over their conservation lifecycle is a new and important issue. This article takes the caisson ceiling of the Taipei Confucius Temple, octagonal with 333 elements in 8 types, as a case study in digitization practice. The application of metadata representation and 3D modelling are the two key issues discussed. Both Revit and SketchUp were applied in this research to compare their effectiveness for metadata representation. Due to limitations of the Revit database, the final 3D models were built with SketchUp. The research found that, firstly, cultural heritage databases must convey that while many elements are similar in appearance, they are unique in value; although 3D simulations help the general understanding of architectural heritage, software such as Revit and SketchUp could, at this stage, only be used to model basic visual representations, and is ineffective in documenting additional critical data of individually unique elements. Secondly, when establishing conservation lifecycle information for application in management systems, a full and detailed presentation of the metadata must also be implemented; the existing applications of BIM in managing conservation lifecycles are still insufficient. The research recommends SketchUp as a tool for present modelling needs and BIM for sharing data between users, but the implementation of metadata representation is of the utmost importance.
The ATLAS EventIndex: data flow and inclusion of other metadata
NASA Astrophysics Data System (ADS)
Barberis, D.; Cárdenas Zárate, S. E.; Favareto, A.; Fernandez Casani, A.; Gallas, E. J.; Garcia Montoro, C.; Gonzalez de la Hoz, S.; Hrivnac, J.; Malon, D.; Prokoshin, F.; Salt, J.; Sanchez, J.; Toebbicke, R.; Yuan, R.; ATLAS Collaboration
2016-10-01
The ATLAS EventIndex is the catalogue of the event-related metadata for the information collected from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event as well as trigger decision information. The main use case for the EventIndex is event picking, as well as data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the Grid, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalogue AMI and the Rucio data management system and information on production jobs from the ATLAS production system. The ATLAS production system is also used for the collection of event information from the Grid jobs. EventIndex developments started in 2012 and in the middle of 2015 the system was commissioned and started collecting event metadata, as a part of ATLAS Distributed Computing operations.
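The event record and the event-picking use case above can be sketched as a keyed lookup. The field names below are illustrative, not the actual ATLAS schema.

```python
# Sketch of the EventIndex idea (field names are illustrative, not the
# actual ATLAS schema): each record carries event identification, file
# pointers, and trigger information; event picking is a keyed lookup.
index = {}

def add_event(run: int, event: int, guid: str, trigger: list) -> None:
    """Register one event record under its identification parameters."""
    index[(run, event)] = {"file_guid": guid, "trigger": trigger}

def pick_event(run: int, event: int) -> dict:
    """Event picking: locate the file containing a given event."""
    return index[(run, event)]

add_event(358031, 4221907, "A1B2-C3D4", ["HLT_mu26"])
```

At ATLAS scale the same lookup runs over a Hadoop-backed store rather than an in-memory dict, but the access pattern is the same.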
An Approach to Information Management for AIR7000 with Metadata and Ontologies
2009-10-01
metadata. We then propose an approach based on Semantic Technologies, including the Resource Description Framework (RDF) and Upper Ontologies, for the ... mandating specific metadata schemas can result in interoperability problems. For example, many standards within the ADO mandate the use of XML for metadata ... such problems, we propose an architecture in which different metadata schemes can interoperate. By using RDF (Resource Description Framework) as a
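The RDF idea behind such an architecture can be sketched with plain subject-predicate-object triples: metadata from different schemes is reduced to one uniform shape and queried the same way. The namespaces and resource names below are illustrative assumptions.

```python
# Sketch of the RDF interoperability idea: metadata from different schemas
# is reduced to subject-predicate-object triples, so heterogeneous records
# can be queried uniformly. Namespaces and names here are illustrative.
triples = [
    ("ex:sensorFeed1", "dc:title", "Maritime surveillance feed"),
    ("ex:sensorFeed1", "ex:platform", "ex:AIR7000"),
    ("ex:AIR7000", "rdf:type", "ex:Platform"),
]

def objects(subject: str, predicate: str) -> list:
    """Query the triple store for all objects of (subject, predicate)."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

A production system would use an RDF library and a SPARQL endpoint, but the triple shape is what lets an XML-based scheme and any other scheme land in the same store.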
A digital repository with an extensible data model for biobanking and genomic analysis management.
Izzo, Massimiliano; Mortola, Francesco; Arnulfo, Gabriele; Fato, Marco M; Varesio, Luigi
2014-01-01
Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international multi-disciplinary collaborations and data sharing among institutions increases. A single standardization is not feasible, and it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management. We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process, building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata and may have one or more associated files. We integrated the model in a web-based digital repository with data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients, and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples from over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing.
Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata for information sharing in specific research projects and purposes. This approach can considerably improve interdisciplinary research collaboration and allows tracking of patients' clinical records, sample management information, and genomic data. The web interface allows operators to easily manage, query, and annotate files without dealing with the technicalities of the data grid.
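The two-entity model described above (sequential events grouped under a process, each event carrying user-defined metadata and optional files) can be sketched in JSON. The field names are assumptions, not the published schema.

```python
import json

# Illustrative sketch of the process/event model: sequential events grouped
# under a process, each event carrying user-defined metadata and optional
# files. Field names are assumptions, not the published schema.
process = {
    "type": "process",
    "name": "neuroblastoma-study-01",
    "events": [
        {"type": "event", "name": "sample-collection",
         "metadata": {"tissue": "bone marrow"}, "files": []},
        {"type": "event", "name": "microarray-analysis",
         "metadata": {"platform": "HG-U133"}, "files": ["chip42.cel"]},
    ],
}

def event_files(proc: dict) -> list:
    """Collect every file produced by the events of a process."""
    return [f for e in proc["events"] for f in e["files"]]
```

Because metadata is a free-form dict per event, operators can introduce new fields for a study without any schema migration, which is the extensibility the abstract emphasizes.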
Compatibility Between Metadata Standards: Import Pipeline of CDISC ODM to the Samply.MDR.
Kock-Schoppenhauer, Ann-Kristin; Ulrich, Hannes; Wagen-Zink, Stefanie; Duhm-Harbeck, Petra; Ingenerf, Josef; Neuhaus, Philipp; Dugas, Martin; Bruland, Philipp
2018-01-01
The establishment of a digital healthcare system is a national and community task. The Federal Ministry of Education and Research in Germany is providing funding for consortia consisting of, among others, university hospitals participating in the "Medical Informatics Initiative". Exchange of medical data between research institutions necessitates a place where meta-information for these data is made accessible. Within these consortia, different metadata registry solutions were chosen. To promote interoperability between these solutions, we have examined whether the Portal of Medical Data Models is eligible for managing and communicating metadata and relevant information across different data integration centres of the Medical Informatics Initiative and beyond. Apart from the MDM portal, some ISO 11179-based systems such as Samply.MDR, as well as openEHR-based solutions, are going to be applied. In this paper, we have focused on the creation of a mapping model between the CDISC ODM standard and the Samply.MDR import format. In summary, it can be stated that the mapping model is feasible and promotes exchangeability between different metadata registry approaches.
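A mapping step of the kind the paper describes could look like the following. This is a hypothetical sketch: neither the CDISC ODM item definition nor the Samply.MDR import format is reproduced faithfully here, and all field names are assumptions for illustration.

```python
# Hypothetical sketch of an ODM-to-MDR mapping step: a CDISC ODM-style
# item definition is translated into a Samply.MDR-style data element.
# Neither format is reproduced faithfully; field names are assumptions.
def odm_item_to_mdr_element(item: dict) -> dict:
    datatype_map = {"integer": "INTEGER", "float": "FLOAT", "text": "STRING"}
    return {
        "designation": item["Name"],
        "definition": item.get("Question", item["Name"]),
        "valueDomain": {"datatype": datatype_map.get(item["DataType"], "STRING")},
    }

element = odm_item_to_mdr_element(
    {"OID": "I.HEIGHT", "Name": "Height", "DataType": "float",
     "Question": "Body height in cm"})
```

The essential work of such a mapping model is exactly this kind of field-by-field correspondence, plus rules for what to do when one side has no counterpart (here, defaulting unknown datatypes to STRING).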
McMahon, Christiana; Denaxas, Spiros
2017-11-06
Informed consent is an important feature of longitudinal research studies, as it enables the linking of baseline participant information with administrative data. The lack of standardized models to capture consent elements can lead to substantial challenges; a structured approach to capturing consent-related metadata can address these. Our objectives were to: a) explore the state of the art for recording consent; b) identify key elements of consent required for record linkage; and c) create and evaluate a novel metadata management model to capture consent-related metadata. The main methodological components of our work were: a) a systematic literature review and qualitative analysis of consent forms; and b) the development and evaluation of a novel metadata model. We qualitatively analyzed 61 manuscripts and 30 consent forms and extracted data elements related to obtaining consent for linkage. We created a novel metadata management model for consent and evaluated it by comparison with existing standards and by iteratively applying it to case studies. The developed model can facilitate the standardized recording of consent for linkage in longitudinal research studies and enable the linkage of external participant data. Furthermore, it can provide a structured way of recording consent-related metadata and facilitate the harmonization and streamlining of processes.
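Recording consent as structured metadata rather than free text might look like the sketch below. The element names are assumptions for illustration, not the authors' published model.

```python
# Illustrative sketch of capturing consent-related metadata as structured
# elements for record linkage; the element names are assumptions, not the
# authors' published model.
def consent_record(participant_id: str, linkage_permitted: bool,
                   data_sources: list, obtained_on: str) -> dict:
    return {
        "participant_id": participant_id,
        "linkage_permitted": linkage_permitted,  # consent to link external data
        "data_sources": data_sources,            # which registries may be linked
        "obtained_on": obtained_on,
        "version": 1,                            # consent forms change over time
    }

rec = consent_record("P-0042", True, ["hospital_admissions"], "2017-11-06")
```

With consent held in fields like these, a linkage pipeline can check machine-readably whether a given external source may be joined for a given participant, instead of re-reading consent forms.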
Designing Extensible Data Management for Ocean Observatories, Platforms, and Devices
NASA Astrophysics Data System (ADS)
Graybeal, J.; Gomes, K.; McCann, M.; Schlining, B.; Schramm, R.; Wilkin, D.
2002-12-01
The Monterey Bay Aquarium Research Institute (MBARI) has been collecting science data for 15 years from all kinds of oceanographic instruments and systems, and is building a next-generation observing system, the MBARI Ocean Observing System (MOOS). To meet the data management requirements of the MOOS, the Institute began developing a flexible, extensible data management solution, the Shore Side Data System (SSDS). This data management system must address a wide variety of oceanographic instruments and data sources, including instruments and platforms of the future. Our data management solution will address all elements of the data management challenge, from ingest (including suitable pre-definition of metadata) through to access and visualization. Key to its success will be ease of use, and automatic incorporation of new data streams and data sets. The data will be of many different forms, and come from many different types of instruments. Instruments will be designed for fixed locations (as with moorings), changing locations (drifters and AUVs), and cruise-based sampling. Data from airplanes, satellites, models, and external archives must also be considered. Providing an architecture which allows data from these varied sources to be automatically archived and processed, yet readily accessed, is only possible with the best practices in metadata definition, software design, and re-use of third-party components. The current status of SSDS development will be presented, including lessons learned from our science users and from previous data management designs.
MMI: Increasing Community Collaboration
NASA Astrophysics Data System (ADS)
Galbraith, N. R.; Stocks, K.; Neiswender, C.; Maffei, A.; Bermudez, L.
2007-12-01
Building community requires a collaborative environment and guidance to help move members towards a common goal. An effective environment for community collaboration is a workspace that fosters participation and cooperation; effective guidance furthers common understanding and promotes best practices. The Marine Metadata Interoperability (MMI) project has developed a community web site to provide a collaborative environment for scientists, technologists, and data managers from around the world to learn about metadata and exchange ideas. Workshops, demonstration projects, and presentations also provide community-building opportunities for MMI. MMI has developed comprehensive online guides to help users understand and work with metadata standards, ontologies, and other controlled vocabularies. Documents such as "The Importance of Metadata Standards", "Usage vs. Discovery Vocabularies" and "Developing Controlled Vocabularies" guide scientists and data managers through a variety of metadata-related concepts. Members from eight organizations involved in marine science and informatics collaborated on this effort. The MMI web site has moved from Plone to Drupal, two content management systems which provide different opportunities for community-based work. Drupal's "organic groups" feature will be used to provide workspace for future teams tasked with content development, outreach, and other MMI mission-critical work. The new site is designed to enable members to easily create working areas, to build communities dedicated to developing consensus on metadata and other interoperability issues. Controlled-vocabulary-driven menus, integrated mailing-lists, member-based content creation and review tools are facets of the new web site architecture. This move provided the challenge of developing a hierarchical vocabulary to describe the resources presented on the site; consistent and logical tagging of web pages is the basis of Drupal site navigation. 
The new MMI web site presents enhanced opportunities for electronic discussions, focused collaborative work, and even greater community participation. The MMI project is beginning a new initiative to comprehensively catalog and document tools for marine metadata. The new MMI community-based web site will be used to support this work and to support the work of other ad-hoc teams in the future. We are seeking broad input from the community on this effort.
Design and implementation of a fault-tolerant and dynamic metadata database for clinical trials
NASA Astrophysics Data System (ADS)
Lee, J.; Zhou, Z.; Talini, E.; Documet, J.; Liu, B.
2007-03-01
In recent imaging-based clinical trials, quantitative image analysis (QIA) and computer-aided diagnosis (CAD) methods are increasing in productivity due to higher-resolution imaging capabilities. A radiology core doing clinical trials has been analyzing more treatment methods, and there is a growing quantity of metadata that needs to be stored and managed. These radiology centers are also collaborating with many off-site imaging field sites and need a way to communicate metadata with one another in a secure infrastructure. Our solution is to implement a data storage grid with a fault-tolerant and dynamic metadata database design to unify metadata from different clinical trial experiments and field sites. Although metadata from images follow the DICOM standard, clinical trials also produce metadata specific to regions-of-interest and quantitative image analysis. We have implemented a data access and integration (DAI) server layer where multiple field sites can access multiple metadata databases in the data grid through a single web-based grid service. The centralization of metadata database management simplifies the task of adding new databases into the grid and also decreases the risk of configuration errors seen in peer-to-peer grids. In this paper, we address the design and implementation of a data grid metadata storage system that has fault tolerance and dynamic integration for imaging-based clinical trials.
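The DAI-layer idea above, one service front-ending several metadata databases so that sites query through a single entry point, can be sketched as follows. The dict-backed "databases" and the query shape are stand-in assumptions.

```python
# Sketch of the DAI-layer idea: one service front-ends several metadata
# databases, so field sites query the grid through a single entry point.
# The dict-backed "databases" are stand-ins for real metadata databases.
class DAIServer:
    def __init__(self):
        self._databases = {}

    def register(self, name: str, db: dict) -> None:
        """Adding a database requires only central registration."""
        self._databases[name] = db

    def query(self, key: str) -> list:
        """Search every registered database for a metadata key."""
        return [(name, db[key]) for name, db in self._databases.items()
                if key in db]

dai = DAIServer()
dai.register("site_a", {"roi-17": {"volume_mm3": 420}})
dai.register("site_b", {"roi-99": {"volume_mm3": 77}})
```

The centralization benefit the abstract claims shows up in `register`: a new database joins the grid in one place, instead of every peer needing reconfiguration.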
Li, Zhao; Li, Jin; Yu, Peng
2018-01-01
Metadata curation has become increasingly important for biological discovery and biomedical research because a large amount of heterogeneous biological data is currently freely available. To facilitate efficient metadata curation, we developed an easy-to-use web-based curation application, GEOMetaCuration, for curating the metadata of Gene Expression Omnibus datasets. It can eliminate mechanical operations that consume precious curation time and can help coordinate curation efforts among multiple curators. It improves the curation process by introducing various features that are critical to metadata curation, such as a back-end curation management system and a curator-friendly front-end. The application is based on the commonly used Python/Django web development framework and is open-sourced under the GNU General Public License V3. GEOMetaCuration is expected to benefit the biocuration community and to contribute to the computational generation of biological insights from large-scale biological data. An example use case can be found at the demo website: http://geometacuration.yubiolab.org. Database URL: https://bitbucket.com/yubiolab/GEOMetaCuration PMID:29688376
Musick, Charles R [Castro Valley, CA; Critchlow, Terence [Livermore, CA; Ganesh, Madhaven [San Jose, CA; Slezak, Tom [Livermore, CA; Fidelis, Krzysztof [Brentwood, CA
2006-12-19
A system and method is disclosed for integrating and accessing multiple data sources within a data warehouse architecture. The metadata formed by the present method provide a way to declaratively present domain specific knowledge, obtained by analyzing data sources, in a consistent and useable way. Four types of information are represented by the metadata: abstract concepts, databases, transformations and mappings. A mediator generator automatically generates data management computer code based on the metadata. The resulting code defines a translation library and a mediator class. The translation library provides a data representation for domain specific knowledge represented in a data warehouse, including "get" and "set" methods for attributes that call transformation methods and derive a value of an attribute if it is missing. The mediator class defines methods that take "distinguished" high-level objects as input and traverse their data structures and enter information into the data warehouse.
NASA Astrophysics Data System (ADS)
Galbraith, N. R.; Graybeal, J.; Bermudez, L. E.; Wright, D.
2005-12-01
The Marine Metadata Interoperability (MMI) initiative promotes the exchange, integration and use of marine data through enhanced data publishing, discovery, documentation and accessibility. The project, operating since late 2004, presents several cultural and organizational challenges because of the diversity of participants: scientists, technical experts, and data managers from around the world, all working in organizations with different corporate cultures, funding structures, and systems of decision-making. MMI provides educational resources at several levels. For instance, short introductions to metadata concepts are available, as well as guides and "cookbooks" for the quick and efficient preparation of marine metadata. For those who are building major marine data systems, including ocean-observing capabilities, there are training materials, marine metadata content examples, and resources for mapping elements between different metadata standards. The MMI also provides examples of good metadata practices in existing data systems, including the EU's Marine XML project, and functioning ocean/coastal clearinghouses and atlases developed by MMI team members. Communication tools that help build community: 1) Website, used to introduce the initiative to new visitors, and to provide in-depth guidance and resources to members and visitors. The site is built using Plone, an open source web content management system. Plone allows the site to serve as a wiki, to which every user can contribute material. This keeps the membership engaged and spreads the responsibility for the tasks of updating and expanding the site. 2) Email lists, to engage the broad ocean sciences community. The discussion forums "news," "ask," and "site-help" are available for receiving regular updates on MMI activities, seeking advice or support on projects and standards, or for assistance with using the MMI site.
Internal email lists are provided for the Technical Team, the Steering Committee and Executive Committee, and for several content-centered teams. These lists help keep committee members connected and have been very successful in building consensus and momentum. 3) Regularly scheduled telecons, to provide the chance for interaction between members without the need to physically attend meetings. Both the steering committee and the technical team convene via phone every month. Discussions are guided by agendas published in advance, and minutes are kept online for reference. These telecons have been an important tool in moving the MMI project forward; they give members an opportunity for informal discussion and provide a timeframe for accomplishing tasks. 4) Workshops, to make progress towards community agreement, such as the technical workshop "Advancing Domain Vocabularies", August 9-11, 2005, in Boulder, Colorado, where featured domain and metadata experts developed mappings between existing marine metadata vocabularies. Most of the work of the meeting was performed in six small, carefully organized breakout teams, oriented around specific domains. 5) Calendar of events, to keep users updated; any event related to marine metadata and interoperability can be posted. 6) Specific tools to reach agreements among distributed communities. For example, we developed a tool called the Vocabulary Integration Environment (VINE), which allows formalized agreement on mappings across different vocabularies.
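The vocabulary-mapping agreements produced with a tool like VINE can be sketched as a lookup of term-to-term correspondences with a relation recording how close the match is. The vocabularies, terms, and relation labels below are illustrative assumptions.

```python
# Illustrative sketch of cross-vocabulary mapping agreements (the sort of
# output a tool like VINE formalizes). Vocabularies, terms, and relation
# labels here are assumptions, not actual MMI mappings.
mappings = {
    ("CF", "sea_water_temperature"): ("GCMD", "Ocean Temperature", "narrower"),
    ("CF", "sea_water_salinity"): ("GCMD", "Salinity", "exact"),
}

def translate(vocab: str, term: str):
    """Look up the agreed mapping for a term, if one exists."""
    return mappings.get((vocab, term))
```

Once such agreements exist, a data system can answer a query phrased in one community's vocabulary with records tagged in another's.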
Omics Metadata Management Software (OMMS).
Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo
2015-01-01
Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation, and the sequence production methods required for downstream data processing, comparison, interpretation, sharing, and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata and to perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. The provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible, and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov.
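The unified naming and numbering convention mentioned above can be sketched as centrally generated sample identifiers. The identifier format is an assumption for illustration, not the OMMS scheme.

```python
import itertools

# Hypothetical sketch of a unified naming/numbering convention: sample
# identifiers are generated centrally so hundreds of samples follow one
# format. The format itself is an assumption, not the OMMS scheme.
counter = itertools.count(1)

def next_sample_id(project: str, run: str) -> str:
    """Issue the next identifier in a project/run series."""
    return f"{project}-{run}-S{next(counter):04d}"

ids = [next_sample_id("OMMS", "R01") for _ in range(3)]
```

Generating identifiers in one place is what prevents the ad hoc, per-experimentalist naming that makes downstream comparison and reuse difficult.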
The Planetary Data System Distributed Inventory System
NASA Technical Reports Server (NTRS)
Hughes, J. Steven; McMahon, Susan K.
1996-01-01
The advent of the World Wide Web (Web) and the ability to easily put data repositories on-line have resulted in a proliferation of digital libraries. The heterogeneity of the underlying systems, the autonomy of the individual sites, and the distributed nature of the technology have made both interoperability across the sites and the search for resources within a site major research topics. This article describes a system that addresses both issues, using standard Web protocols and meta-data labels to implement an inventory of on-line resources across a group of sites. The success of this system is strongly dependent on the existence of, and adherence to, a standards architecture that guides the management of meta-data within participating sites.
A Metadata Element Set for Project Documentation
NASA Technical Reports Server (NTRS)
Hodge, Gail; Templeton, Clay; Allen, Robert B.
2003-01-01
NASA Goddard Space Flight Center is a large engineering enterprise with many projects. We describe our efforts to develop standard metadata sets across project documentation, which we term the "Goddard Core". We also address broader issues for project management metadata.
NASA Astrophysics Data System (ADS)
Schaap, D. M. A.; Maudire, G.
2009-04-01
SeaDataNet is an Integrated research Infrastructure Initiative (I3) in EU FP6 (2006-2011) to provide a data management system adapted both to the fragmented observation system and to users' need for integrated access to data, metadata, products and services. SeaDataNet therefore ensures the long-term archiving of the large volume of multidisciplinary data (i.e. temperature, salinity, current, sea level, and chemical, physical and biological properties) collected by many different sensors installed on board research vessels, satellites and the various platforms of the marine observing system. The SeaDataNet project started in 2006, but builds upon earlier data management infrastructure projects undertaken over a period of 20 years by an expanding network of oceanographic data centres from the countries around all European seas. Its predecessor project, Sea-Search, had a strict focus on metadata. SeaDataNet maintains significant interest in the further development of the metadata infrastructure, but its primary objective is the provision of easy data access and generic data products. SeaDataNet is a distributed infrastructure that provides transnational access to marine data, metadata, products and services through 40 interconnected Trans National Data Access Platforms (TAP) from 35 countries around the Black Sea, Mediterranean, North East Atlantic, North Sea, Baltic and Arctic regions. These include National Oceanographic Data Centres (NODCs) and Satellite Data Centres. Furthermore, the SeaDataNet consortium comprises a number of expert modelling centres, SMEs expert in IT, and 3 international bodies (ICES, IOC and JRC). Planning: The SeaDataNet project is delivering and operating the infrastructure in 3 versions. Version 0: maintenance and further development of the metadata systems developed by the Sea-Search project, plus the development of a new metadata system for indexing and accessing individual data objects managed by the SeaDataNet data centres.
This is known as the Common Data Index (CDI) V0 system. Version 1: harmonisation and upgrading of the metadatabases through adoption of the ISO 19115 metadata standard, and provision of transparent data access and download services from all partner data centres through upgrading of the Common Data Index and deployment of a data object delivery service. Version 2: adding data product services and OGC-compliant viewing services, and further virtualisation of data access. SeaDataNet Version 0: The SeaDataNet portal has been set up at http://www.seadatanet.org and provides a platform for all SeaDataNet services and standards, as well as background information about the project and its partners. It includes discovery services via the following catalogues: CSR - Cruise Summary Reports of research vessels; EDIOS - locations and details of monitoring stations and networks/programmes; EDMED - high-level inventory of marine environmental data sets collected and managed by research institutes and organisations; EDMERP - marine environmental research projects; EDMO - marine organisations. These catalogues are interrelated, where possible, to facilitate cross searching and context searching, and they connect to the Common Data Index (CDI). Common Data Index (CDI): The CDI gives detailed insight into the datasets available at partner databases and paves the way to direct online data access or direct online requests for data access and delivery. The CDI V0 metadatabase contains more than 340,000 individual data entries from 36 CDI partners from 29 countries across Europe, covering a broad scope and range of data held by these organisations. For purposes of standardisation and international exchange, the ISO 19115 metadata standard has been adopted; the CDI format is defined as a dedicated subset of this standard. A CDI XML format supports the exchange between CDI partners and the central CDI manager, and ensures interoperability with other systems and networks.
CDI XML entries are generated by participating data centres, directly from their databases. CDI partners can make use of dedicated SeaDataNet tools to generate CDI XML files automatically. Approach for SeaDataNet V1 and V2: The approach for SeaDataNet V1 and V2, which is in line with the INSPIRE Directive, comprises the following services:
Discovery services = Metadata directories
Security services = Authentication, Authorization & Accounting (AAA)
Delivery services = Data access & downloading of datasets
Viewing services = Visualisation of metadata, data and data products
Product services = Generic and standard products
Monitoring services = Statistics on usage and performance of the system
Maintenance services = Updating of metadata by SeaDataNet partners
The services will be operated over a distributed network of interconnected data centres accessed through a central portal. In addition to service access, the portal will provide information on data management standards, tools and protocols. The architecture has been designed to provide a coherent system based on V1 services, whilst leaving the pathway open for later extension with V2 services. For the implementation, a range of technical components have been defined. Some are already operational, with the remainder in the final stages of development and testing. These make use of recent web technologies, and also comprise Java components, to provide multi-platform support and syntactic interoperability. To facilitate sharing of resources and interoperability, SeaDataNet has adopted SOAP Web Service technology. The SeaDataNet architecture and components have been designed to handle all kinds of oceanographic and marine environmental data, including both in-situ measurements and remote sensing observations. The V1 technical development is complete, and the V1 system is now being implemented and adopted by all participating data centres in SeaDataNet.
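Since data centres generate CDI XML entries directly from their databases, the core operation is serialising a database record into an ISO 19115-flavoured XML fragment. The sketch below illustrates the idea with simplified, invented element names; it is not the actual CDI schema.

```python
# Minimal sketch of exporting a database record as an ISO 19115-style XML
# entry, as a data centre might do for a CDI-like index. Element names are
# simplified placeholders, not the real CDI/ISO schema.
import xml.etree.ElementTree as ET

def make_entry(identifier, title, bbox):
    """bbox = (west, east, south, north) in decimal degrees."""
    root = ET.Element("MD_Metadata")
    ET.SubElement(root, "fileIdentifier").text = identifier
    ET.SubElement(root, "title").text = title
    extent = ET.SubElement(root, "geographicBoundingBox")
    for name, value in zip(("west", "east", "south", "north"), bbox):
        ET.SubElement(extent, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_entry = make_entry("cdi-00042", "CTD cast, North Sea", (2.0, 6.5, 52.0, 56.0))
```

A real exporter would add namespaces and validate against the CDI subset schema before submission to the central CDI manager.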
Interoperability: Interoperability is the key to the success of a distributed data management system, and it is achieved in SeaDataNet V1 by:
Using common quality control protocols and a common flag scale
Using controlled vocabularies from a single source, developed under international content governance
Adopting the ISO 19115 metadata standard for all metadata directories
Providing XML validation services to quality-control metadata maintenance, including field content verification based on Schematron
Providing standard metadata entry tools
Using harmonised data transport formats (NetCDF, ODV ASCII and MedAtlas ASCII) for data set delivery
Adopting OGC standards for mapping and viewing services
Using SOAP Web Services in the SeaDataNet architecture
SeaDataNet V1 Delivery Services: An important objective of the V1 system is to provide transparent access to the distributed data sets via a unique user interface at the SeaDataNet portal and download service. In the SeaDataNet V1 architecture, the Common Data Index (CDI) V1 provides the link between discovery and delivery. The CDI user interface enables users to gain detailed insight into the availability and geographical distribution of marine data archived at the connected data centres, and it provides the means for downloading data sets in common formats via a transaction mechanism. The SeaDataNet portal provides registered users access to these distributed data sets via the CDI V1 directory and a shopping basket mechanism. This allows registered users to locate data of interest and submit their data requests. The requests are forwarded automatically from the portal to the relevant SeaDataNet data centres. This process is controlled via the Request Status Manager (RSM) web service at the portal and a Download Manager (DM) Java software module implemented at each of the data centres.
The RSM also enables registered users to regularly check the status of their requests and to download data sets once access has been granted. Data centres can follow all transactions for their data sets online and can handle requests which require their consent. The actual delivery of data sets takes place between the user and the selected data centre. The CDI V1 system is now being populated by all participating data centres in SeaDataNet, thereby phasing out CDI V0. SeaDataNet Partners: IFREMER (France), MARIS (Netherlands), HCMR/HNODC (Greece), ULg (Belgium), OGS (Italy), NERC/BODC (UK), BSH/DOD (Germany), SMHI (Sweden), IEO (Spain), RIHMI/WDC (Russia), IOC (International), ENEA (Italy), INGV (Italy), METU (Turkey), CLS (France), AWI (Germany), IMR (Norway), NERI (Denmark), ICES (International), EC-DG JRC (International), MI (Ireland), IHPT (Portugal), RIKZ (Netherlands), RBINS/MUMM (Belgium), VLIZ (Belgium), MRI (Iceland), FIMR (Finland), IMGW (Poland), MSI (Estonia), IAE/UL (Latvia), CMR (Lithuania), SIO/RAS (Russia), MHI/DMIST (Ukraine), IO/BAS (Bulgaria), NIMRD (Romania), TSU (Georgia), INRH (Morocco), IOF (Croatia), PUT (Albania), NIB (Slovenia), UoM (Malta), OC/UCY (Cyprus), IOLR (Israel), NCSR/NCMS (Lebanon), CNR-ISAC (Italy), ISMAL (Algeria), INSTM (Tunisia)
A standard for measuring metadata quality in spectral libraries
NASA Astrophysics Data System (ADS)
Rasaiah, B.; Jones, S. D.; Bellman, C.
2013-12-01
There is an urgent need within the international remote sensing community to establish a metadata standard for field spectroscopy that ensures high-quality, interoperable metadata sets that can be archived and shared efficiently within Earth observation data sharing systems. Metadata are an important component in the cataloguing and analysis of in situ spectroscopy datasets because of their central role in identifying and quantifying the quality and reliability of spectral data and the products derived from them. This paper presents approaches to measuring metadata completeness and quality in spectral libraries to determine the reliability, interoperability, and re-usability of a dataset. Quality parameters that meet the unique requirements of in situ spectroscopy datasets, across many campaigns, are explored. The challenges of ensuring that data creators, owners, and users maintain a high level of data integrity throughout the lifecycle of a dataset are examined. Issues such as field measurement methods, instrument calibration, and data representativeness are investigated. The proposed metadata standard incorporates expert recommendations, including metadata protocols critical to all campaigns and those restricted to campaigns for specific target measurements. The implications of semantics and syntax for a robust and flexible metadata standard are also considered. Approaches towards an operational and logistically viable implementation of a quality standard are discussed. This paper also proposes a way forward for adapting and enhancing current geospatial metadata standards to the unique requirements of field spectroscopy metadata quality.
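A minimal version of the metadata completeness measure discussed above is the fraction of critical fields that are actually populated in a record. The critical-field list below is an invented illustration, not the paper's proposed standard.

```python
# Simple completeness score for a spectral-library metadata record: the
# fraction of "critical" fields present and non-empty. The field list is
# illustrative, not the standard proposed in the paper.
CRITICAL_FIELDS = ["instrument", "calibration_date", "target", "illumination",
                   "measurement_method", "operator"]

def completeness(record):
    """Fraction of critical metadata fields that are present and non-empty."""
    filled = sum(1 for f in CRITICAL_FIELDS if record.get(f))
    return filled / len(CRITICAL_FIELDS)

rec = {"instrument": "ASD FieldSpec", "target": "leaf",
       "calibration_date": "2013-05-01"}
print(completeness(rec))  # 0.5
```

A fuller quality measure would also weight fields by importance and check value validity, not just presence.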
NASA Astrophysics Data System (ADS)
Yatagai, Akiyo; Ritschel, Bernd; Iyemori, Tomohiko; Koyama, Yukinobu; Hori, Tomoaki; Abe, Shuji; Tanaka, Yoshimasa; Shinbori, Atsuki; UeNo, Satoru; Sato, Yuka; Yagi, Manabu
2013-04-01
Upper atmospheric observational study is an area in which international collaboration is crucially important. The Japanese Inter-university Upper atmosphere Global Observation NETwork project (2009-2014), IUGONET, is an inter-university program by the National Institute of Polar Research (NIPR), Tohoku University, Nagoya University, Kyoto University, and Kyushu University to build a database of metadata for ground-based observations of the upper atmosphere. In order to investigate the mechanism of long-term variations in the upper atmosphere, we need to combine various types of in-situ observations and to accelerate data exchange. The IUGONET institutions have been archiving data observed by radars, magnetometers, photometers, radio telescopes, helioscopes, etc. in various altitude layers from the Earth's surface to the Sun. IUGONET has been developing systems for searching the metadata of these observational data, and the metadata database (MDB) has been operating since 2011. It adopts the DSpace system for registering metadata, and it uses an extension of the SPASE data model for describing metadata, which is a widely used format in the upper atmospheric community, including in the USA. The European Union project ESPAS (2011-2015) has the same scientific objectives as IUGONET; namely, it aims to provide an e-science infrastructure for the retrieval of and access to space weather relevant data, information and value-added services. It integrates 22 partners in European countries. ESPAS also plans to adopt the SPASE model for defining its metadata, but its search system is different: despite the similarity of the data model, the basic system ideas and techniques of the system and web portal differ between IUGONET and ESPAS. In order to connect the two systems/databases, we are planning to adopt an ontological approach.
The SPASE keyword vocabulary, derived from the SPASE data model, is to be used as a standard for describing the content and context of near-Earth and space data. The SPASE keyword vocabulary is modeled as a Simple Knowledge Organization System (SKOS) ontology, and can be reused not only in domain-related but also in cross-domain projects. Implementing the vocabulary as an ontology enables its direct integration into semantic-web-based structures and applications, such as linked data and the new Information System and Data Center (ISDC) data management system.
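The practical benefit of a SKOS-modeled vocabulary is that each concept carries a `skos:broader` link, so a search can be widened by walking up the concept hierarchy. The toy scheme below sketches this idea with invented concept names; it is not the actual SPASE vocabulary.

```python
# Toy sketch of a SKOS-style concept scheme: each concept records its
# broader concept (skos:broader), and queries can be widened by walking up
# the hierarchy. Concept names are invented, not real SPASE terms.
BROADER = {
    "GroundMagnetometer": "Magnetometer",
    "Magnetometer": "Instrument",
    "Instrument": None,   # top concept
}

def ancestors(concept):
    """Return the chain of broader concepts, nearest first."""
    chain = []
    while (concept := BROADER.get(concept)) is not None:
        chain.append(concept)
    return chain

print(ancestors("GroundMagnetometer"))  # ['Magnetometer', 'Instrument']
```

A real implementation would load the SKOS graph with an RDF library and follow `skos:broader` triples rather than a hand-built dictionary.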
Sharing Metadata: Enabling Online Information Provision.
ERIC Educational Resources Information Center
Darzentas, Jenny
This paper describes work being carried out in the fields of online education provision and library systems, beginning with a description of the current state of the art with regard to online learning environments and educational materials management. Suggestions and solutions for librarians dealing with the management of educational digital…
Database technology and the management of multimedia data in the Mirror project
NASA Astrophysics Data System (ADS)
de Vries, Arjen P.; Blanken, H. M.
1998-10-01
Multimedia digital libraries require an open distributed architecture instead of a monolithic database system. In the Mirror project, we use the Monet extensible database kernel to manage different representations of multimedia objects. To maintain independence between content, meta-data, and the creation of meta-data, we allow distribution of data and operations using CORBA. This open architecture introduces new problems for data access. From an end user's perspective, the problem is how to search the available representations to fulfill an actual information need; the conceptual gap between human perceptual processes and the meta-data is too large. From a system's perspective, several representations of the data may semantically overlap or be irrelevant. We address these problems with an iterative query process and active user participation through relevance feedback. A retrieval model based on inference networks assists the user with query formulation. The integration of this model into the database design has two advantages. First, the user can query both the logical and the content structure of multimedia objects. Second, the use of different data models in the logical and the physical database design provides data independence and allows algebraic query optimization. We illustrate query processing with a music retrieval application.
Making Interoperability Easier with NASA's Metadata Management Tool (MMT)
NASA Technical Reports Server (NTRS)
Shum, Dana; Reese, Mark; Pilone, Dan; Baynes, Katie
2016-01-01
While the ISO-19115 collection-level metadata format meets many users' needs for interoperable metadata, it can be cumbersome to create correctly. Through the MMT's simple UI, metadata curators can create and edit collections that are compliant with ISO-19115 without full knowledge of the NASA Best Practices implementation of the ISO-19115 format. Users are guided through the metadata creation process by a forms-based editor, complete with field information, validation hints and picklists. Once a record is completed, users can download the metadata in any of the supported formats with just two clicks.
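The validation hints a forms-based editor surfaces can be reduced to checks of a draft record against a required-field list. The sketch below is illustrative only: the field names and rules are invented, not the actual NASA Best Practices rules or the MMT's implementation.

```python
# Sketch of forms-based validation hints for a draft metadata collection.
# The required-field set is an invented example, not the real MMT rules.
REQUIRED = {"ShortName", "Version", "TemporalExtent"}

def validation_hints(collection):
    """Return human-readable hints for missing or empty required fields."""
    return [f"'{field}' is required" for field in sorted(REQUIRED)
            if not collection.get(field)]

draft = {"ShortName": "MOD09GA", "Version": ""}
print(validation_hints(draft))
# ["'TemporalExtent' is required", "'Version' is required"]
```

Running such checks on every edit, rather than at submission time, is what lets curators fix records without knowing the full standard.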
Efficient accesses of data structures using processing near memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jayasena, Nuwan S.; Zhang, Dong Ping; Diez, Paula Aguilera
Systems, apparatuses, and methods for implementing efficient queues and other data structures. A queue may be shared among multiple processors and/or threads without using explicit software atomic instructions to coordinate access to the queue. System software may allocate an atomic queue and corresponding queue metadata in system memory and return, to the requesting thread, a handle referencing the queue metadata. Any number of threads may utilize the handle for accessing the atomic queue. The logic for ensuring the atomicity of accesses to the atomic queue may reside in a management unit in the memory controller coupled to the memory where the atomic queue is allocated.
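The handle-based pattern described above can be sketched in software: an allocator creates the queue plus its metadata and hands back an opaque handle, and all access goes through that handle. This is only a software analogue under stated assumptions; here an ordinary lock stands in for the near-memory management unit that would enforce atomicity in hardware.

```python
# Software analogue of the handle-based atomic queue: alloc_queue() creates
# the queue and its metadata and returns an opaque handle; threads enqueue
# and dequeue through the handle. A lock plays the role of the hardware
# management unit described in the abstract.
import threading
from collections import deque

_queues = {}        # handle -> (deque, lock): the "queue metadata"
_next_handle = 0

def alloc_queue():
    global _next_handle
    _next_handle += 1
    _queues[_next_handle] = (deque(), threading.Lock())
    return _next_handle

def enqueue(handle, item):
    q, lock = _queues[handle]
    with lock:
        q.append(item)

def dequeue(handle):
    q, lock = _queues[handle]
    with lock:
        return q.popleft() if q else None

h = alloc_queue()
enqueue(h, "task-1")
print(dequeue(h))  # task-1
```

The point of the indirection is that callers never touch the queue's storage directly, so the coordination mechanism behind the handle can change without changing client code.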
Operable Data Management for Ocean Observing Systems
NASA Astrophysics Data System (ADS)
Chavez, F. P.; Graybeal, J. B.; Godin, M. A.
2004-12-01
As oceanographic observing systems become more numerous and complex, data management solutions must follow. Most existing oceanographic data management systems fall into one of three categories: they have been developed as dedicated solutions, with limited application to other observing systems; they expect that data will be pre-processed into well-defined formats, such as netCDF; or they are conceived as robust, generic data management solutions, with complexity (high) and maturity and adoption rates (low) to match. Each approach has strengths and weaknesses; no approach yet fully addresses, nor takes advantage of, the sophistication of ocean observing systems as they are now conceived. In this presentation we describe critical data management requirements for advanced ocean observing systems, of the type envisioned by ORION and IOOS. By defining common requirements -- functional, qualitative, and programmatic -- for all such ocean observing systems, the performance and nature of the general data management solution can be characterized. Issues such as scalability, maintaining metadata relationships, data access security, visualization, and operational flexibility suggest baseline architectural characteristics, which may in turn lead to reusable components and approaches. Interoperability with other data management systems, with standards-based solutions in metadata specification and data transport protocols, and with the data management infrastructure envisioned by IOOS and ORION, can also be used to define necessary capabilities. Finally, some requirements for the software infrastructure of ocean observing systems can be inferred. Early operational results and lessons learned, from development and operations of MBARI ocean observing systems, are used to illustrate key requirements, choices, and challenges. 
Reference systems include the Monterey Ocean Observing System (MOOS), its component software systems (Software Infrastructure and Applications for MOOS, and the Shore Side Data System), and the Autonomous Ocean Sampling Network (AOSN).
Web Services as Building Blocks for an Open Coastal Observing System
NASA Astrophysics Data System (ADS)
Breitbach, G.; Krasemann, H.
2012-04-01
Coastal observing systems need to integrate different observing methods, such as remote sensing, in-situ measurements, and models, into a synoptic view of the state of the observed region. This integration can be based solely on web services combining data and metadata. Such an approach is pursued for COSYNA (Coastal Observing System for Northern and Arctic Seas). Data from satellite and radar remote sensing, and measurements from buoys, stations and FerryBoxes, form the observation part of COSYNA. These data are assimilated into models to create pre-operational forecasts. For discovering data, an OGC Web Feature Service (WFS) is used by the COSYNA data portal. This Web Feature Service holds not only the metadata necessary for finding data, but also the URLs of web services to view and download the data. To make data from different sources comparable, a common vocabulary is needed. For COSYNA, the standard names from the CF conventions are stored within the metadata whenever possible. For the metadata, an INSPIRE- and ISO 19115-compatible data format is used. The WFS is fed from the metadata system using database views. Actual data are stored in two different formats: in NetCDF files for gridded data, and in an RDBMS for time-series-like data. The web services are mostly standards-based; the standards are mainly OGC standards. Maps are created from NetCDF files with the help of the ncWMS tool, whereas a self-developed Java servlet is used for maps of moving measurement platforms; in this case, download of data is offered via OGC SOS. For NetCDF files, OPeNDAP is used for data download. The OGC CSW is used for accessing extended metadata. The concept of data management in COSYNA, which is independent of the particular services used in COSYNA, will be presented. This concept is parameter- and data-centric and might be useful for other observing systems.
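Because discovery here rests on an OGC WFS, a client's first step is issuing a GetFeature request, which in the KVP binding is just a URL with standard parameters. The endpoint and feature type name below are made up for illustration; only the parameter names follow the WFS convention.

```python
# Illustrative construction of an OGC WFS GetFeature request URL (KVP
# binding), the discovery mechanism COSYNA's portal relies on. The endpoint
# and type name are invented examples.
from urllib.parse import urlencode

def wfs_getfeature_url(endpoint, typename, max_features=10):
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": typename,
        "maxFeatures": max_features,
    }
    return f"{endpoint}?{urlencode(params)}"

url = wfs_getfeature_url("https://example.org/wfs", "cosyna:ferrybox")
print(url)
```

The response would be a GML feature collection whose attributes can carry the view/download service URLs the abstract describes.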
EMERALD: A Flexible Framework for Managing Seismic Data
NASA Astrophysics Data System (ADS)
West, J. D.; Fouch, M. J.; Arrowsmith, R.
2010-12-01
The seismological community is challenged by the vast quantity of new broadband seismic data provided by large-scale seismic arrays such as EarthScope’s USArray. While this bonanza of new data enables transformative scientific studies of the Earth’s interior, it also illuminates limitations in the methods used to prepare and preprocess those data. At a recent seismic data processing focus group workshop, many participants expressed the need for better systems to minimize the time and tedium spent on data preparation in order to increase the efficiency of scientific research. Another challenge related to data from all large-scale transportable seismic experiments is that there currently exists no system for discovering and tracking changes in station metadata. This critical information, such as station location, sensor orientation, instrument response, and clock timing data, may change over the life of an experiment and/or be subject to post-experiment correction. Yet nearly all researchers utilize metadata acquired with the downloaded data, even though subsequent metadata updates might alter or invalidate results produced with older metadata. A third long-standing issue for the seismic community is the lack of easily exchangeable seismic processing codes. This problem stems directly from the storage of seismic data as individual time series files, and the history of each researcher developing his or her preferred data file naming convention and directory organization. Because most processing codes rely on the underlying data organization structure, such codes are not easily exchanged between investigators. To address these issues, we are developing EMERALD (Explore, Manage, Edit, Reduce, & Analyze Large Datasets). The goal of the EMERALD project is to provide seismic researchers with a unified, user-friendly, extensible system for managing seismic event data, thereby increasing the efficiency of scientific enquiry. 
EMERALD stores seismic data and metadata in a state-of-the-art open source relational database (PostgreSQL), and can, on a timed basis or on demand, download the most recent metadata, compare it with previously acquired values, and alert the user to changes. The backend relational database is capable of easily storing and managing many millions of records. The extensible, plug-in architecture of the EMERALD system allows any researcher to contribute new visualization and processing methods written in any of 12 programming languages, and a central Internet-enabled repository for such methods provides users with the opportunity to download, use, and modify new processing methods on demand. EMERALD includes data acquisition tools allowing direct importation of seismic data, and also imports data from a number of existing seismic file formats. Pre-processed, clean sets of data can be exported as standard SAC files with user-defined file naming and directory organization, for use with existing processing codes. The EMERALD system incorporates existing acquisition and processing tools, including SOD, TauP, GMT, and FISSURES/DHI, making much of the functionality of those tools available in a unified system with a user-friendly web browser interface. EMERALD is now in beta test. See emerald.asu.edu or contact john.d.west@asu.edu for more details.
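The metadata change detection EMERALD performs reduces, at its core, to comparing a freshly downloaded record with the stored one field by field. The sketch below shows that comparison; the field names are illustrative, not EMERALD's schema.

```python
# Sketch of station-metadata change detection in the style EMERALD
# describes: diff a stored record against the latest download and report
# changed fields. Field names are invented examples.
def metadata_changes(stored, latest):
    """Return {field: (old, new)} for every field that differs."""
    fields = set(stored) | set(latest)
    return {f: (stored.get(f), latest.get(f))
            for f in fields if stored.get(f) != latest.get(f)}

old = {"latitude": 34.51, "sensor_azimuth": 0.0}
new = {"latitude": 34.51, "sensor_azimuth": 3.2}
print(metadata_changes(old, new))  # {'sensor_azimuth': (0.0, 3.2)}
```

Alerting on the returned dictionary is what lets a researcher know that results computed with the old sensor azimuth may need revisiting.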
The center for expanded data annotation and retrieval
Bean, Carol A; Cheung, Kei-Hoi; Dumontier, Michel; Durante, Kim A; Gevaert, Olivier; Gonzalez-Beltran, Alejandra; Khatri, Purvesh; Kleinstein, Steven H; O’Connor, Martin J; Pouliot, Yannick; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Wiser, Jeffrey A
2015-01-01
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments. PMID:26112029
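One simple form of the predictive data entry described above is suggesting, for a given template field, the value most frequently entered in past records. The sketch below illustrates that idea only; the repository contents and field names are invented, and the Center's actual methods are more sophisticated.

```python
# Toy sketch of pattern-driven predictive data entry: suggest the most
# common previously entered value for a metadata field. Repository contents
# are invented examples.
from collections import Counter

def suggest(repository, field):
    """Most frequent value previously entered for `field`, or None."""
    values = [r[field] for r in repository if field in r]
    return Counter(values).most_common(1)[0][0] if values else None

repo = [{"organism": "Homo sapiens"}, {"organism": "Homo sapiens"},
        {"organism": "Mus musculus"}]
print(suggest(repo, "organism"))  # Homo sapiens
```

Conditioning the suggestion on already-filled fields of the current record (rather than global frequency) is the natural next step.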
Academic Research Library as Broker in Addressing Interoperability Challenges for the Geosciences
NASA Astrophysics Data System (ADS)
Smith, P., II
2015-12-01
Data capture is an important process in the research lifecycle. Complete descriptive and representative information about the data or database is necessary during data collection, whether in the field or in the research lab. The National Science Foundation's (NSF) Public Access Plan (2015) mandates that federally funded projects make their research data more openly available. Developing, implementing, and integrating metadata workflows into the research process of the data lifecycle facilitates improved data access while also addressing interoperability challenges for the geosciences, such as data description and representation. Lack of metadata or data curation can contribute to (1) semantic, (2) ontology, and (3) data integration issues within and across disciplinary domains and projects. Some researchers of EarthCube-funded projects have identified these issues as gaps, which can contribute to interoperability, data access, discovery, and integration issues between domain-specific and general data repositories. Academic research libraries have expertise in providing long-term discovery and access through the use of metadata standards and the provision of access to research data, datasets, and publications via institutional repositories. Metadata crosswalks, open archival information systems (OAIS), trusted repositories, the Data Seal of Approval, persistent URLs, and the linking of data, objects, resources, and publications in institutional repositories and digital content management systems are common components in the library discipline. These components contribute to a library perspective on data access and discovery that can benefit the geosciences. The USGS Community for Data Integration (CDI) has developed the Science Support Framework (SSF) for data management and integration within its community of practice, as a contribution to improved understanding of the Earth's physical and biological systems.
The USGS CDI SSF can be used as a reference model to map to EarthCube-funded projects, with academic research libraries facilitating the data and information assets components of the USGS CDI SSF via institutional repositories and/or digital content management systems. This session will explore the USGS CDI SSF for cross-discipline collaboration, considered from a library perspective.
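The metadata crosswalks mentioned above are, at their core, field-by-field mappings between schemas. The sketch below illustrates the idea with a hypothetical Dublin Core to ISO 19115-style mapping; the target field names and the sample record are simplified inventions for illustration, not a complete standard crosswalk.

```python
# Illustrative Dublin Core -> ISO 19115-style mapping; the target field
# names are simplified stand-ins, not a complete standard crosswalk.
DC_TO_ISO = {
    "title": "citation.title",
    "creator": "citation.responsibleParty",
    "date": "citation.date",
    "description": "abstract",
    "subject": "descriptiveKeywords",
}

def crosswalk(record: dict) -> dict:
    """Translate a Dublin Core record to ISO-style keys, dropping unmapped fields."""
    return {DC_TO_ISO[k]: v for k, v in record.items() if k in DC_TO_ISO}

dc_record = {
    "title": "Basin discharge 2014",
    "creator": "Example Observatory",
    "subject": "hydrology",
    "rights": "CC-BY",  # no mapping defined, so it is dropped
}
iso_record = crosswalk(dc_record)
```

A production crosswalk also has to handle repeated fields and lossy mappings, which is why libraries typically maintain crosswalks as reviewed, versioned artifacts rather than ad hoc code.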
NASA Technical Reports Server (NTRS)
Hughes, J.
1998-01-01
The Planetary Data System (PDS) is an active science data archive managed by scientists for NASA's planetary science community. With the advent of the World Wide Web, the majority of the archive has been placed on-line as a science digital library for access by scientists, the educational community, and the general public.
NASA Astrophysics Data System (ADS)
Andre, Francois; Fleury, Laurence; Gaillardet, Jerome; Nord, Guillaume
2015-04-01
RBV (Réseau des Bassins Versants) is a French initiative to consolidate the national efforts made by more than 15 elementary observatories funded by various research institutions (CNRS, INRA, IRD, IRSTEA, universities) that study river and drainage basins. The RBV Metadata Catalogue aims to give a unified vision of the work produced by every observatory to both the members of the RBV network and anyone interested in this domain of research. Another goal is to share this information with other existing metadata portals. Metadata management is heterogeneous among observatories, ranging from absence to mature harvestable catalogues. Here, we explain the strategy used to design a state-of-the-art catalogue in this situation. Its main features are as follows: - Multiple input methods: metadata records in the catalogue can be entered with the graphical user interface, harvested from an existing catalogue, or imported from an information system through simplified web services. - Hierarchical levels: metadata records may describe an observatory, one of its experimental sites, or a single dataset produced by one instrument. - Multilingualism: metadata can be easily entered in several configurable languages. - Compliance with standards: the back-office part of the catalogue is based on a CSW metadata server (GeoSource), which ensures ISO 19115 compatibility and the ability to be harvested (globally or partially). Ongoing tasks focus on the use of SKOS thesauri and SensorML descriptions of the sensors. - Ergonomics: the user interface is built with the GWT framework to offer a rich client application with fully Ajax-based navigation. - Source code sharing: the work has led to the development of reusable components which can be used to quickly create new metadata forms in other GWT applications. You can visit the catalogue (http://portailrbv.sedoo.fr/) or contact us by email at rbv@sedoo.fr.
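A catalogue exposed through CSW, like the one described above, can be harvested with standard OGC key-value-pair requests. The helper below sketches how a client might build a paged CSW 2.0.2 GetRecords URL; the endpoint is a placeholder, and this is a generic CSW request rather than anything specific to the RBV portal.

```python
from urllib.parse import urlencode

def csw_getrecords_url(endpoint: str, start: int = 1, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords KVP (GET) request for full records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "full",
        "startPosition": start,   # CSW paging is 1-based
        "maxRecords": max_records,
    }
    return endpoint + "?" + urlencode(params)

# Placeholder endpoint; substitute the real CSW service URL of the catalogue.
url = csw_getrecords_url("https://example.org/geosource/csw", max_records=20)
```

A harvester would issue this request repeatedly, advancing `start` by `max_records` until the response reports no further records.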
The Metadata Cloud: The Last Piece of a Distributed Data System Model
NASA Astrophysics Data System (ADS)
King, T. A.; Cecconi, B.; Hughes, J. S.; Walker, R. J.; Roberts, D.; Thieman, J. R.; Joy, S. P.; Mafi, J. N.; Gangloff, M.
2012-12-01
Distributed data systems have existed ever since systems were networked together. Over the years the model for distributed data systems has evolved from basic file transfer to client-server to multi-tiered to grid and finally to cloud based systems. Initially metadata was tightly coupled to the data, either by embedding the metadata in the same file containing the data or by co-locating the metadata in commonly named files. As the sources of data have multiplied, data volumes have increased and services have specialized to improve efficiency, a cloud system model has emerged. In a cloud system, computing and storage are provided as services, with accessibility emphasized over physical location. Computation and data clouds are common implementations. Effectively using the data and computation capabilities requires metadata. When metadata is stored separately from the data, a metadata cloud is formed. With a metadata cloud, information and knowledge about data resources can migrate efficiently from system to system, enabling services and allowing the data to remain efficiently stored until used. This is especially important with "Big Data", where movement of the data is limited by bandwidth. We examine how the metadata cloud completes a general distributed data system model, consider how standards play a role, and relate this to the existing types of cloud computing. We also look at the major science data systems in existence and compare each to the generalized cloud system model.
Data System for HS3 Airborne Field Campaign
NASA Astrophysics Data System (ADS)
Maskey, M.; Mceniry, M.; Berendes, T.; Bugbee, K.; Conover, H.; Ramachandran, R.
2014-12-01
Hurricane and Severe Storm Sentinel (HS3) is a NASA airborne field campaign aimed at better understanding the physical processes that control hurricane intensity change. HS3 will help answer questions related to the roles of environmental conditions and internal storm structures in storm intensification. Due to the nature of the questions that the HS3 mission is addressing, it involves a variety of in-situ measurements, satellite observations, airborne data, meteorological analyses, and simulation data. This variety of datasets presents numerous data management challenges for HS3. The methods used for airborne data management differ greatly from the methods used for space-borne data. In particular, metadata extraction, spatial and temporal indexing, and the large number of instruments and subsequent variables are a few of the data management challenges unique to airborne missions. A robust data system is required to help HS3 scientists achieve their mission goals. Furthermore, the data system also needs to provide data management that assists in broader use of HS3 data to enable future research activities. The Global Hydrology Resource Center (GHRC) is considering all these needs in designing a data system for HS3. Experience with past airborne field campaigns puts GHRC in a good position to address HS3 needs; however, the scale of this mission, along with its science requirements, separates HS3 from previous field campaigns. The HS3 data system will include automated services for geo-location, metadata extraction, discovery, and distribution for all HS3 data. To answer the science questions, the data system will include a visual data exploration tool that is fully integrated into the data catalog. The tool will allow visually augmenting airborne data with analyses and simulations. Satellite data will provide contextual information during such data explorations.
All HS3 tools will be supported by an enterprise service architecture that will allow scaling, easy integration of new tools and existing services, and integration of new ESDIS metadata and security guidelines.
Advancements in Large-Scale Data/Metadata Management for Scientific Data.
NASA Astrophysics Data System (ADS)
Guntupally, K.; Devarakonda, R.; Palanisamy, G.; Frame, M. T.
2017-12-01
Scientific data often come with complex and diverse metadata which are critical for data discovery and use. The Online Metadata Editor (OME) tool, which was developed by an Oak Ridge National Laboratory team, effectively manages diverse scientific datasets across several federal data centers, such as DOE's Atmospheric Radiation Measurement (ARM) Data Center and USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L) project. This presentation will focus mainly on recent developments and future strategies for refining the OME tool within these centers. The ARM OME is a standards-based tool (https://www.archive.arm.gov/armome) that allows scientists to create and maintain metadata about their data products. The tool has been improved with new workflows that help metadata coordinators and submitting investigators submit and review their data more efficiently. The ARM Data Center's newly upgraded Data Discovery Tool (http://www.archive.arm.gov/discovery) uses rich metadata generated by the OME to enable search and discovery of thousands of datasets, while also providing a citation generator and modern order-delivery techniques such as Globus (using GridFTP), Dropbox, and THREDDS. The Data Discovery Tool also supports incremental indexing, which allows users to find new data as soon as they are added. The USGS CSAS&L search catalog employs a custom version of the OME (https://www1.usgs.gov/csas/ome), which has been upgraded with high-level Federal Geographic Data Committee (FGDC) validations and the ability to reserve and mint Digital Object Identifiers (DOIs). The USGS's Science Data Catalog (SDC) (https://data.usgs.gov/datacatalog) allows users to discover a myriad of science data holdings through a web portal. Recent major upgrades to the SDC and ARM Data Discovery Tool include improved harvesting performance and migration to new search software, such as Apache Solr 6.0, for serving up data/metadata to scientific communities.
Our presentation will highlight the future enhancements of these tools which enable users to retrieve fast search results, along with parallelizing the retrieval process from online and High Performance Storage Systems. In addition, these improvements to the tools will support additional metadata formats like the Large-Eddy Simulation (LES) ARM Symbiotic and Observation (LASSO) bundle data.
IGSN e.V.: Registration and Identification Services for Physical Samples in the Digital Universe
NASA Astrophysics Data System (ADS)
Lehnert, K. A.; Klump, J.; Arko, R. A.; Bristol, S.; Buczkowski, B.; Chan, C.; Chan, S.; Conze, R.; Cox, S. J.; Habermann, T.; Hangsterfer, A.; Hsu, L.; Milan, A.; Miller, S. P.; Noren, A. J.; Richard, S. M.; Valentine, D. W.; Whitenack, T.; Wyborn, L. A.; Zaslavsky, I.
2011-12-01
The International Geo Sample Number (IGSN) is a unique identifier for samples and specimens collected from our natural environment. It was developed by the System for Earth Sample Registration (SESAR) to overcome the problem of ambiguous naming of samples, which has limited the ability to share, link, and integrate data for samples across Geoscience data systems. Over the past 5 years, SESAR has made substantial progress in implementing the IGSN for sample and data management, working with Geoscience researchers, Geoinformatics specialists, and sample curators to establish metadata requirements, registration procedures, and best practices for the use of the IGSN. The IGSN is now recognized as the primary solution for sample identification and registration, and is supported by a growing user community of investigators, repositories, science programs, and data systems. In order to advance broad disciplinary and international implementation of the IGSN, SESAR organized a meeting of international leaders in Geoscience informatics in 2011 to develop a consensus strategy for the long-term operations of the registry, with approaches for sustainable operation, organizational structure, governance, and funding. The group endorsed an internationally unified approach for registration and discovery of physical specimens in the Geosciences, and refined the existing SESAR architecture into a modular and scalable approach, separating the IGSN Registry from a central Sample Metadata Clearinghouse (SESAR) and introducing 'Local Registration Agents' that provide registration services to specific disciplinary or organizational communities, with tools for metadata submission, management, and archiving. Development and implementation of the new IGSN architecture is underway with funding provided by the US NSF Office of International Science and Engineering.
A formal governance structure is being established for the IGSN model, consisting of (a) an international not-for-profit organization, the IGSN e.V. (e.V. = 'Eingetragener Verein', the legal status of a registered voluntary association in Germany), that defines the IGSN scope and syntax and maintains the IGSN Handle system, and (b) a Science Advisory Board that guides policies, technology, and best practices of the SESAR Sample Metadata Clearinghouse and Local Registration Agents. The IGSN e.V. is being incorporated in Germany at the GFZ Potsdam; a founding event is planned for the AGU Fall Meeting.
Harvesting NASA's Common Metadata Repository (CMR)
NASA Technical Reports Server (NTRS)
Shum, Dana; Durbin, Chris; Norton, James; Mitchell, Andrew
2017-01-01
As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.
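A harvester following these practices typically pages through CMR search results and selects an output format via the URL extension, which triggers the Unified Metadata Model translation. The sketch below builds such query URLs; the parameter names reflect the public CMR search API as commonly documented (`updated_since`, `page_size`, `page_num`), but should be verified against current CMR documentation before use.

```python
from urllib.parse import urlencode

CMR_SEARCH = "https://cmr.earthdata.nasa.gov/search/collections"

def harvest_url(fmt: str, updated_since: str,
                page_size: int = 100, page_num: int = 1) -> str:
    """One page of an incremental CMR harvest query; `fmt` picks the
    metadata format (e.g. 'umm_json', 'dif10') via the URL extension."""
    params = {
        "updated_since": updated_since,  # fetch only records changed since last harvest
        "page_size": page_size,
        "page_num": page_num,
    }
    return f"{CMR_SEARCH}.{fmt}?{urlencode(params)}"

url = harvest_url("umm_json", "2017-01-01T00:00:00Z")
```

Harvesting only records changed since the last run is what keeps a partner's mirrored repository synchronized without re-pulling all 30,000+ datasets.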
Harvesting NASA's Common Metadata Repository
NASA Astrophysics Data System (ADS)
Shum, D.; Mitchell, A. E.; Durbin, C.; Norton, J.
2017-12-01
As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.
The design and implementation of the HY-1B Product Archive System
NASA Astrophysics Data System (ADS)
Liu, Shibin; Liu, Wei; Peng, Hailong
2010-11-01
The Product Archive System (PAS), a background system, is the core part of the Product Archive and Distribution System (PADS), which is the center for data management of the Ground Application System of the HY-1B satellite hosted by the National Satellite Ocean Application Service of China. PAS integrates a series of up-to-date methods and technologies, such as a suitable data transmittal mode, flexible configuration files, and log information, to give the system several desirable characteristics: ease of maintenance, stability, and minimal complexity. This paper describes the seven major components of the PAS (the Network Communicator, File Collector, File Copy, Task Collector, Metadata Extractor, Product Data Archive, and Metadata Catalogue Import modules) and some of the unique features of the system, as well as the technical problems encountered and resolved.
NDSI products system based on Hadoop platform
NASA Astrophysics Data System (ADS)
Zhou, Yan; Jiang, He; Yang, Xiaoxia; Geng, Erhui
2015-12-01
Snow is the solid state of water on Earth and plays an important role in human life. Satellite remote sensing is well suited to snow extraction thanks to its periodicity, large scale, comprehensiveness, objectivity, and timeliness. With the continuous development of remote sensing technology, remote sensing data now come from multiple platforms, multiple sensors, and multiple viewing angles. At the same time, demand for compute-intensive remote sensing applications is gradually increasing. However, current production systems for remote sensing products run in a serial mode; such systems are used mostly by professional remote sensing researchers, and systems achieving automatic or semi-automatic production are relatively rare. Faced with massive remote sensing data, the traditional serial production system, with its low efficiency, has difficulty meeting the requirement of timely and efficient processing of mass data. In order to effectively improve the production efficiency of NDSI products and meet the demand for timely and efficient processing of large-scale remote sensing data, this paper builds an NDSI product production system based on the Hadoop platform; the system mainly comprises a remote sensing image management module, an NDSI production module, and a system service module. The main research contents and results are as follows: (1) The remote sensing image management module comprises two parts: image import and image metadata management. Masses of base IRS images and NDSI product images (the output of the system's production tasks) are imported into the HDFS file system; at the same time, the corresponding orbit row/column numbers, maximum/minimum longitude and latitude, product date, HDFS storage path, Hadoop task ID (for NDSI products), and other metadata are read, thumbnails are created, a unique ID number is assigned to each record, and the records are imported into the base/product image metadata database.
(2) The NDSI production module comprises two parts: index calculation, and production task submission and monitoring. HDF images related to a production task are read as byte streams, and the Beam library is used to parse each image byte stream into a Product object; the MapReduce distributed framework performs the production tasks while their status is monitored; when a production task completes, the remote sensing image management module is called to store the NDSI products. (3) The system service module includes both image search and NDSI product download. Given image metadata attributes described in JSON format, it returns the sequence IDs of matching images in the HDFS file system; given a MapReduce task ID, it packages that task's output NDSI products into a ZIP file and returns a download link. (4) System evaluation: massive remote sensing data were downloaded and processed with the system into NDSI products to test its performance; the results show that the system has high extensibility, strong fault tolerance, and fast production speed, and that the image processing results have high accuracy.
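The index calculation at the heart of the production module uses the standard NDSI formula, NDSI = (green - SWIR) / (green + SWIR), where green and SWIR are surface reflectances; a threshold (commonly around 0.4) then separates snow from non-snow pixels. A minimal per-pixel sketch, independent of the Hadoop machinery described above:

```python
SNOW_THRESHOLD = 0.4  # commonly used cutoff; tuned per sensor in practice

def ndsi(green: float, swir: float) -> float:
    """Normalized Difference Snow Index for a single pixel,
    guarding against division by zero for fill/no-data pixels."""
    denom = green + swir
    return 0.0 if denom == 0 else (green - swir) / denom

def is_snow(green: float, swir: float) -> bool:
    """Classify a pixel as snow when its NDSI exceeds the threshold."""
    return ndsi(green, swir) > SNOW_THRESHOLD
```

In the MapReduce setting, each map task would apply exactly this per-pixel arithmetic to its slice of the image, which is why the computation parallelizes so cleanly.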
Towards Data Value-Level Metadata for Clinical Studies.
Zozus, Meredith Nahm; Bonner, Joseph
2017-01-01
While several standards for metadata describing clinical studies exist, comprehensive metadata to support traceability of data from clinical studies has not been articulated. We examine uses of metadata in clinical studies. We examine and enumerate seven sources of data value-level metadata in clinical studies inclusive of research designs across the spectrum of the National Institutes of Health definition of clinical research. The sources of metadata inform categorization in terms of metadata describing the origin of a data value, the definition of a data value, and operations to which the data value was subjected. The latter is further categorized into information about changes to a data value, movement of a data value, retrieval of a data value, and data quality checks, constraints or assessments to which the data value was subjected. The implications of tracking and managing data value-level metadata are explored.
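The categorization above, metadata describing a value's origin, its definition, and the operations applied to it, can be sketched as a simple record type. The field names and sample values below are illustrative assumptions, not a schema proposed by the authors:

```python
from dataclasses import dataclass, field

@dataclass
class ValueMetadata:
    """One data value's metadata: where it came from, what it means,
    and what was done to it. Field names are illustrative assumptions."""
    origin: str       # provenance, e.g. 'eCRF entry', 'lab instrument'
    definition: str   # variable definition, including units
    operations: list = field(default_factory=list)  # changes, moves, retrievals, checks

meta = ValueMetadata(origin="eCRF entry",
                     definition="systolic blood pressure, mmHg")
meta.operations.append({"type": "range-check", "result": "pass"})
```

Appending each change, movement, retrieval, and quality check to `operations` is what makes the value traceable end to end, which is the paper's central concern.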
NASA Technical Reports Server (NTRS)
Golden, Keith; Clancy, Dan (Technical Monitor)
2001-01-01
The data management problem comprises data processing and data tracking. Data processing is the creation of new data based on existing data sources. Data tracking consists of storing metadata descriptions of available data. This paper addresses the data management problem by casting it as an AI planning problem. Actions are data-processing commands, plans are dataflow programs, and goals are metadata descriptions of desired data products. Data manipulation is simply plan generation and execution, and a key component of data tracking is inferring the effects of an observed plan. We introduce a new action language for data management domains, called ADILM. We discuss the connection between data processing and information integration and show how a language for the latter must be modified to support the former. The paper also discusses information gathering within a data-processing framework, and shows how ADILM metadata expressions are a generalization of Local Completeness.
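The casting of data processing as planning can be illustrated with a toy search: each action consumes one metadata description and produces another, and a plan is a chain of actions from the data you have to the description you want. The action names and metadata labels below are invented for illustration and are not ADILM syntax:

```python
from collections import deque

# Actions consume one metadata description ("in") and produce another ("out").
# Names and labels are invented for illustration, not ADILM syntax.
ACTIONS = {
    "reproject": {"in": "raw-grid", "out": "projected-grid"},
    "average": {"in": "projected-grid", "out": "monthly-mean"},
}

def plan(have: str, goal: str) -> list:
    """Breadth-first search for a chain of actions turning `have` into `goal`.
    Returns action names in order, or [] if no plan is needed or none exists."""
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        for name, act in ACTIONS.items():
            if act["in"] == state and act["out"] not in seen:
                seen.add(act["out"])
                queue.append((act["out"], steps + [name]))
    return []
```

The returned plan is exactly a dataflow program: executing the actions in order produces the goal data product, mirroring the paper's framing.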
Managing Heterogeneous Information Systems through Discovery and Retrieval of Generic Concepts.
ERIC Educational Resources Information Center
Srinivasan, Uma; Ngu, Anne H. H.; Gedeon, Tom
2000-01-01
Introduces a conceptual integration approach to heterogeneous databases or information systems that exploits the similarity in metalevel information and performs metadata mining on database objects to discover a set of concepts that serve as a domain abstraction and provide a conceptual layer above existing legacy systems. Presents results of…
Reinforcement learning interfaces for biomedical database systems.
Rudowsky, I; Kulyba, O; Kunin, M; Parsons, S; Raphan, T
2006-01-01
Studies of neural function that are carried out in different laboratories and that address different questions use a wide range of descriptors for data storage, depending on the laboratory and the individuals that input the data. A common approach to describe non-textual data that are referenced through a relational database is to use metadata descriptors. We have recently designed such a prototype system, but to maintain efficiency and a manageable metadata table, free formatted fields were designed as table entries. The database interface application utilizes an intelligent agent to improve integrity of operation. The purpose of this study was to investigate how reinforcement learning algorithms can assist the user in interacting with the database interface application that has been developed to improve the performance of the system.
Metadata Exporter for Scientific Photography Management
NASA Astrophysics Data System (ADS)
Staudigel, D.; English, B.; Delaney, R.; Staudigel, H.; Koppers, A.; Hart, S.
2005-12-01
Photographs have become an increasingly important medium, especially with the advent of digital cameras. It has become inexpensive to take photographs and quickly post them on a website. However informative photos may be, they still need to be displayed in a convenient way and cataloged in a manner that makes them easily locatable. Managing the great number of photographs that digital cameras allow, and creating a format for efficient dissemination of the information related to the photos, is a tedious task. Products such as Apple's iPhoto have greatly eased the task of managing photographs; however, they often have limitations: non-customizable metadata fields and poor metadata extraction tools limit their scientific usefulness. A solution to this persistent problem is a customizable metadata exporter. On the ALIA expedition, we successfully managed the thousands of digital photos we took. We did this with iPhoto and a version of the exporter that is now available to the public under the name "CustomHTMLExport" (http://www.versiontracker.com/dyn/moreinfo/macosx/27777), currently undergoing formal beta testing. This software allows the use of customized metadata fields (including description, time, date, GPS data, etc.), which are exported along with the photo. It can also produce webpages with this data straight from iPhoto, in a much more flexible way than is otherwise allowed. With this tool it becomes very easy to manage and distribute scientific photos.
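The core of a customizable metadata exporter is rendering only the configured fields for each photo. A minimal sketch of that idea follows; the field names and sample record are hypothetical and bear no relation to the actual CustomHTMLExport internals:

```python
from html import escape

def export_photo_html(photo: dict, fields: list) -> str:
    """Render only the configured metadata fields as an HTML definition list,
    escaping values so free-text descriptions cannot break the page."""
    rows = "".join(
        f"<dt>{escape(f)}</dt><dd>{escape(str(photo.get(f, '')))}</dd>"
        for f in fields
    )
    return f"<dl>{rows}</dl>"

# Hypothetical photo record; a real exporter would read these from the library.
photo = {"title": "Dredge 12", "date": "2005-08-01", "gps": "14.2S 169.1W"}
html = export_photo_html(photo, ["title", "date"])
```

Because the field list is just configuration, adding a new scientific field (say, sample ID) requires no code change, which is the flexibility the abstract argues iPhoto alone lacks.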
Automating Data Submission to a National Archive
NASA Astrophysics Data System (ADS)
Work, T. T.; Chandler, C. L.; Groman, R. C.; Allison, M. D.; Gegg, S. R.; Biological; Chemical Oceanography Data Management Office
2010-12-01
In late 2006, the U.S. National Science Foundation (NSF) funded the Biological and Chemical Oceanographic Data Management Office (BCO-DMO) at Woods Hole Oceanographic Institution (WHOI) to work closely with investigators to manage oceanographic data generated from their research projects. One of the final data management tasks is to ensure that the data are permanently archived at the U.S. National Oceanographic Data Center (NODC) or another appropriate national archiving facility. In the past, BCO-DMO submitted data to NODC as an email with attachments including a PDF file (a manually completed metadata record) and one or more data files. This method is no longer feasible given the rate at which data sets are contributed to BCO-DMO. Working with collaborators at NODC, a more streamlined and automated workflow was developed to keep up with the increased volume of data that must be archived at NODC. We will describe our new workflow: a semi-automated approach for contributing data to NODC that includes a Federal Geographic Data Committee (FGDC) compliant Extensible Markup Language (XML) metadata file accompanied by comma-delimited data files. The FGDC XML file is populated from information stored in a MySQL database. A crosswalk described by an Extensible Stylesheet Language Transformation (XSLT) is used to transform the XML-formatted MySQL result set to an FGDC-compliant XML metadata file. To ensure data integrity, the MD5 algorithm is used to generate a checksum and manifest of the files submitted to NODC for permanent archive. The revised system supports preparation of detailed, standards-compliant metadata that facilitate data sharing and enable accurate reuse of multidisciplinary information. The approach is generic enough to be adapted for use by other data management groups.
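The checksum-and-manifest step can be sketched in a few lines. The helper below streams each file through MD5 and emits md5sum-style manifest lines; this is an illustration of the general technique, not BCO-DMO's actual implementation, and the file names are placeholders:

```python
import hashlib

def md5sum(path: str, chunk: int = 65536) -> str:
    """Stream a file through MD5 so large data files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def manifest(paths: list) -> str:
    """md5sum-style manifest: one '<digest>  <name>' line per submitted file."""
    return "\n".join(f"{md5sum(p)}  {p}" for p in paths)
```

The archive re-computes each digest on receipt; any mismatch with the manifest flags a file corrupted or truncated in transit before it reaches permanent storage.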
Framework for Integrating Science Data Processing Algorithms Into Process Control Systems
NASA Technical Reports Server (NTRS)
Mattmann, Chris A.; Crichton, Daniel J.; Chang, Albert Y.; Foster, Brian M.; Freeborn, Dana J.; Woollard, David M.; Ramirez, Paul M.
2011-01-01
A software framework called PCS Task Wrapper is responsible for standardizing the setup, process initiation, execution, and file management tasks surrounding the execution of science data algorithms, which are referred to by NASA as Product Generation Executives (PGEs). PGEs codify a scientific algorithm, some step in the overall scientific process involved in a mission science workflow. The PCS Task Wrapper provides a stable operating environment to the underlying PGE during its execution lifecycle. If the PGE requires a file, or metadata regarding the file, the PCS Task Wrapper is responsible for delivering that information to the PGE in a manner that meets its requirements. If the PGE requires knowledge of upstream or downstream PGEs in a sequence of executions, that information is also made available. Finally, if information regarding disk space, or node information such as CPU availability, etc., is required, the PCS Task Wrapper provides this information to the underlying PGE. After this information is collected, the PGE is executed, and its output Product file and Metadata generation is managed via the PCS Task Wrapper framework. The innovation is responsible for marshalling output Products and Metadata back to a PCS File Management component for use in downstream data processing and pedigree. In support of this, the PCS Task Wrapper leverages the PCS Crawler Framework to ingest (during pipeline processing) the output Product files and Metadata produced by the PGE. The architectural components of the PCS Task Wrapper framework include PGE Task Instance, PGE Config File Builder, Config File Property Adder, Science PGE Config File Writer, and PCS Met file Writer. This innovative framework is the unifying bridge between the execution of a step in the overall processing pipeline and the available PCS component services, as well as the information that they collectively manage.
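The wrapper pattern described above (stage configuration for the PGE, run it, then ingest its products and metadata for downstream use) can be sketched as follows. Class, method, and file names here are illustrative stand-ins, not the actual PCS API:

```python
class TaskWrapper:
    """Illustrative stand-in for the PCS Task Wrapper pattern, not the real API:
    stage config for a PGE, run it, then ingest its products and metadata."""

    def __init__(self, pge, config):
        self.pge = pge        # callable standing in for a Product Generation Executive
        self.config = config  # runtime information the PGE needs (paths, upstream products)
        self.ingested = []    # stands in for the PCS File Management catalog

    def run(self):
        products = self.pge(self.config)         # execute the science algorithm
        for name, metadata in products.items():  # "crawl" and ingest its outputs
            self.ingested.append((name, metadata))
        return self.ingested

def demo_pge(config):
    """Hypothetical PGE: produces one product file plus its metadata."""
    return {"product.nc": {"source": config["input"], "step": "calibrate"}}

w = TaskWrapper(demo_pge, {"input": "raw.dat"})
```

Keeping execution concerns in the wrapper means the PGE itself stays a pure science step, which is what lets the same algorithm run unchanged at different pipeline stages.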
NASA's Earth Observing Data and Information System - Near-Term Challenges
NASA Technical Reports Server (NTRS)
Behnke, Jeanne; Mitchell, Andrew; Ramapriyan, Hampapuram
2018-01-01
NASA's Earth Observing System Data and Information System (EOSDIS) has been a central component of the NASA Earth observation program since the 1990s. EOSDIS manages data covering a wide range of Earth science disciplines, including cryosphere, land cover change, polar processes, field campaigns, ocean surface, digital elevation, atmosphere dynamics and composition, and inter-disciplinary research, among many others. One of the key components of EOSDIS is a set of twelve discipline-based Distributed Active Archive Centers (DAACs) distributed across the United States. Managed by NASA's Earth Science Data and Information System (ESDIS) Project at Goddard Space Flight Center, these DAACs serve over 3 million users globally. The ESDIS Project provides the infrastructure support for EOSDIS, which includes other components such as the Science Investigator-led Processing Systems (SIPS), common metadata and metrics management systems, specialized network systems, standards management, and centralized support for use of commercial cloud capabilities. Given the long-term requirements, the rapid pace of information technology, and the changing expectations of the user community, EOSDIS has evolved continually over the past three decades. However, many challenges remain. Challenges addressed in this paper include: growing volume and variety, achieving consistency across a diverse set of data producers, managing information about a large number of datasets, migration to a cloud computing environment, optimizing data discovery and access, incorporating user feedback from a diverse community, keeping metadata updated as data collections grow and age, and ensuring that all the content needed for understanding datasets by future users is identified and preserved.
NASA Astrophysics Data System (ADS)
Efstathiou, Nectarios; Skitsas, Michael; Psaroudakis, Chrysostomos; Koutras, Nikolaos
2017-09-01
Nowadays, video surveillance cameras are used for the protection and monitoring of a huge number of facilities worldwide. An important element in such surveillance systems is the use of aerial video streams originating from onboard sensors located on Unmanned Aerial Vehicles (UAVs). Video surveillance using UAVs represents a vast amount of video to be transmitted, stored, analyzed and visualized in real time. As a result, the introduction and development of systems able to handle huge amounts of data become a necessity. In this paper, a new approach for the collection, transmission and storage of aerial videos and metadata is introduced. The objective of this work is twofold. First, the integration of the appropriate equipment in order to capture and transmit real-time video, including metadata (i.e. position coordinates, target), from the UAV to the ground and, second, the utilization of the ADITESS Versatile Media Content Management System (VMCMS-GE) for storage of the video stream and the appropriate metadata. Beyond storage, VMCMS-GE provides other efficient management capabilities, such as searching and processing of videos, along with video transcoding. For the evaluation and demonstration of the proposed framework we execute a use case in which surveillance of critical infrastructure and detection of suspicious activities are performed. Transcoding of the collected video is a subject of this evaluation as well.
Metadata for Web Resources: How Metadata Works on the Web.
ERIC Educational Resources Information Center
Dillon, Martin
This paper discusses bibliographic control of knowledge resources on the World Wide Web. The first section sets the context of the inquiry. The second section covers the following topics related to metadata: (1) definitions of metadata, including metadata as tags and as descriptors; (2) metadata on the Web, including general metadata systems,…
Manifestations of Metadata: From Alexandria to the Web--Old is New Again
ERIC Educational Resources Information Center
Kennedy, Patricia
2008-01-01
This paper is a discussion of the use of metadata, in its various manifestations, to access information. Information management standards are discussed. The connection between the ancient world and the modern world is highlighted. Individual perspectives are paramount in fulfilling information seeking. Metadata is interpreted and reflected upon in…
Enriched Video Semantic Metadata: Authorization, Integration, and Presentation.
ERIC Educational Resources Information Center
Mu, Xiangming; Marchionini, Gary
2003-01-01
Presents an enriched video metadata framework including video authorization using the Video Annotation and Summarization Tool (VAST)-a video metadata authorization system that integrates both semantic and visual metadata-- metadata integration, and user level applications. Results demonstrated that the enriched metadata were seamlessly…
Sea Level Station Metadata for Tsunami Detection, Warning and Research
NASA Astrophysics Data System (ADS)
Stroker, K. J.; Marra, J.; Kari, U. S.; Weinstein, S. A.; Kong, L.
2007-12-01
The devastating earthquake and tsunami of December 26, 2004 has greatly increased recognition of the need for water level data both from the coasts and the deep-ocean. In 2006, the National Oceanic and Atmospheric Administration (NOAA) completed a Tsunami Data Management Report describing the management of data required to minimize the impact of tsunamis in the United States. One of the major gaps defined in this report is the access to global coastal water level data. NOAA's National Geophysical Data Center (NGDC) and National Climatic Data Center (NCDC) are working cooperatively to bridge this gap. NOAA relies on a network of global data, acquired and processed in real-time to support tsunami detection and warning, as well as high-quality global databases of archived data to support research and advanced scientific modeling. In 2005, parties interested in enhancing the access and use of sea level station data united under the NOAA NCDC's Integrated Data and Environmental Applications (IDEA) Center's Pacific Region Integrated Data Enterprise (PRIDE) program to develop a distributed metadata system describing sea level stations (Kari et al., 2006; Marra et al., in press). This effort started with pilot activities in a regional framework and is targeted at tsunami detection and warning systems being developed by various agencies. It includes development of the components of a prototype sea level station metadata web service and accompanying Google Earth-based client application, which use an XML-based schema to expose, at a minimum, information in the NOAA National Weather Service (NWS) Pacific Tsunami Warning Center (PTWC) station database needed to use the PTWC's Tide Tool application. As identified in the Tsunami Data Management Report, the need also exists for long-term retention of the sea level station data. NOAA envisions that the retrospective water level data and metadata will also be available through web services, using an XML-based schema.
Five high-priority metadata requirements identified at a water level workshop held at the XXIV IUGG Meeting in Perugia will be addressed: consistent, validated, and well-defined numbers (e.g. amplitude); exact location of sea level stations; a complete record of sea level data stored in the archive; identification of high-priority sea level stations; and consistent definitions. NOAA's National Geophysical Data Center (NGDC) and the co-located World Data Center for Solid Earth Geophysics (including tsunamis) would hold the archive of the sea level station data and distribute the standard metadata. Currently, NGDC is also archiving and distributing the DART buoy deep-ocean water level data and metadata in standards-based formats. Kari, U. S., J. J. Marra, and S. A. Weinstein (2006). A Tsunami-Focused Data Sharing Framework for Integration of Databases that Describe Water Level Station Specifications. AGU Fall Meeting 2006, San Francisco, California. Marra, J. J., U. S. Kari, and S. A. Weinstein (in press). A Tsunami Detection and Warning-focused Sea Level Station Metadata Web Service. IUGG XXIV, July 2-13, 2007, Perugia, Italy.
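A station record in such an XML-based metadata exchange might be assembled as in the sketch below. The element names (seaLevelStation, latitude, and so on) and the sample station values are invented for illustration; the actual PTWC/PRIDE schema differs.

```python
import xml.etree.ElementTree as ET

def station_record(station_id, name, lat, lon, operator):
    """Build a minimal XML record for one sea level station.

    Element names are illustrative only, not the real schema.
    """
    st = ET.Element("seaLevelStation", id=station_id)
    ET.SubElement(st, "name").text = name
    loc = ET.SubElement(st, "location")
    ET.SubElement(loc, "latitude").text = str(lat)
    ET.SubElement(loc, "longitude").text = str(lon)
    ET.SubElement(st, "operator").text = operator
    return st

# Hypothetical station entry, serialized as a web service might return it.
rec = station_record("HILO", "Hilo, Hawaii", 19.73, -155.06, "NOAA/NOS")
xml_text = ET.tostring(rec, encoding="unicode")
```

A client such as the Google Earth application described above would consume many such records and plot them by the location element.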
A Tsunami-Focused Tide Station Data Sharing Framework
NASA Astrophysics Data System (ADS)
Kari, U. S.; Marra, J. J.; Weinstein, S. A.
2006-12-01
The Indian Ocean Tsunami of 26 December 2004 made it clear that information about tide stations that could be used to support detection and warning (such as location, collection and transmission capabilities, and operator identification) is insufficiently known or not readily accessible. Parties interested in addressing this problem united under the Pacific Region Integrated Data Enterprise (PRIDE), and in 2005 began a multiyear effort to develop a distributed metadata system describing tide stations, starting with pilot activities in a regional framework and focusing on tsunami detection and warning systems being developed by various agencies. First, a plain semantic description of the tsunami-focused tide station metadata was developed. The semantic metadata description was, in turn, developed into a formal metadata schema championed by the International Tsunami Information Centre (ITIC) as part of a larger effort to develop a prototype web service under the PRIDE program in 2005. Under the 2006 PRIDE program, the formal metadata schema was then expanded to corral the input parameters for the TideTool application used by the Pacific Tsunami Warning Center (PTWC) to drill down into wave activity at a tide station located using a web service developed on this metadata schema. This effort contributed to formalization of web service dissemination of PTWC tsunami watch and warning bulletins. During this time, the data content and sharing issues embodied in this schema have been discussed at various forums. The result is that the various stakeholders have different data provider and user perspectives (semantic content) and also different exchange formats (not limited to just XML). The challenge, then, is not only to capture all data requirements, but also to have a formal representation that is easily transformed into any specified format. The latest revision of the tide gauge schema (Version 0.3) begins to address this challenge.
It encompasses a broader range of provider and user perspectives, such as station operators, warning system managers, disaster managers, and other marine hazard warning systems (such as storm surge and sea level change monitoring and research). In the next revision(s), we hope to take into account various relevant standards, including, specifically, the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) framework, so as to serve all prospective stakeholders in the most useful (extensible, scalable) manner. This is because SensorML has already addressed many of the challenges we face, through very useful fundamental modeling considerations and data types that are particular to sensors in general, with perhaps some extension needed for tide gauges. As a result of developing this schema, and the associated client application architectures, we hope to have a much more distributed network of data providers, who are able to contribute to a global tide station metadata collection from the comfort of their own Information Technology (IT) departments.
Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.
2015-01-01
The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects, and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate data of varying quality into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402
A metadata-driven approach to data repository design.
Harvey, Matthew J; McLean, Andrew; Rzepa, Henry S
2017-01-01
The design and use of a metadata-driven data repository for research data management is described. Metadata is collected automatically during the submission process whenever possible and is registered with DataCite in accordance with their current metadata schema, in exchange for a persistent digital object identifier. Two examples of data preview are illustrated, including the demonstration of a method for integration with commercial software that confers rich domain-specific data analytics without introducing customisation into the repository itself.
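The submission-time metadata collection described above can be sketched as gathering what the deposition interface already knows and checking the mandatory DataCite properties before a DOI is requested. The property names follow the DataCite metadata schema; the repository name, DOI value, and helper function here are hypothetical illustrations, not the paper's implementation.

```python
# Mandatory properties in the DataCite metadata schema.
REQUIRED = ("identifier", "creators", "titles", "publisher",
            "publicationYear", "resourceType")

def build_datacite_record(submission):
    """Map a repository submission onto DataCite-style metadata and
    report any mandatory properties that are still missing."""
    record = {
        "identifier": submission.get("doi", ""),     # DOI assigned on registration
        "creators": submission.get("authors", []),
        "titles": [submission.get("title", "")],
        "publisher": "Example Data Repository",      # hypothetical repository name
        "publicationYear": submission.get("year"),
        "resourceType": submission.get("kind", "Dataset"),
    }
    missing = [k for k in REQUIRED if not record.get(k) or record[k] == [""]]
    return record, missing

# A hypothetical submission captured automatically during deposition.
rec, missing = build_datacite_record({
    "doi": "10.1234/example.5678",                   # placeholder DOI
    "authors": ["Doe, Jane"],
    "title": "NMR spectra of compound 42",
    "year": 2017,
})
```

Only a complete record (empty `missing` list) would be sent on for DOI registration.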
Use of a metadata documentation and search tool for large data volumes: The NGEE arctic example
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devarakonda, Ranjeet; Hook, Leslie A; Killeffer, Terri S
The Online Metadata Editor (OME) is a web-based tool to help document scientific data in a well-structured, popular scientific metadata format. In this paper, we discuss the newest tool that Oak Ridge National Laboratory (ORNL) has developed to generate, edit, and manage metadata, and how it is helping data-intensive science centers and projects, such as the U.S. Department of Energy's Next Generation Ecosystem Experiments (NGEE) in the Arctic, to prepare metadata and make their big data produce big science and lead to new discoveries.
NASA Astrophysics Data System (ADS)
Yu, Xu; Shao, Quanqin; Zhu, Yunhai; Deng, Yuejin; Yang, Haijun
2006-10-01
With the development of informationization and the separation between data management departments and application departments, spatial data sharing has become one of the most important objectives of spatial information infrastructure construction, and spatial metadata management systems, data transmission security, and data compression are the key technologies for realizing spatial data sharing. This paper discusses the key technologies for metadata-based data interoperability; examines data compression algorithms such as adaptive Huffman coding and the LZ77 and LZ78 algorithms; and studies the application of digital signature techniques to spatial data, which can not only identify the transmitter of the data but also promptly detect whether the spatial data have been tampered with during network transmission. Based on an analysis of symmetric encryption algorithms, including 3DES and AES, and the asymmetric encryption algorithm RSA, combined with a hash algorithm, it presents an improved hybrid encryption method for spatial data. Digital signature technology and digital watermarking technology are also discussed. Then, a new solution for spatial data network distribution is put forward, which adopts a three-layer architecture. Based on this framework, we present a spatial data network distribution system that is efficient and safe, and we demonstrate the feasibility and validity of the proposed solution.
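Of the compression algorithms examined above, LZ78 is easy to illustrate: it grows a phrase dictionary on the fly and emits (dictionary index, next character) pairs. The following is a minimal sketch of the algorithm's idea, not the paper's implementation.

```python
def lz78_compress(text):
    """LZ78: grow a phrase dictionary and emit (index, next-char) pairs.
    Index 0 denotes the empty phrase."""
    dictionary = {}            # phrase -> 1-based dictionary index
    out, phrase = [], ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch       # keep extending the current match
        else:
            out.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:                 # flush a trailing phrase already in the dictionary
        out.append((dictionary.get(phrase[:-1], 0), phrase[-1]))
    return out

def lz78_decompress(pairs):
    """Rebuild the text by replaying the dictionary construction."""
    dictionary = [""]          # index 0 is the empty phrase
    out = []
    for idx, ch in pairs:
        phrase = dictionary[idx] + ch
        dictionary.append(phrase)
        out.append(phrase)
    return "".join(out)

round_trip = lz78_decompress(lz78_compress("ababab"))
```

Repetitive inputs compress well because ever-longer phrases are replaced by single dictionary indices; adaptive Huffman coding would instead shorten the codes of frequent symbols.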
NASA Astrophysics Data System (ADS)
Moore, R.; Faerman, M.; Minster, J.; Day, S. M.; Ely, G.
2003-12-01
A community digital library provides support for ingestion, organization, description, preservation, and access of digital entities. The technologies that traditionally provide these capabilities are digital libraries (ingestion, organization, description), persistent archives (preservation), and data grids (access). We present a design for the SCEC community digital library that incorporates aspects of all three systems. Multiple groups have created integrated environments that sustain large-scale scientific data collections. By examining these projects, the following stages of implementation can be identified: (1) definition of semantic terms to associate with relevant information, including definition of uniform content descriptors to describe physical quantities relevant to the scientific discipline and creation of concept spaces to define how the uniform content descriptors are logically related; (2) organization of digital entities into logical collections that make it simple to browse and manage related material; (3) definition of services that are used to access and manipulate material in the collection; and (4) creation of a preservation environment for the long-term management of the collection. Each community is faced with heterogeneity that is introduced when data are distributed across multiple sites, when multiple sets of collection semantics are used, and/or when multiple scientific sub-disciplines are federated. We will present the relevant standards that simplify the implementation of the SCEC community library, the resource requirements for different types of data sets that drive the implementation, and the digital library processes that the SCEC community library will support.
The SCEC community library can be viewed as the set of processing steps that are required to build the appropriate SCEC reference data sets (SCEC-approved encoding format, SCEC-approved descriptive metadata, SCEC-approved collection organization, and SCEC-managed storage location). Each digital entity that is ingested into the SCEC community library is processed and validated for conformance to SCEC standards. These steps generate provenance, descriptive, administrative, structural, and behavioral metadata. Using data grid technology, the descriptive metadata can be registered into a logical name space that is controlled and managed by the SCEC digital library. A version of the SCEC community digital library is being implemented in the Storage Resource Broker (SRB). The SRB system provides almost all the features enumerated above. The peer-to-peer federation of metadata catalogs is planned for release in September 2003. The SRB system is in production use in multiple projects, from high-energy physics to astronomy, earth systems science, and bioinformatics. The SCEC community library will be based on the definition of standard metadata attributes, the creation of logical collections within the SRB, the creation of access services, and the demonstration of a preservation environment. The use of the SRB for the SCEC digital library will sustain the expected collection size and collection capabilities.
Evolution in Metadata Quality: Common Metadata Repository's Role in NASA Curation Efforts
NASA Technical Reports Server (NTRS)
Gilman, Jason; Shum, Dana; Baynes, Katie
2016-01-01
Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This poster covers how we use humanizers, a technique for dealing with the symptoms of metadata issues, as well as our plans for future metadata validation enhancements. The CMR currently indexes 35K collections and 300M granules.
ERIC Educational Resources Information Center
Armstrong, C. J.
1997-01-01
Discusses PICS (Platform for Internet Content Selection), the Centre for Information Quality Management (CIQM), and metadata. Highlights include filtering networked information; the quality of information; and standardizing search engines. (LRW)
McMahon, Christiana; Denaxas, Spiros
2016-01-01
Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor quality metadata renders data analyses problematic. In this study, we created and evaluated a novel framework to assess metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions; none of which were specifically aimed at biomedical metadata. 96 individuals completed the survey; of those who submitted data, most only assessed metadata quality sometimes, and eight did not at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly. PMID:27570670
NASA Astrophysics Data System (ADS)
Kunkel, K.; Champion, S.
2015-12-01
Data Management and the National Climate Assessment: A Data Quality Solution. The Third National Climate Assessment (NCA), anticipated for its authoritative climate change analysis, was also a vanguard in climate communication. From the cutting-edge website to the organization of information, the Assessment content appealed to, and could be accessed by, many demographics. One pivotal presentation of information in the NCA was the availability of complex metadata directly connected to graphical products. While the basic metadata requirement is federally mandated through a series of federal guidelines under the Information Quality Act, the NCA is also deemed a Highly Influential Scientific Assessment, which requires demonstration of the transparency and reproducibility of the content. To meet these requirements, the Technical Support Unit (TSU) for the NCA embarked on building a system for collecting and presenting metadata that not only met these requirements but has since been employed in support of additional Assessments. The metadata effort for this NCA proved invaluable for many reasons, one being that it showcased the critical need for a culture change within the scientific community to support collection and transparency of data and methods to the level produced with the NCA. Regardless of the federal mandate, it is simply good practice in science communication. This presentation will detail the collection system built by the TSU and the improvements employed with additional Assessment products, as well as illustrate examples of successful transparency. Through this presentation, we hope to advance the discussion in support of detailed metadata becoming the cultural norm within the scientific community to support influential and highly policy-relevant documents such as the NCA.
Evolutions in Metadata Quality
NASA Astrophysics Data System (ADS)
Gilman, J.
2016-12-01
Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This talk will cover how we encourage metadata authors to improve the metadata through the use of integrated rubrics of metadata quality and outreach efforts. In addition we'll demonstrate Humanizers, a technique for dealing with the symptoms of metadata issues. Humanizers allow CMR administrators to identify specific metadata issues that are fixed at runtime when the data is indexed. An example Humanizer is the aliasing of processing level "Level 1" to "1" to improve consistency across collections. The CMR currently indexes 35K collections and 300M granules.
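The humanizer mechanism described above can be sketched as a set of alias rules applied at index time, so the archived metadata is untouched while search sees consistent values. Only the "Level 1" to "1" alias comes from the abstract; the other rules, field names, and sample record are invented for illustration.

```python
# Index-time alias rules: field -> {stored value: indexed value}.
HUMANIZERS = {
    "processing_level": {"Level 1": "1", "Level-1": "1", "L1": "1"},
}

def humanize(record):
    """Return a copy of a collection record with aliases applied,
    leaving the stored record unchanged."""
    fixed = dict(record)
    for field, aliases in HUMANIZERS.items():
        if field in fixed:
            fixed[field] = aliases.get(fixed[field], fixed[field])
    return fixed

# Hypothetical collection record as it might arrive from a provider.
indexed = humanize({"short_name": "MOD021KM", "processing_level": "Level 1"})
```

Because the fix runs at indexing rather than in the repository, administrators can add or retire rules without asking providers to resubmit metadata.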
Evolving the Living With a Star Data System Definition
NASA Astrophysics Data System (ADS)
Otranto, J. F.; Dijoseph, M.
2003-12-01
NASA's Living With a Star (LWS) Program is a space weather-focused and applications-driven research program. The LWS Program is soliciting input from the solar, space physics, space weather, and climate science communities to develop a system that enables access to science data associated with these disciplines, and advances the development of discipline and interdisciplinary findings. The LWS Program will implement a data system that builds upon the existing and planned data capture, processing, and storage components put in place by individual spacecraft missions and also inter-project data management systems, including active and deep archives, and multi-mission data repositories. It is technically feasible for the LWS Program to integrate data from a broad set of resources, assuming they are either publicly accessible or allow access by permission. The LWS Program data system will work in coordination with spacecraft mission data systems and science data repositories, integrating their holdings using a common metadata representation. This common representation relies on a robust metadata definition that provides journalistic and technical data descriptions, plus linkages to supporting data products and tools. The LWS Program intends to become an enabling resource to PIs, interdisciplinary scientists, researchers, and students facilitating both access to a broad collection of science data, as well as the necessary supporting components to understand and make productive use of these data. For the LWS Program to represent science data that are physically distributed across various ground system elements, information will be collected about these distributed data products through a series of LWS Program-created agents. These agents will be customized to interface or interact with each one of these data systems, collect information, and forward any new metadata records to a LWS Program-developed metadata library. 
A populated LWS metadata library will function as a single point-of-contact that serves the entire science community as a first stop for data availability, whether or not science data are physically stored in an LWS-operated repository. Further, this metadata library will provide the user access to information for understanding these data including descriptions of the associated spacecraft and instrument, data format, calibration and operations issues, links to ancillary and correlative data products, links to processing tools and models associated with these data, and any corresponding findings produced using these data. The LWS may also support an active archive for solar, space physics, space weather, and climate data when these data would otherwise be discarded or archived off-line. This archive could potentially serve also as a data storage backup facility for LWS missions. The plan for the LWS Program metadata library is developed based upon input received from the solar and geospace science communities; the library's architecture is based on existing systems developed for serving science metadata. The LWS Program continues to seek constructive input from the science community, examples of both successes and failures in dealing with science data systems, and insights regarding the obstacles between the current state-of-the-practice and this vision for the LWS Program metadata library.
Emerging Network Storage Management Standards for Intelligent Data Storage Subsystems
NASA Technical Reports Server (NTRS)
Podio, Fernando; Vollrath, William; Williams, Joel; Kobler, Ben; Crouse, Don
1998-01-01
This paper discusses the need for intelligent storage devices and subsystems that can provide data integrity metadata; the content of the existing data integrity standard for optical disks; and the techniques and metadata, developed by the Association for Information and Image Management (AIIM) Optical Tape Committee, for verifying data stored on optical tapes.
eScience for molecular-scale simulations and the eMinerals project.
Salje, E K H; Artacho, E; Austen, K F; Bruin, R P; Calleja, M; Chappell, H F; Chiang, G-T; Dove, M T; Frame, I; Goodwin, A L; Kleese van Dam, K; Marmier, A; Parker, S C; Pruneda, J M; Todorov, I T; Trachenko, K; Tyer, R P; Walker, A M; White, T O H
2009-03-13
We review the work carried out within the eMinerals project to develop eScience solutions that facilitate a new generation of molecular-scale simulation work. Technological developments include integration of compute and data systems, development of collaborative frameworks, and new researcher-friendly tools for grid job submission, XML data representation, information delivery, metadata harvesting, and metadata management. A number of diverse science applications illustrate how these tools are being used for large parameter-sweep studies, an emerging type of study for which the integration of computing, data, and collaboration is essential.
The Role of Metadata Standards in EOSDIS Search and Retrieval Applications
NASA Technical Reports Server (NTRS)
Pfister, Robin
1999-01-01
Metadata standards play a critical role in data search and retrieval systems. Metadata tie software to data so the data can be processed, stored, searched, retrieved, and distributed; without metadata these actions are not possible. The process of populating metadata to describe science data is an important service to the end-user community, so that a user who is unfamiliar with the data can easily find and learn about a particular dataset before an order decision is made. Once a good set of standards is in place, the accuracy with which data search can be performed depends on the degree to which metadata standards are adhered to during product definition. NASA's Earth Observing System Data and Information System (EOSDIS) provides examples of how metadata standards are used in data search and retrieval.
Incorporating ISO Metadata Using HDF Product Designer
NASA Technical Reports Server (NTRS)
Jelenak, Aleksandar; Kozimor, John; Habermann, Ted
2016-01-01
The need to store increasing amounts of metadata of varying complexity in HDF5 files is rapidly outgrowing the capabilities of the Earth science metadata conventions currently in use. Until now, data producers have had little choice but to come up with ad hoc solutions to this challenge. Such solutions, in turn, pose a wide range of issues for data managers, distributors, and, ultimately, data users. The HDF Group is experimenting with a novel approach of using ISO 19115 metadata objects as a catch-all container for all the metadata that cannot be fitted into the current Earth science data conventions. This presentation will showcase how the HDF Product Designer software can be utilized to help data producers include various ISO metadata objects in their products.
NASA Astrophysics Data System (ADS)
Thomas, V. I.; Yu, E.; Acharya, P.; Jaramillo, J.; Chowdhury, F.
2015-12-01
Maintaining and archiving accurate site metadata is critical for seismic network operations. The Advanced National Seismic System (ANSS) Station Information System (SIS) is a repository of seismic network field equipment, equipment response, and other site information. Currently, there are 187 different sensor models and 114 data-logger models in SIS. SIS has a web-based user interface that allows network operators to enter information about seismic equipment and assign response parameters to it. It allows users to log entries for sites, equipment, and data streams. Users can also track when equipment is installed, updated, and/or removed from sites. When seismic equipment configurations change for a site, SIS computes the overall gain of a data channel by combining the response parameters of the underlying hardware components. Users can then distribute this metadata in standardized formats such as FDSN StationXML or dataless SEED. One powerful advantage of SIS is that existing data in the repository can be leveraged: e.g., new instruments can be assigned response parameters from the Incorporated Research Institutions for Seismology (IRIS) Nominal Response Library (NRL), or from a similar instrument already in the inventory, thereby reducing the amount of time needed to determine parameters when new equipment or models are introduced into a network. SIS is also useful for managing field equipment that does not produce seismic data (e.g., power systems, telemetry devices, or GPS receivers) and gives the network operator a comprehensive view of site field work. SIS allows users to generate field logs to document activities and inventory at sites. Thus, operators can also use SIS reporting capabilities to improve planning and maintenance of the network. Queries such as how many sensors of a certain model are installed, or which pieces of equipment have active problem reports, are just a few examples of the type of information that is available to SIS users.
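The overall-gain computation described above can be sketched as the product of the gains of the chained hardware stages: a sensor sensitivity in V/(m/s) multiplied by a digitizer gain in counts/V yields a channel sensitivity in counts/(m/s). The numeric values below are made up for illustration and are not from any real SIS entry.

```python
def overall_gain(stage_gains):
    """Combine per-stage gains of a data channel into one sensitivity.
    Each stage's output feeds the next, so gains multiply."""
    total = 1.0
    for g in stage_gains:
        total *= g
    return total

sensor = 1500.0        # V/(m/s), hypothetical broadband seismometer
datalogger = 419430.0  # counts/V, hypothetical 24-bit digitizer
channel_sensitivity = overall_gain([sensor, datalogger])  # counts/(m/s)
```

This scalar is what ends up as the stage-0 sensitivity in formats such as FDSN StationXML, alongside the full per-stage response description.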
The IRIS Data Management Center: Enabling Access to Observational Time Series Spanning Decades
NASA Astrophysics Data System (ADS)
Ahern, T.; Benson, R.; Trabant, C.
2009-04-01
The Incorporated Research Institutions for Seismology (IRIS) is funded by the National Science Foundation (NSF) to operate the facilities to generate, archive, and distribute seismological data to research communities in the United States and internationally. The IRIS Data Management System (DMS) is responsible for the ingestion, archiving, curation, and distribution of these data. The IRIS Data Management Center (DMC) manages data from more than 100 permanent seismic networks and hundreds of temporary seismic deployments, as well as data from other geophysical observing networks such as magnetotelluric sensors, ocean bottom sensors, superconducting gravimeters, strainmeters, surface meteorological measurements, and in-situ atmospheric pressure measurements. The IRIS DMC has data from more than 20 different types of sensors and manages approximately 100 terabytes of primary observational data. These data are archived in multiple distributed storage systems that ensure data availability independent of any single catastrophic failure. Storage systems include both RAID systems of greater than 100 terabytes and robotic tape libraries of petabyte capacity. IRIS performs routine transcription of the data to new media and storage systems to ensure the long-term viability of the scientific data. IRIS adheres to the OAIS data preservation model in most cases. The IRIS data model requires the availability of metadata describing the characteristics and geographic location of sensors before data can be fully archived. IRIS works with the International Federation of Digital Seismograph Networks (FDSN) in the definition and evolution of the metadata. The metadata ensure that the data remain useful to both current and future generations of earth scientists. Curation of the metadata and time series is one of the most important activities at the IRIS DMC. Data analysts and an automated quality assurance system monitor the quality of the incoming data.
This ensures the data are of acceptably high quality. The formats and data structures used by the seismological community are esoteric, so IRIS and its FDSN partners are developing web services that can transform the data holdings into structures that are more easily used by broader scientific communities. For instance, atmospheric scientists are interested in using global observations of microbarograph data, but that community does not understand the methods of applying instrument corrections to the observations. Web processing services under development at IRIS will transform these data in a manner that allows direct use within analysis tools, such as MATLAB®, already in use by that community. By continuing to develop web-service-based methods of data discovery and access, IRIS is enabling broader access to its data holdings. We currently support data discovery using many of the Open Geospatial Consortium (OGC) web mapping services. We are involved in portal technologies to support data discovery and distribution for all data from the EarthScope project. We are working with computer scientists at several universities, including the University of Washington, as part of a DataNet proposal, and we intend to enhance metadata, further develop ontologies, develop a Registry Service to aid in the discovery of data sets and services, and in general improve the semantic interoperability of the data managed at the IRIS DMC. Finally, IRIS has been identified as one of four scientific organizations with which the External Research Division of Microsoft wants to work in the development of web services, specifically a scientific workflow engine. More specific details of current and future developments at the IRIS DMC will be included in this presentation.
Metadata management and semantics in microarray repositories.
Kocabaş, F; Can, T; Baykal, N
2011-12-01
The number of microarray and other high-throughput experiments in primary repositories keeps increasing, as do the size and complexity of the results produced in response to biomedical investigations. Initiatives have been started on standardization of content, object model, exchange format, and ontology. However, there are backlogs and an inability to exchange data between microarray repositories, which indicate a great need for a standard format and for data management. We have introduced a metadata framework that includes a metadata card and semantic nets that make experimental results visible, understandable, and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework); they can be integrated with other metadata cards and semantic nets, and can be exchanged, shared, and queried. We demonstrated the performance and potential benefits through a case study on a selected microarray repository. We concluded that the backlogs can be reduced and that exchanging information and posing knowledge-discovery questions can become possible with the use of this metadata framework.
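The metadata-card idea in this abstract can be sketched as a small set of RDF-style triples that are merged and queried. All identifiers, predicates, and field values below are hypothetical illustrations, not the paper's actual schema:

```python
# A minimal sketch: experiment descriptors stored as (subject,
# predicate, object) triples that can be merged with other cards and
# queried. URIs and field names are invented for illustration.

CARD = "exp:GSE0001"  # hypothetical experiment identifier

triples = [
    (CARD, "dc:title", "Hypoxia response in HeLa cells"),
    (CARD, "meta:organism", "Homo sapiens"),
    (CARD, "meta:platform", "Affymetrix U133"),
    (CARD, "meta:relatedTo", "exp:GSE0002"),
]

def query(store, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern (None = wildcard)."""
    return [t for t in store
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Integrating two cards is just combining their triple sets.
other_card = [("exp:GSE0002", "meta:organism", "Homo sapiens")]
merged = triples + other_card

# A knowledge-discovery question: which experiments share an organism?
same_organism = query(merged, predicate="meta:organism", obj="Homo sapiens")
```

Because every card uses the same triple shape, exchange between repositories reduces to shipping and concatenating triple sets.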
Sustainable Software Decisions for Long-term Projects (Invited)
NASA Astrophysics Data System (ADS)
Shepherd, A.; Groman, R. C.; Chandler, C. L.; Gaylord, D.; Sun, M.
2013-12-01
Adopting new, emerging technologies can be difficult for established projects that are positioned to exist for years to come. In some cases the challenge lies in the pre-existing software architecture; in others, it lies in the fluctuation of resources like people, time, and funding. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) was created in late 2006 by combining the data management offices for the U.S. GLOBEC and U.S. JGOFS programs to publish data for researchers funded by the National Science Foundation (NSF). Since its inception, BCO-DMO has been supporting access and discovery of these data through web-accessible software systems, and the office has worked through many of the challenges of incorporating new technologies into those systems. From migrating human-readable, flat-file metadata storage into a relational database, and now into a content management system (Drupal), to incorporating controlled vocabularies, new technologies can radically affect the existing software architecture. However, through the use of science-driven use cases, effective resource management, and loosely coupled software components, BCO-DMO has been able to adapt its existing software architecture to adopt new technologies. One of the latest efforts at BCO-DMO revolves around applying metadata semantics for publishing linked data in support of data discovery. This effort primarily affects the metadata web interface software at http://bco-dmo.org and the geospatial interface software at http://mapservice.bco-dmo.org/. With guidance from science-driven use cases and consideration of our resources, implementation decisions are made using a strategy to loosely couple the existing software systems to the new technologies. 
The results of this process led to the use of REST web services and a combination of contributed and custom Drupal modules for publishing BCO-DMO's content using the Resource Description Framework (RDF) via an instance of the Virtuoso Open-Source triplestore.
EUDAT B2FIND : A Cross-Discipline Metadata Service and Discovery Portal
NASA Astrophysics Data System (ADS)
Widmann, Heinrich; Thiemann, Hannes
2016-04-01
The European Data Infrastructure (EUDAT) project aims at a pan-European environment that supports multiple research communities and individuals in managing the rising tide of scientific data through advanced data management technologies. This led to the establishment of the community-driven Collaborative Data Infrastructure, which implements common data services and storage resources to tackle the basic requirements and the specific challenges of international and interdisciplinary research data management. The metadata service B2FIND plays a central role in this context by providing a simple and user-friendly discovery portal for finding research data collections stored in EUDAT data centers or in other repositories. For this we store the diverse metadata collected from heterogeneous sources in a comprehensive joint metadata catalogue and make them searchable in an open data portal. The implemented metadata ingestion workflow consists of three steps. First, the metadata records - provided either by various research communities or via other EUDAT services - are harvested. Afterwards, the raw metadata records are converted and mapped to unified key-value dictionaries as specified by the B2FIND schema. The semantic mapping of the non-uniform, community-specific metadata to homogeneously structured datasets is the most subtle and challenging task. To assure and improve the quality of the metadata, this mapping process is accompanied by iterative and intense exchange with the community representatives, usage of controlled vocabularies and community-specific ontologies, and formal and semantic validation. Finally, the mapped and checked records are uploaded as datasets to the catalogue, which is based on the open source data portal software CKAN. CKAN provides a rich RESTful JSON API and uses SOLR for dataset indexing, which enables users to query and search the catalogue. 
The homogenization of the community-specific data models and vocabularies enables not only the uniform presentation of these datasets as tables of field-value pairs but also faceted, spatial, and temporal search in the B2FIND metadata portal. Furthermore, the service provides transparent access to the scientific data objects through the references and identifiers given in the metadata. B2FIND offers support for new communities interested in publishing their data within EUDAT. We present here the functionality and features of the B2FIND service and give an outlook on further developments, such as interfaces to external libraries and the use of Linked Data.
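The harvest, map, and validate steps of the ingestion workflow described above can be sketched as follows. The schema fields, community field names, and vocabulary entries are hypothetical stand-ins, not the actual B2FIND specification:

```python
# Sketch of mapping heterogeneous, community-specific records to a
# unified key-value schema with vocabulary normalization and a simple
# formal validation step. All names are illustrative assumptions.

REQUIRED_FIELDS = {"title", "identifier", "discipline"}

# Per-community mapping from source field names to unified schema keys.
FIELD_MAPS = {
    "communityA": {"dc:title": "title", "dc:id": "identifier",
                   "subject": "discipline"},
    "communityB": {"name": "title", "pid": "identifier",
                   "field_of_science": "discipline"},
}

# A tiny controlled vocabulary for the 'discipline' facet.
DISCIPLINE_VOCAB = {"seismology": "Earth Science", "genomics": "Life Science"}

def map_record(raw, community):
    """Map a raw harvested record to the unified key-value schema."""
    fmap = FIELD_MAPS[community]
    mapped = {fmap[k]: v for k, v in raw.items() if k in fmap}
    # Normalize against the controlled vocabulary where possible.
    if "discipline" in mapped:
        mapped["discipline"] = DISCIPLINE_VOCAB.get(
            mapped["discipline"].lower(), mapped["discipline"])
    return mapped

def validate(mapped):
    """Formal validation: check that all required fields are present."""
    return REQUIRED_FIELDS <= mapped.keys()

raw = {"name": "Broadband waveforms", "pid": "hdl:123/456",
       "field_of_science": "Seismology"}
record = map_record(raw, "communityB")
```

Only records that pass validation would then be uploaded to the catalogue; rejected records go back to the community representatives for the iterative exchange the abstract mentions.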
Empowering file-based radio production through media asset management systems
NASA Astrophysics Data System (ADS)
Muylaert, Bjorn; Beckers, Tom
2006-10-01
In recent years, IT-based production and archiving of media has matured to a level that enables broadcasters to switch from tape- or CD-based to file-based workflows for the production of their radio and television programs. This technology is essential for the future of broadcasters, as it provides the flexibility and speed of execution the customer demands by enabling, among other things, concurrent access and production, faster-than-real-time ingest, edit during ingest, centrally managed annotation, and quality preservation of media. In terms of automation of program production, the radio department is the most advanced within the VRT, the Flemish broadcaster. For several years now, the radio department has been working with digital equipment and producing its programs mainly on standard IT equipment. Historically, the shift from analogue to digital production was a step-by-step process initiated and coordinated by each radio station separately, resulting in a multitude of tools and metadata collections, some of them developed in-house and lacking integration. To make matters worse, each of those stations adopted a slightly different production methodology. The planned introduction of a company-wide Media Asset Management System allows a coordinated overhaul to a unified production architecture. Benefits include the centralized ingest and annotation of audio material and a uniform, integrated (in terms of IT infrastructure) workflow model. Needless to say, the ingest strategy, metadata management, and integration with radio production systems play a major role in the level of success of any improvement effort. This paper presents a data model for audio-specific concepts relevant to radio production. It includes an investigation of ingest techniques and strategies. 
Cooperation with external, professional production tools is demonstrated through a use-case scenario: the integration of an existing, multi-track editing tool with a commercially available Media Asset Management System. This will enable an uncomplicated production chain, with a recognizable look and feel for all system users, regardless of their affiliated radio station, as well as central retrieval and storage of information and metadata.
Semantic Overlays in Educational Content Networks--The hylOs Approach
ERIC Educational Resources Information Center
Engelhardt, Michael; Hildebrand, Arne; Lange, Dagmar; Schmidt, Thomas C.
2006-01-01
Purpose: The paper aims to introduce an educational content management system, Hypermedia Learning Objects System (hylOs), which is fully compliant to the IEEE LOM eLearning object metadata standard. Enabled through an advanced authoring toolset, hylOs allows the definition of instructional overlays of a given eLearning object mesh.…
Implementation of Imaging Technology for Recordkeeping at the World Bank.
ERIC Educational Resources Information Center
Smith, Clive D.
1997-01-01
Describes the evolution of an electronic document management system for the World Bank, including record-keeping components, and how the Pittsburgh requirements for evidence in record keeping were used to evaluate it. Discusses imaging technology for scanning paper records, metadata for retrieval and record keeping, and extending the system to…
Techniques for Efficiently Managing Large Geosciences Data Sets
NASA Astrophysics Data System (ADS)
Kruger, A.; Krajewski, W. F.; Bradley, A. A.; Smith, J. A.; Baeck, M. L.; Steiner, M.; Lawrence, R. E.; Ramamurthy, M. K.; Weber, J.; Delgreco, S. A.; Domaszczynski, P.; Seo, B.; Gunyon, C. A.
2007-12-01
We have developed techniques and software tools for efficiently managing large geosciences data sets. While the techniques were developed as part of an NSF-funded ITR project that focuses on making NEXRAD weather data and rainfall products available to hydrologists and other scientists, they are relevant to other geosciences disciplines that deal with large data sets. Metadata, relational databases, data compression, and networking are central to our methodology. Data and derived products are stored on file servers in a compressed format. URLs to, and metadata about, the data and derived products are managed in a PostgreSQL database. Virtually all access to the data and products is through this database. Geosciences data normally require a number of processing steps to transform the raw data into useful products: data quality assurance, coordinate transformations and georeferencing, applying calibration information, and many more. We have developed the concept of crawlers that manage this scientific workflow. Crawlers are unattended processes that run indefinitely and, at set intervals, query the database for their next assignment. A database table functions as a roster for the crawlers. Crawlers perform well-defined tasks that are, except perhaps for sequencing, largely independent of other crawlers. Once a crawler is done with its current assignment, it updates the database roster table and gets its next assignment by querying the database. We have developed a library that enables one to quickly add crawlers. The library provides hooks to external (i.e., C-language) compiled codes, so that developers can work and contribute independently. Processes called ingesters inject data into the system. The bulk of the data comes from a real-time feed using UCAR/Unidata's IDD/LDM software. An exciting recent development is the establishment of a Unidata HYDRO feed that carries value-added metadata over the IDD/LDM. 
Ingesters grab the metadata and populate the PostgreSQL tables. These and other concepts we have developed have enabled us to efficiently manage a 70 TB (and growing) weather radar data set.
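The crawler/roster pattern described in this abstract can be sketched as a worker polling a database table for its next assignment and reporting completion. The table and column names below are illustrative, not the project's actual schema (an in-memory SQLite database stands in for PostgreSQL):

```python
# Sketch of the crawler/roster pattern: unattended workers poll a
# database table for their next task, claim it, and mark it done.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE roster (
    id INTEGER PRIMARY KEY, task TEXT, product_url TEXT,
    status TEXT DEFAULT 'pending')""")
db.executemany(
    "INSERT INTO roster (task, product_url) VALUES (?, ?)",
    [("georeference", "file://radar/scan1.bz2"),
     ("qc", "file://radar/scan2.bz2")])
db.commit()

def next_assignment(conn, task_type):
    """One polling step: claim the oldest pending task of this type."""
    row = conn.execute(
        "SELECT id, product_url FROM roster "
        "WHERE task = ? AND status = 'pending' ORDER BY id LIMIT 1",
        (task_type,)).fetchone()
    if row is None:
        return None          # nothing to do; a real crawler would sleep
    conn.execute("UPDATE roster SET status = 'running' WHERE id = ?",
                 (row[0],))
    conn.commit()
    return row

def finish(conn, task_id):
    """Report completion back to the roster."""
    conn.execute("UPDATE roster SET status = 'done' WHERE id = ?",
                 (task_id,))
    conn.commit()

job = next_assignment(db, "georeference")
finish(db, job[0])
```

Because each crawler touches only rows for its own task type, crawlers stay largely independent of one another, matching the design described above.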
The Modeling and Simulation Catalog for Discovery, Knowledge and Reuse
NASA Technical Reports Server (NTRS)
Stone, George F. III; Greenberg, Brandi; Daehler-Wilking, Richard; Hunt, Steven
2011-01-01
The DoD M&S Steering Committee has noted that the current DoD and Service modeling and simulation resource repository (MSRR) services are not up to date, limiting their value to the using communities. However, M&S leaders and managers also determined that the Department needs a functional M&S registry card catalog to facilitate M&S tool and data visibility in support of M&S activities across the DoD. The M&S Catalog will discover and access M&S metadata maintained at nodes distributed across DoD networks in a centrally managed, decentralized process that employs metadata collection and management. The intent is to link information stores, precluding redundant location updating. The M&S Catalog uses standard metadata schemas based on the DoD Net-Centric Data Strategy Community of Interest metadata specification. The Air Force, Navy, and OSD (CAPE) have provided initial information to participating DoD nodes, and plans are being made to bring in hundreds of source providers.
Unobtrusive integration of data management with fMRI analysis.
Poliakov, Andrew V; Hertzenberg, Xenia; Moore, Eider B; Corina, David P; Ojemann, George A; Brinkley, James F
2007-01-01
This note describes a software utility, called X-batch, which addresses two pressing issues typically faced by functional magnetic resonance imaging (fMRI) neuroimaging laboratories: (1) analysis automation and (2) data management. The first issue is addressed by providing a simple batch-mode processing tool for the popular SPM software package (http://www.fil.ion.ucl.ac.uk/spm/; Wellcome Department of Imaging Neuroscience, London, UK). The second is addressed by transparently recording metadata describing all aspects of the batch job (e.g., subject demographics, analysis parameters, locations and names of created files, date and time of analysis, and so on). These metadata are recorded as instances of an extended version of the Protégé-based Experiment Lab Book ontology created by the Dartmouth fMRI Data Center. The resulting instantiated ontology provides a detailed record of all fMRI analyses performed, and as such can be part of larger systems for neuroimaging data management, sharing, and visualization. The X-batch system is in use in our own fMRI research, and is available for download at http://X-batch.sourceforge.net/.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-11-21
... development of standardized metadata in hundreds of organizations, and funded numerous implementations of OGC... of emphasis include: Metadata documentation, clearinghouse establishment, framework development...
A Window to the World: Lessons Learned from NASA's Collaborative Metadata Curation Effort
NASA Astrophysics Data System (ADS)
Bugbee, K.; Dixon, V.; Baynes, K.; Shum, D.; le Roux, J.; Ramachandran, R.
2017-12-01
Well-written descriptive metadata adds value to data by making it easier to discover and by increasing its use through context about its appropriateness for a given purpose. While many data centers acknowledge the importance of correct, consistent, and complete metadata, allocating resources to curate existing metadata is often difficult. To lower resource costs, many data centers seek guidance on best practices for curating metadata but struggle to identify those recommendations. In order to assist data centers in curating metadata and to develop best practices for creating and maintaining metadata, NASA has formed a collaborative effort to improve the Earth Observing System Data and Information System (EOSDIS) metadata in the Common Metadata Repository (CMR). This effort has taken significant steps in building consensus around metadata curation best practices. However, it has also revealed gaps in EOSDIS enterprise policies and procedures within the core metadata curation task. This presentation will explore the mechanisms used for building consensus on metadata curation, the gaps identified in policies and procedures, the lessons learned from collaborating with both the data centers and metadata curation teams, and the proposed next steps for the future.
Making Metadata Better with CMR and MMT
NASA Technical Reports Server (NTRS)
Gilman, Jason Arthur; Shum, Dana
2016-01-01
Ensuring complete, consistent, and high-quality metadata is a challenge for metadata providers and curators. The CMR and MMT systems give providers and curators options to build in metadata quality from the start and to assess and improve the quality of existing metadata.
Achieving interoperability for metadata registries using comparative object modeling.
Park, Yu Rang; Kim, Ju Han
2010-01-01
Achieving data interoperability between organizations relies upon agreed meaning and representation (metadata) of data. For managing and registering metadata, many organizations have built metadata registries (MDRs) in various domains based on the international standard for the MDR framework, ISO/IEC 11179. Following this trend, two public MDRs in the biomedical domain have been created: the United States Health Information Knowledgebase (USHIK) and the cancer Data Standards Registry and Repository (caDSR), from the U.S. Department of Health & Human Services and the National Cancer Institute (NCI), respectively. Most MDRs are implemented with indiscriminate extensions to satisfy organization-specific needs and to work around the semantic and structural limitations of ISO/IEC 11179. As a result, it is difficult to achieve interoperability among multiple MDRs. In this paper, we propose an integrated metadata object model for achieving interoperability among multiple MDRs. To evaluate this model, we developed an XML Schema Definition (XSD)-based metadata exchange format. We created an XSD-based metadata exporter supporting both the integrated metadata object model and organization-specific MDR formats.
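An XSD-backed metadata exporter of the kind described above could serialize registry entries roughly as follows. The element and tag names are hypothetical; ISO/IEC 11179 defines a much richer data-element model than this sketch shows:

```python
# Sketch of serializing one registry data element to an XML exchange
# format. Tag names are illustrative, not the paper's actual XSD.
import xml.etree.ElementTree as ET

def export_data_element(name, definition, datatype, permissible_values=()):
    """Serialize one data element to an XML fragment."""
    de = ET.Element("DataElement")
    ET.SubElement(de, "Name").text = name
    ET.SubElement(de, "Definition").text = definition
    vd = ET.SubElement(de, "ValueDomain", {"datatype": datatype})
    for v in permissible_values:
        ET.SubElement(vd, "PermissibleValue").text = v
    return ET.tostring(de, encoding="unicode")

xml_out = export_data_element(
    "PatientSex", "Sex of the patient", "string", ["M", "F", "U"])
```

Because every registry exports to the same agreed XML shape, a consuming MDR can validate incoming elements against the shared schema rather than each organization's internal extensions.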
The Importance of Metadata in System Development and IKM
2003-02-01
Defence R&D Canada – Atlantic. The Importance of Metadata in System Development and IKM. Anthony W. Isenor. Technical Memorandum DRDC Atlantic TM 2003-011, February 2003. ... it is important for searches and providing relevant information to the client. A comparison of metadata standards was conducted with emphasis on
Request queues for interactive clients in a shared file system of a parallel computing system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bent, John M.; Faibish, Sorin
Interactive requests are processed from users of log-in nodes. A metadata server node is provided for use in a file system shared by one or more interactive nodes and one or more batch nodes. The interactive nodes comprise interactive clients to execute interactive tasks, and the batch nodes execute batch jobs for one or more batch clients. The metadata server node comprises a virtual machine monitor; an interactive client proxy to store metadata requests from the interactive clients in an interactive client queue; a batch client proxy to store metadata requests from the batch clients in a batch client queue; and a metadata server to store the metadata requests from the interactive client queue and the batch client queue in a metadata queue based on an allocation of resources by the virtual machine monitor. The metadata requests can be prioritized, for example, based on one or more of a predefined policy and predefined rules.
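The queue-merging step in this abstract can be sketched as draining an interactive queue and a batch queue into a single metadata queue under a resource-allocation policy. The 3:1 interactive-to-batch ratio and the request strings below are arbitrary illustrations, not the patent's actual policy:

```python
# Sketch of merging interactive and batch metadata requests into one
# server queue under a policy favoring interactive clients.
from collections import deque

interactive_q = deque(["stat /home/a", "open /home/b", "readdir /home"])
batch_q = deque(["create /scratch/out.1", "create /scratch/out.2"])

def merge_queues(interactive, batch, interactive_share=3):
    """Drain both queues into one, favoring interactive requests.

    For every `interactive_share` interactive requests taken, one
    batch request is taken (an illustrative prioritization rule).
    """
    merged = []
    while interactive or batch:
        for _ in range(interactive_share):
            if interactive:
                merged.append(interactive.popleft())
        if batch:
            merged.append(batch.popleft())
    return merged

metadata_queue = merge_queues(interactive_q, batch_q)
```

In the patented design, the share given to each queue would come from the virtual machine monitor's resource allocation rather than a fixed constant.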
2011-05-01
iTunes illustrate the difference between the centralized approach of digital library systems and the distributed approach of container file formats...metadata in a container file format. Apple’s iTunes uses a centralized metadata approach and allows users to maintain song metadata in a single...one iTunes library to another the metadata must be copied separately or reentered in the new library. This demonstrates the utility of storing metadata
Hunter, Adam; Dayalan, Saravanan; De Souza, David; Power, Brad; Lorrimar, Rodney; Szabo, Tamas; Nguyen, Thu; O'Callaghan, Sean; Hack, Jeremy; Pyke, James; Nahid, Amsha; Barrero, Roberto; Roessner, Ute; Likic, Vladimir; Tull, Dedreia; Bacic, Antony; McConville, Malcolm; Bellgard, Matthew
2017-01-01
An increasing number of research laboratories and core analytical facilities around the world are developing high throughput metabolomic analytical and data processing pipelines that are capable of handling hundreds to thousands of individual samples per year, often over multiple projects, collaborations and sample types. At present, there are no Laboratory Information Management Systems (LIMS) that are specifically tailored for metabolomics laboratories that are capable of tracking samples and associated metadata from the beginning to the end of an experiment, including data processing and archiving, and which are also suitable for use in large institutional core facilities or multi-laboratory consortia as well as single laboratory environments. Here we present MASTR-MS, a downloadable and installable LIMS solution that can be deployed either within a single laboratory or used to link workflows across a multisite network. It comprises a Node Management System that can be used to link and manage projects across one or multiple collaborating laboratories; a User Management System which defines different user groups and privileges of users; a Quote Management System where client quotes are managed; a Project Management System in which metadata is stored and all aspects of project management, including experimental setup, sample tracking and instrument analysis, are defined, and a Data Management System that allows the automatic capture and storage of raw and processed data from the analytical instruments to the LIMS. MASTR-MS is a comprehensive LIMS solution specifically designed for metabolomics. It captures the entire lifecycle of a sample starting from project and experiment design to sample analysis, data capture and storage. It acts as an electronic notebook, facilitating project management within a single laboratory or a multi-node collaborative environment. 
This software is being developed in close consultation with members of the metabolomics research community. It is freely available under the GNU GPL v3 licence and can be accessed from https://muccg.github.io/mastr-ms/.
Next-Generation Search Engines for Information Retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devarakonda, Ranjeet; Hook, Leslie A; Palanisamy, Giri
In recent years, there have been significant advancements in the areas of scientific data management and retrieval techniques, particularly in terms of standards and protocols for archiving data and metadata. Scientific data is rich and spread across different places. In order to integrate these pieces together, a data archive and associated metadata should be generated. Data should be stored in a format that is retrievable and, more importantly, in a format that will continue to be accessible as technology changes, such as XML. While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth Science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but can be slow, and their comprehensiveness can be limited by downtime in any search partner. An alternative approach to improve comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. One such system is Mercury, a metadata harvesting, data discovery, and access system built for researchers to search for, share, and obtain spatiotemporal data used across a range of climate and ecological sciences. Mercury is an open-source toolset; its backend is built on Java, and its search capability is supported by popular open-source search libraries such as SOLR and LUCENE. 
Mercury harvests the structured metadata and key data from several data-providing servers around the world and builds a centralized index. The harvested files are indexed consistently against the SOLR search API, so that Mercury can render search capabilities such as simple, fielded, spatial, and temporal searches across a span of projects ranging over land, atmosphere, and ocean ecology. Mercury also provides data sharing capabilities using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In this paper we discuss best practices for archiving data and metadata, new searching techniques, efficient ways of data retrieval, and information display.
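One step of such a harvest, extracting a Dublin Core record from an OAI-PMH response into a flat dictionary ready for indexing, can be sketched as follows. The sample XML is a hand-made, simplified fragment; real OAI-PMH responses carry more envelope structure, and the field values are invented:

```python
# Sketch of parsing a Dublin Core record out of an OAI-PMH response
# into a flat dict suitable for a search index.
import xml.etree.ElementTree as ET

SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Soil moisture, 2001-2005</dc:title>
  <dc:creator>Example, A.</dc:creator>
  <dc:coverage>41.6N, 91.5W</dc:coverage>
</record>"""

DC = "{http://purl.org/dc/elements/1.1/}"

def parse_record(xml_text):
    """Extract Dublin Core fields into a flat dict for indexing."""
    root = ET.fromstring(xml_text)
    doc = {}
    for el in root:
        if el.tag.startswith(DC):
            doc[el.tag[len(DC):]] = el.text
    return doc

doc = parse_record(SAMPLE)
```

A harvester would repeat this over every record returned by a ListRecords request and feed the resulting dictionaries to the central index.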
Misra, Dharitri; Chen, Siyuan; Thoma, George R
2009-01-01
One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, and official government records, where the metadata is contained within the body of the documents, a cost-effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U.S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models together with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of a Support Vector Machine and a Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with a focus on the metadata search model. We present extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system.
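The rule-based search step, applying string patterns to pull individual fields out of recognized text, can be sketched with regular expressions. The patterns and sample page below are simplified illustrations, not NLM's actual AME rules:

```python
# Sketch of rule-based metadata field extraction via string patterns.
# Each rule captures one field from the document text.
import re

RULES = {
    "title":  re.compile(r"^Title:\s*(.+)$", re.MULTILINE),
    "author": re.compile(r"^Authors?:\s*(.+)$", re.MULTILINE),
    "date":   re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}

def extract_metadata(text):
    """Apply each field rule; keep the first match per field."""
    fields = {}
    for name, pattern in RULES.items():
        m = pattern.search(text)
        if m:
            fields[name] = m.group(1).strip()
    return fields

page = """Title: Notice of Compliance
Authors: Smith, J.; Doe, R.
Approved on 1998-07-14 by the review board."""
meta = extract_metadata(page)
```

In the full AME pipeline, the layout recognition step would first decide which rule set applies to a page; only then are the field patterns run against the relevant regions.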
CINERGI: Community Inventory of EarthCube Resources for Geoscience Interoperability
NASA Astrophysics Data System (ADS)
Zaslavsky, Ilya; Bermudez, Luis; Grethe, Jeffrey; Gupta, Amarnath; Hsu, Leslie; Lehnert, Kerstin; Malik, Tanu; Richard, Stephen; Valentine, David; Whitenack, Thomas
2014-05-01
Organizing geoscience data resources to support cross-disciplinary data discovery, interpretation, analysis and integration is challenging because of different information models, semantic frameworks, metadata profiles, catalogs, and services used in different geoscience domains, not to mention different research paradigms and methodologies. The central goal of CINERGI, a new project supported by the US National Science Foundation through its EarthCube Building Blocks program, is to create a methodology and assemble a large inventory of high-quality information resources capable of supporting data discovery needs of researchers in a wide range of geoscience domains. The key characteristics of the inventory are: 1) collaboration with and integration of metadata resources from a number of large data facilities; 2) reliance on international metadata and catalog service standards; 3) assessment of resource "interoperability-readiness"; 4) ability to cross-link and navigate data resources, projects, models, researcher directories, publications, usage information, etc.; 5) efficient inclusion of "long-tail" data, which are not appearing in existing domain repositories; 6) data registration at feature level where appropriate, in addition to common dataset-level registration, and 7) integration with parallel EarthCube efforts, in particular focused on EarthCube governance, information brokering, service-oriented architecture design and management of semantic information. We discuss challenges associated with accomplishing CINERGI goals, including defining the inventory scope; managing different granularity levels of resource registration; interaction with search systems of domain repositories; explicating domain semantics; metadata brokering, harvesting and pruning; managing provenance of the harvested metadata; and cross-linking resources based on the linked open data (LOD) approaches. 
At the higher level of the inventory, we register domain-wide resources such as domain catalogs, vocabularies, information models, data service specifications, identifier systems, and assess their conformance with international standards (such as those adopted by ISO and OGC, and used by INSPIRE) or de facto community standards using, in part, automatic validation techniques. The main level in CINERGI leverages a metadata aggregation platform (currently Geoportal Server) to organize harvested resources from multiple collections and contributed by community members during EarthCube end-user domain workshops or suggested online. The latter mechanism uses the SciCrunch toolkit originally developed within the Neuroscience Information Framework (NIF) project and now being extended to other communities. The inventory is designed to support requests such as "Find resources with theme X in geographic area S", "Find datasets with subject Y using query concept expansion", "Find geographic regions having data of type Z", "Find datasets that contain property P". With the added LOD support, additional types of requests, such as "Find example implementations of specification X", "Find researchers who have worked in Domain X, dataset Y, location L", "Find resources annotated by person X", will be supported. Project's website (http://workspace.earthcube.org/cinergi) provides access to the initial resource inventory, a gallery of EarthCube researchers, collections of geoscience models, metadata entry forms, and other software modules and inventories being integrated into the CINERGI system. Support from the US National Science Foundation under award NSF ICER-1343816 is gratefully acknowledged.
Examining the Role of Metadata in Testing IED Detection Systems
2009-09-01
energy management, crop assessment, weather forecasting, disaster alerts, and endangered species assessment [Biagioni and Bridges 2002; Mainwaring et... accessed June 10, 2008). Biagioni, E. and K. Bridges. 2002. The application of remote sensor technology to assist the recovery of rare endangered species
OntoFire: an ontology-based geo-portal for wildfires
NASA Astrophysics Data System (ADS)
Kalabokidis, K.; Athanasis, N.; Vaitis, M.
2011-12-01
With the proliferation of geospatial technologies on the Internet, the role of geo-portals (i.e., gateways to Spatial Data Infrastructures) in wildfire management is emerging. However, keyword-based techniques often frustrate users looking for data of interest in geo-portal environments, and little attention has been paid to shifting from conventional keyword-based to navigation-based mechanisms. The presented OntoFire system is an ontology-based geo-portal about wildfires. Through the proposed navigation mechanisms, relationships between the data can be discovered that would otherwise not be visible using conventional querying techniques alone. End users can use the browsing interface to find resources of interest through the navigation mechanisms provided. Data providers can use the publishing interface to submit new metadata, modify existing metadata, or remove metadata from the catalogue. The proposed approach can improve the discovery of valuable information that is necessary to set priorities for disaster mitigation and prevention strategies. OntoFire aspires to be a focal point for the integration and management of a very large amount of information, contributing in this way to the dissemination of knowledge and to the preparedness of operational stakeholders.
Metadata and Buckets in the Smart Object, Dumb Archive (SODA) Model
NASA Technical Reports Server (NTRS)
Nelson, Michael L.; Maly, Kurt; Croom, Delwin R., Jr.; Robbins, Steven W.
2004-01-01
We present the Smart Object, Dumb Archive (SODA) model for digital libraries (DLs), and discuss the role of metadata in SODA. The premise of the SODA model is to "push down" many of the functionalities generally associated with archives into the data objects themselves. Thus the data objects become "smarter", and the archives "dumber". In the SODA model, archives become primarily set managers, and the objects themselves negotiate and handle presentation, enforce terms and conditions, and perform data content management. Buckets are our implementation of smart objects, and da is our reference implementation for dumb archives. We also present our approach to metadata translation for buckets.
Standard formatted data units-control authority operations
NASA Technical Reports Server (NTRS)
1991-01-01
The purpose of this document is to illustrate a Control Authority's (CA) possible operation. The document is an interpretation and expansion of the concept found in the CA Procedures Recommendation. The CA is described in terms of the functions it performs for the management and control of data descriptions (metadata). Functions pertaining to the organization of Member Agency Control Authority Offices (MACAOs) (e.g., creating and disbanding) are not discussed. The document also provides an illustrative operational view of a CA through scenarios describing interaction between those roles involved in collecting, controlling, and accessing registered metadata. The roles interacting with the CA are identified by their actions in requesting and responding to requests for metadata, and by the type of information exchanged. The scenarios and examples presented in this document are illustrative only. They represent possible interactions supported by either a manual or automated system. These scenarios identify requirements for an automated system. These requirements are expressed by identifying the information to be exchanged and the services that may be provided by a CA for that exchange.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri
The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects, and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate data of varying quality into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
Multi-facetted Metadata - Describing datasets with different metadata schemas at the same time
NASA Astrophysics Data System (ADS)
Ulbricht, Damian; Klump, Jens; Bertelmann, Roland
2013-04-01
Inspired by the wish to re-use research data, a lot of work has been done to bring the data systems of the earth sciences together. Discovery metadata is disseminated to data portals to allow the building of customized indexes of catalogued dataset items. Data that were once acquired in the context of a scientific project are open for reappraisal and can now be used by scientists who were not part of the original research team. To make data re-use easier, measurement methods and measurement parameters must be documented in an application metadata schema and described in a written publication. Linking datasets to publications - as DataCite [1] does - again requires a specific metadata schema, and every new use context of the measured data may require yet another metadata schema sharing only a subset of information with the meta-information already present. To cope with the problem of metadata schema diversity in our common data repository at GFZ Potsdam, we established a solution to store file-based research data and describe these with an arbitrary number of metadata schemas. The core component of the data repository is an eSciDoc infrastructure that provides versioned container objects, called eSciDoc [2] "items". The eSciDoc content model allows assigning files to "items" and adding any number of metadata records to these "items". The eSciDoc items can be submitted, revised, and finally published, which makes the data and metadata available through the internet worldwide. GFZ Potsdam uses eSciDoc to support its scientific publishing workflow, including mechanisms for data review in peer review processes by providing temporary web links for external reviewers who do not have credentials to access the data. Based on the eSciDoc API, panMetaDocs [3] provides a web portal for data management in research projects. PanMetaDocs, which is based on panMetaWorks [4], is a PHP-based web application that allows data to be described with any XML-based schema.
It uses the eSciDoc infrastructure's REST interface to store versioned dataset files and metadata in XML format. The software can administer more than one eSciDoc metadata record per item and thus allows the description of a dataset according to its context. The metadata fields can be filled with static or dynamic content to reduce the number of fields that require manual entry to a minimum and, at the same time, make use of contextual information available in a project setting. Access rights can be adjusted to set the visibility of datasets to the required degree of openness. Metadata from separate instances of panMetaDocs can be syndicated to portals through RSS and OAI-PMH interfaces. The application architecture presented here allows storing file-based datasets and describing them with any number of metadata schemas, depending on the intended use case. Data and metadata are stored in the same entity (eSciDoc items) and are managed by a software tool through the eSciDoc REST interface - in this case, the application is panMetaDocs. Other software may re-use the produced items and modify the appropriate metadata records by accessing the web API of the eSciDoc data infrastructure. Presentation of the datasets in a web browser is not bound to panMetaDocs; it is done by stylesheet transformation of the eSciDoc item. [1] http://www.datacite.org [2] http://www.escidoc.org , eSciDoc, FIZ Karlsruhe, Germany [3] http://panmetadocs.sf.net , panMetaDocs, GFZ Potsdam, Germany [4] http://metaworks.pangaea.de , panMetaWorks, Dr. R. Huber, MARUM, Univ. Bremen, Germany
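The multi-schema idea above - one dataset item carrying any number of metadata records - can be sketched as a small data model. This is an illustrative sketch only, not the actual eSciDoc API; the class and the sample XML fragments are hypothetical.

```python
# Illustrative model of a versioned container "item" holding files plus
# several metadata records, each conforming to a different schema.
import xml.etree.ElementTree as ET

class Item:
    """A container holding files and any number of named metadata records."""
    def __init__(self, item_id):
        self.item_id = item_id
        self.files = []
        self.metadata = {}  # schema name -> parsed XML root element

    def add_metadata(self, schema, xml_string):
        """Attach one more metadata record under the given schema name."""
        self.metadata[schema] = ET.fromstring(xml_string)

item = Item("escidoc:1234")
item.add_metadata("datacite", "<resource><title>Borehole log</title></resource>")
item.add_metadata("iso19115", "<MD_Metadata><abstract>GFZ data</abstract></MD_Metadata>")

# The same dataset is now described under two schemas at once.
print(sorted(item.metadata))                      # ['datacite', 'iso19115']
print(item.metadata["datacite"].find("title").text)  # Borehole log
```

Keeping the records in one container is the design choice the abstract describes: data and all of its descriptions travel together, and each use context reads only the schema it understands.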
NASA Technical Reports Server (NTRS)
Andres, Vince; Walter, David; Hallal, Charles; Jones, Helene; Callac, Chris
2004-01-01
The SSC Multimedia Archive is an automated electronic system to manage images, acquired both by film and digital cameras, for the Public Affairs Office (PAO) at Stennis Space Center (SSC). Previously, the image archive was based on film photography and utilized a manual system that, by today's standards, had become inefficient and expensive. Now, the SSC Multimedia Archive, based on a server at SSC, contains both catalogs and images for pictures taken digitally and with traditional film-based cameras, along with metadata about each image. After a "shoot," a photographer downloads the images into the database. Members of the PAO can use a Web-based application to search, view, and retrieve images, approve images for publication, and view and edit metadata associated with the images. Approved images are archived and cross-referenced with appropriate descriptions and information. Security is provided by allowing administrators to grant personnel explicit access only to the components of the system that they need (e.g., only photographers may upload images, and only designated PAO employees may approve images).
A spatial data handling system for retrieval of images by unrestricted regions of user interest
NASA Technical Reports Server (NTRS)
Dorfman, Erik; Cromp, Robert F.
1992-01-01
The Intelligent Data Management (IDM) project at NASA/Goddard Space Flight Center has prototyped an Intelligent Information Fusion System (IIFS), which automatically ingests metadata from remote sensor observations into a large catalog that is directly queryable by end users. The greatest challenge in the implementation of this catalog was supporting spatially driven searches, where the user has a possibly complex region of interest and wishes to recover those images that overlap all or simply a part of that region. A spatial data management system is described that is capable of storing and retrieving records of image data regardless of their source. This system was designed and implemented as part of the IIFS catalog. A new data structure, called a hypercylinder, is central to the design. The hypercylinder is specifically tailored for data distributed over the surface of a sphere, such as satellite observations of the Earth or space. Operations on the hypercylinder are regulated by two expert systems. The first governs the ingest of new metadata records and maintains the efficiency of the data structure as it grows. The second translates, plans, and executes users' spatial queries, performing incremental optimization as partial query results are returned.
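The abstract does not detail the hypercylinder structure itself, so the sketch below only illustrates the primitive such a spatial query ultimately reduces to: deciding whether an image footprint overlaps a region of interest on the sphere. Footprints and the query region are approximated here as spherical caps (center plus angular radius); all values are hypothetical.

```python
# Overlap test for spherical caps, a simplified stand-in for the
# footprint-vs-region-of-interest test behind spatially driven searches.
import math

def angular_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in degrees between two points on the sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_d = (math.sin(p1) * math.sin(p2) +
             math.cos(p1) * math.cos(p2) * math.cos(dlon))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

def caps_overlap(cap_a, cap_b):
    """Two spherical caps overlap iff their center distance <= sum of radii."""
    (lat_a, lon_a, r_a), (lat_b, lon_b, r_b) = cap_a, cap_b
    return angular_distance(lat_a, lon_a, lat_b, lon_b) <= r_a + r_b

query = (40.0, -75.0, 2.0)  # user's region of interest: lat, lon, radius (deg)
images = {"img1": (41.0, -74.0, 1.5), "img2": (10.0, 20.0, 1.0)}
hits = [k for k, cap in images.items() if caps_overlap(cap, query)]
print(hits)  # ['img1'] - img1 overlaps the query cap; img2 does not
```

A real catalog would index footprints so that only nearby candidates reach this pairwise test, which is the efficiency role the hypercylinder plays in the IIFS design.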
Interpreting the ASTM 'content standard for digital geospatial metadata'
Nebert, Douglas D.
1996-01-01
ASTM and the Federal Geographic Data Committee have developed a content standard for spatial metadata to facilitate documentation, discovery, and retrieval of digital spatial data using vendor-independent terminology. Spatial metadata elements are identifiable quality and content characteristics of a data set that can be tied to a geographic location or area. Several Office of Management and Budget Circulars and initiatives have been issued that specify improved cataloguing of and accessibility to federal data holdings. An Executive Order further requires the use of the metadata content standard to document digital spatial data sets. Collection and reporting of spatial metadata for field investigations performed for the federal government is an anticipated requirement. This paper provides an overview of the draft spatial metadata content standard and a description of how the standard could be applied to investigations collecting spatially-referenced field data.
Scalable Data Mining and Archiving for the Square Kilometre Array
NASA Astrophysics Data System (ADS)
Jones, D. L.; Mattmann, C. A.; Hart, A. F.; Lazio, J.; Bennett, T.; Wagstaff, K. L.; Thompson, D. R.; Preston, R.
2011-12-01
As the technologies for remote observation improve, the rapid increase in the frequency and fidelity of those observations translates into an avalanche of data that is already beginning to eclipse the resources, both human and technical, of the institutions and facilities charged with managing the information. Common data management tasks, like cataloging both the data itself and contextual metadata, creating and maintaining a scalable permanent archive, and making data available on demand for research, present significant software engineering challenges when considered at the scale of modern multi-national scientific enterprises such as the upcoming Square Kilometre Array project. The NASA Jet Propulsion Laboratory (JPL), leveraging internal research and technology development funding, has begun to explore ways to address the data archiving and distribution challenges through a number of parallel activities involving collaborations with the EVLA and ALMA teams at the National Radio Astronomy Observatory (NRAO), and members of the Square Kilometre Array South Africa team. To date, we have leveraged the Apache OODT Process Control System framework and its catalog and archive service components, which provide file management, workflow management, and resource management as core web services. A client crawler framework ingests upstream data (e.g., EVLA raw directory output), identifies its MIME type, and automatically extracts relevant metadata, including temporal bounds and job-relevant processing information. A remote content acquisition (push-pull) service is responsible for staging remote content and handing it off to the crawler framework. A science algorithm wrapper (called CAS-PGE) wraps underlying code, including CASApy programs for the EVLA such as Continuum Imaging and Spectral Line Cube generation, executes the algorithm, and ingests its output (along with relevant extracted metadata).
In addition to processing, the Process Control System has been leveraged to provide data curation and automatic ingestion for the MeerKAT/KAT-7 precursor instrument in South Africa, helping to catalog and archive correlator and sensor output from KAT-7, and to make the information available for downstream science analysis. These efforts, supported by the increasing availability of high-quality open source software, represent a concerted effort to seek a cost-conscious methodology for maintaining the integrity of observational data from the upstream instrument to the archive, while ensuring that the data, with its richly annotated catalog of metadata, remains a viable resource for research into the future.
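The crawler-style ingestion pattern described above (identify MIME type, extract temporal bounds, file the product in a catalog) can be sketched briefly. Apache OODT itself is Java; this Python sketch is a hedged illustration of the pattern only, and the filename convention and metadata keys are hypothetical.

```python
# Minimal sketch of crawler-style ingestion: guess each product's MIME type
# and pull a date stamp out of its filename, then record it in a catalog.
import mimetypes
import re

def extract_metadata(filename):
    """Guess MIME type and extract an observation date from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    meta = {"Filename": filename, "MimeType": mime or "application/octet-stream"}
    m = re.search(r"(\d{4})(\d{2})(\d{2})", filename)
    if m:
        meta["ObservationDate"] = "-".join(m.groups())
    return meta

def crawl(filenames, catalog):
    """Ingest each upstream product into the catalog, keyed by filename."""
    for name in filenames:
        catalog[name] = extract_metadata(name)

catalog = {}
crawl(["evla_20110315_continuum.fits", "notes.txt"], catalog)
print(catalog["evla_20110315_continuum.fits"]["ObservationDate"])  # 2011-03-15
```

A production crawler would also hand each cataloged product to downstream workflow and archive services, which is the division of labor the Process Control System provides.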
Ku, Ho Suk; Kim, Sungho; Kim, HyeHyeon; Chung, Hee-Joon; Park, Yu Rang; Kim, Ju Han
2014-04-01
Health Avatar Beans was created for the management of chronic kidney disease and end-stage renal disease (ESRD). This article describes the DialysisNet system within Health Avatar Beans for the seamless management of ESRD based on the personal health record. For hemodialysis data modeling, we identified common data elements for hemodialysis information (CDEHI). We used the ASTM Continuity of Care Record (CCR) and ISO/IEC 11179 as the method for compliance with a standard model for the CDEHI. We mapped the CDEHI to the contents of the ASTM CCR and created the metadata from that mapping. It was transformed and parsed into the database and verified according to the ASTM CCR/XML schema definition (XSD). DialysisNet was created as an iPad application. The contents of the CDEHI were categorized for effective management. For the evaluation of information transfer, we used CarePlatform, which was developed for data access. The metadata of the CDEHI in DialysisNet were exchanged by the CarePlatform with semantic interoperability. The CDEHI was separated into a content list for individual patient data, a content list for hemodialysis center data, a consultation and transfer form, and clinical decision support data. After matching to the CCR, the CDEHI was transformed to metadata, which was then transformed to XML and validated against the ASTM CCR/XSD. DialysisNet gives specific consideration to visualization, graphics, images, statistics, and the database. We created the DialysisNet application, which can integrate and manage data sources for hemodialysis information based on CCR standards.
Metadata to Describe Genomic Information.
Delgado, Jaime; Naro, Daniel; Llorente, Silvia; Gelpí, Josep Lluís; Royo, Romina
2018-01-01
Interoperable metadata is key for the management of genomic information. We propose a flexible approach that we are contributing to ISO/IEC standardization of a new format for efficient and secure compressed storage and transmission of genomic information.
Misra, Dharitri; Chen, Siyuan; Thoma, George R.
2010-01-01
One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost-effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U.S. National Library of Medicine (NLM), we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with a focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system. PMID:21179386
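The rule-based pattern-search step described above can be sketched in isolation (the SVM/HMM layout recognition is omitted): regular-expression rules locate metadata fields within a text zone of an already-recognized layout. The rules and field names below are illustrative, not NLM's actual AME rules.

```python
# Sketch of rule-based metadata extraction from a recognized text zone.
import re

RULES = {
    "volume": re.compile(r"\bVol(?:ume)?\.?\s+(\d+)", re.IGNORECASE),
    "year":   re.compile(r"\b(19|20)\d{2}\b"),
    "pages":  re.compile(r"\bpp\.\s*(\d+)\s*[-]\s*(\d+)"),
}

def extract_fields(text):
    """Apply each string-pattern rule to the zone and collect matched fields."""
    fields = {}
    m = RULES["volume"].search(text)
    if m:
        fields["volume"] = m.group(1)
    m = RULES["year"].search(text)
    if m:
        fields["year"] = m.group(0)
    m = RULES["pages"].search(text)
    if m:
        fields["pages"] = (m.group(1), m.group(2))
    return fields

zone = "Journal of Examples, Vol. 12, 1998, pp. 101-115"
print(extract_fields(zone))
```

In the real system, the surrounding layout class determines which rule set applies to which zone, which is what lets the same corpus-level machinery serve structured and semi-structured documents.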
Critical Issues of Web-Enabled Technologies in Modern Organizations.
ERIC Educational Resources Information Center
Khosrow-Pour, Mehdi; Herman, Nancy
2001-01-01
Discusses results of a Delphi study that explored issues related to the utilization and management of Web-enabled technologies by modern organizations. Topics include bandwidth restrictions; security; data integrity; inadequate search facilities; system incompatibilities; failure to adhere to standards; email; use of metadata; privacy and…
Extended Relation Metadata for SCORM-Based Learning Content Management Systems
ERIC Educational Resources Information Center
Lu, Eric Jui-Lin; Horng, Gwoboa; Yu, Chia-Ssu; Chou, Ling-Ying
2010-01-01
To increase the interoperability and reusability of learning objects, Advanced Distributed Learning Initiative developed a model called Content Aggregation Model (CAM) to describe learning objects and express relationships between learning objects. However, the suggested relations defined in the CAM can only describe structure-oriented…
Data Management Rubric for Video Data in Organismal Biology.
Brainerd, Elizabeth L; Blob, Richard W; Hedrick, Tyson L; Creamer, Andrew T; Müller, Ulrike K
2017-07-01
Standards-based data management facilitates data preservation, discoverability, and access for effective data reuse within research groups and across communities of researchers. Data sharing requires community consensus on standards for data management, such as storage and formats for digital data preservation, metadata (i.e., contextual data about the data) that should be recorded and stored, and data access. Video imaging is a valuable tool for measuring time-varying phenotypes in organismal biology, with particular application for research in functional morphology, comparative biomechanics, and animal behavior. The raw data are the videos, but videos alone are not sufficient for scientific analysis. Nearly endless videos of animals can be found on YouTube and elsewhere on the web, but these videos have little value for scientific analysis because essential metadata such as true frame rate, spatial calibration, genus and species, weight, age, etc. of organisms, are generally unknown. We have embarked on a project to build community consensus on video data management and metadata standards for organismal biology research. We collected input from colleagues at early stages, organized an open workshop, "Establishing Standards for Video Data Management," at the Society for Integrative and Comparative Biology meeting in January 2017, and then collected two more rounds of input on revised versions of the standards. The result we present here is a rubric consisting of nine standards for video data management, with three levels within each standard: good, better, and best practices. The nine standards are: (1) data storage; (2) video file formats; (3) metadata linkage; (4) video data and metadata access; (5) contact information and acceptable use; (6) camera settings; (7) organism(s); (8) recording conditions; and (9) subject matter/topic. 
The first four standards address data preservation and interoperability for sharing, whereas standards 5-9 establish minimum metadata standards for organismal biology video, and suggest additional metadata that may be useful for some studies. This rubric was developed with substantial input from researchers and students, but still should be viewed as a living document that should be further refined and updated as technology and research practices change. The audience for these standards includes researchers, journals, and granting agencies, and also the developers and curators of databases that may contribute to video data sharing efforts. We offer this project as an example of building community consensus for data management, preservation, and sharing standards, which may be useful for future efforts by the organismal biology research community. © The Author 2017. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology.
Integrated platform and API for electrophysiological data
Sobolev, Andrey; Stoewer, Adrian; Leonhardt, Aljoscha; Rautenberg, Philipp L.; Kellner, Christian J.; Garbers, Christian; Wachtler, Thomas
2014-01-01
Recent advancements in technology and methodology have led to growing amounts of increasingly complex neuroscience data recorded from various species, modalities, and levels of study. The rapid data growth has made efficient data access and flexible, machine-readable data annotation a crucial requisite for neuroscientists. Clear and consistent annotation and organization of data is not only an important ingredient for reproducibility of results and re-use of data, but also essential for collaborative research and data sharing. In particular, efficient data management and interoperability requires a unified approach that integrates data and metadata and provides a common way of accessing this information. In this paper we describe GNData, a data management platform for neurophysiological data. GNData provides a storage system based on a data representation that is suitable to organize data and metadata from any electrophysiological experiment, with a functionality exposed via a common application programming interface (API). Data representation and API structure are compatible with existing approaches for data and metadata representation in neurophysiology. The API implementation is based on the Representational State Transfer (REST) pattern, which enables data access integration in software applications and facilitates the development of tools that communicate with the service. Client libraries that interact with the API provide direct data access from computing environments like Matlab or Python, enabling integration of data management into the scientist's experimental or analysis routines. PMID:24795616
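The REST access pattern described above - client libraries that build resource requests and decode responses into native objects - can be sketched without a live server. This is a hedged illustration: the base URL, endpoint paths, and payload fields below are hypothetical, not GNData's actual API.

```python
# Sketch of a REST client-library pattern: build resource URLs and decode
# a JSON response body into a native object for analysis code.
import json

BASE = "https://gndata.example.org/api"  # hypothetical endpoint

def resource_url(resource_type, resource_id=None):
    """Build a REST URL for a data or metadata resource."""
    url = f"{BASE}/{resource_type}/"
    if resource_id is not None:
        url += f"{resource_id}/"
    return url

def parse_response(body):
    """Decode a JSON response body, returning the payload object."""
    payload = json.loads(body)
    return payload["data"]

print(resource_url("analogsignal", 42))  # https://gndata.example.org/api/analogsignal/42/

# A canned response standing in for an HTTP reply from the service:
canned = '{"data": {"id": 42, "unit": "mV", "metadata": {"stimulus": "grating"}}}'
signal = parse_response(canned)
print(signal["metadata"]["stimulus"])  # grating
```

Wrapping these two steps in a client library is what lets Matlab or Python analysis routines treat remote data and metadata as ordinary local objects.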
Review of Spatial-Database System Usability: Recommendations for the ADDNS Project
2007-12-01
basic GIS background information, with a closer look at spatial databases. A GIS is also a computer-based system designed to capture, manage... foundation for deploying enterprise-wide spatial information systems. According to Oracle® [18], it enables accurate delivery of location-based services... Toronto TR 2007-141. Lanter, D.P. (1991). Design of a lineage-based meta-data base for GIS. Cartography and Geographic Information Systems, 18
Social tagging in the life sciences: characterizing a new metadata resource for bioinformatics.
Good, Benjamin M; Tennis, Joseph T; Wilkinson, Mark D
2009-09-25
Academic social tagging systems, such as Connotea and CiteULike, provide researchers with a means to organize personal collections of online references with keywords (tags) and to share these collections with others. One of the side-effects of the operation of these systems is the generation of large, publicly accessible metadata repositories describing the resources in the collections. In light of the well-known expansion of information in the life sciences and the need for metadata to enhance its value, these repositories present a potentially valuable new resource for application developers. Here we characterize the current contents of two scientifically relevant metadata repositories created through social tagging. This investigation helps to establish how such socially constructed metadata might be used as it stands currently and to suggest ways that new social tagging systems might be designed that would yield better aggregate products. We assessed the metadata that users of CiteULike and Connotea associated with citations in PubMed with the following metrics: coverage of the document space, density of metadata (tags) per document, rates of inter-annotator agreement, and rates of agreement with MeSH indexing. CiteULike and Connotea were very similar on all of the measurements. In comparison to PubMed, document coverage and per-document metadata density were much lower for the social tagging systems. Inter-annotator agreement within the social tagging systems, and the agreement between the aggregated social tagging metadata and MeSH indexing, was low, though the latter could be increased through voting. The most promising uses of metadata from current academic social tagging repositories will be those that find ways to utilize the novel relationships between users, tags, and documents exposed through these systems.
For more traditional kinds of indexing-based applications (such as keyword-based search) to benefit substantially from socially generated metadata in the life sciences, more documents need to be tagged and more tags are needed for each document. These issues may be addressed both by finding ways to attract more users to current systems and by creating new user interfaces that encourage more collectively useful individual tagging behaviour.
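The per-document metrics used in the study above (tag density, pairwise inter-annotator agreement) can be sketched concretely. This is a minimal illustration with toy data, not the authors' actual measurement code; the agreement measure here is mean pairwise Jaccard overlap, one common choice.

```python
from itertools import combinations

def tag_density(annotations):
    """Mean number of distinct tags per tagged document.

    annotations: doc id -> {user: set of tags that user applied}
    """
    return sum(len(set().union(*users.values()))
               for users in annotations.values()) / len(annotations)

def inter_annotator_agreement(annotations):
    """Mean pairwise Jaccard overlap of two users' tag sets on the same document."""
    scores = []
    for users in annotations.values():
        for a, b in combinations(users.values(), 2):
            scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores) if scores else 0.0

# Toy data: real repositories are far sparser, which is the paper's point.
annotations = {
    "pmid:1": {"u1": {"genomics", "ontology"}, "u2": {"genomics", "curation"}},
    "pmid:2": {"u1": {"proteomics"}},
}
```

On this toy data, density averages the per-document tag unions, and agreement is computed only where a document has at least two annotators, mirroring how sparse co-annotation limits such measurements in practice.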
A New Data Management System for Biological and Chemical Oceanography
NASA Astrophysics Data System (ADS)
Groman, R. C.; Chandler, C.; Allison, D.; Glover, D. M.; Wiebe, P. H.
2007-12-01
The Biological and Chemical Oceanography Data Management Office (BCO-DMO) was created to serve PIs principally funded by NSF to conduct marine chemical and ecological research. The new office is dedicated to providing open access to data and information developed in the course of scientific research on short and intermediate time-frames. The data management system developed in support of U.S. JGOFS and U.S. GLOBEC programs is being modified to support the larger scope of the BCO-DMO effort, which ultimately includes providing a way to exchange data with other data systems. The open access system is based on a philosophy of data stewardship, support for existing and evolving data standards, and use of public domain software. The DMO staff work closely with originating PIs to manage data gathered as part of their individual programs. In the new BCO-DMO data system, project and data set metadata records designed to support re-use of the data are stored in a relational database (MySQL) and the data are stored in or made accessible by the JGOFS/GLOBEC object-oriented, relational data management system. Data access will be provided through any standard Web browser via a GIS application (Open Source, OGC-compliant MapServer), a directory listing from the data holdings catalog, or a custom search engine that facilitates data discovery. In an effort to maximize data system interoperability, data will also be available via Web Services; and data set descriptions will be generated to comply with a variety of metadata content standards. The office is located at the Woods Hole Oceanographic Institution and web access is via http://www.bco-dmo.org.
panMetaDocs and DataSync - providing a convenient way to share and publish research data
NASA Astrophysics Data System (ADS)
Ulbricht, D.; Klump, J. F.
2013-12-01
In recent years, research institutions, geological surveys and funding organizations have started to build infrastructures to facilitate the re-use of research data from previous work. At present, several intermeshed activities are coordinated to make data systems of the earth sciences interoperable and recorded data discoverable. Driven by governmental authorities, ISO19115/19139 emerged as metadata standards for discovery of data and services. Established metadata transport protocols like OAI-PMH and OGC-CSW are used to disseminate metadata to data portals. With persistent identifiers like DOI and IGSN, research data and corresponding physical samples can be given unambiguous names and thus become citable. In summary, these activities focus primarily on 'ready to give away' data, already stored in an institutional repository and described with appropriate metadata. Many datasets are not 'born' in this state but are produced in small and federated research projects. To make access and reuse of these 'small data' easier, these data should be centrally stored and version controlled from the very beginning of activities. We developed DataSync [1] as a supplemental application to the panMetaDocs [2] data exchange platform, serving as a data management tool for small science projects. DataSync is a Java application that runs on a local computer and synchronizes directory trees into an eSciDoc repository [3] by creating eSciDoc objects via eSciDoc's REST API. Installed on multiple computers, DataSync can in this way synchronize the files of a research team over the internet. XML metadata can be added as separate files that are managed together with the data files as versioned eSciDoc objects. A project-customized instance of panMetaDocs is provided to show a web-based overview of the previously uploaded file collection and to allow further annotation with metadata inside the eSciDoc repository.
panMetaDocs is a PHP-based web application that assists in the creation of metadata in any XML-based metadata schema. To reduce manual metadata entry to a minimum and make use of contextual information in a project setting, metadata fields can be populated with static or dynamic content. Access rights can be defined to control visibility of and access to stored objects. Notifications about recently updated datasets are available by RSS and e-mail, and the entire inventory can be harvested via OAI-PMH. panMetaDocs is optimized to be harvested by panFMP [4]. panMetaDocs can mint dataset DOIs through DataCite and uses eSciDoc's REST API to transfer eSciDoc objects from a non-public 'pending' status to the published status 'released', which makes the data and metadata of the published object available worldwide over the internet. The application scenario presented here shows the application of open-source software to the sharing and publication of research data. An eSciDoc repository is used as storage for data and metadata. DataSync serves as a file ingester and distributor, whereas panMetaDocs' main function is to annotate the dataset files with metadata to make them ready for publication and for sharing with one's own team or with the scientific community.
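The core decision DataSync makes, which local files need to be pushed to the repository, can be sketched as a hash comparison between the local directory tree and the remote inventory. This is an illustrative sketch only; the actual eSciDoc REST calls are omitted and the function name is hypothetical.

```python
import hashlib

def files_to_sync(local_files, remote_index):
    """Return paths whose content differs from, or is absent in, the remote inventory.

    local_files: dict path -> bytes content (stand-in for reading the directory tree)
    remote_index: dict path -> hex digest previously recorded for the repository copy
    """
    to_upload = []
    for path, content in sorted(local_files.items()):
        digest = hashlib.sha256(content).hexdigest()
        if remote_index.get(path) != digest:
            to_upload.append(path)
    return to_upload

# A data file already in sync, plus a new sidecar XML metadata file.
local = {"data/run1.csv": b"a,b\n1,2\n", "data/run1.csv.xml": b"<metadata/>"}
remote = {"data/run1.csv": hashlib.sha256(b"a,b\n1,2\n").hexdigest()}
```

In a real client each selected path would become a create-or-update request against the repository's REST API, producing a new object version; here only the selection logic is shown.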
Online Metadata Directories: A way of preserving, sharing and discovering scientific information
NASA Technical Reports Server (NTRS)
Meaux, M.
2005-01-01
The Global Change Master Directory (GCMD) assists the scientific community in the discovery of and linkage to Earth Science data and provides data holders a means to advertise their data to the community through its portals, i.e. online customized subset metadata directories. These directories are effectively serving communities like the Joint Committee on Antarctic Data Management (JCADM), the Global Observing System Information Center (GOSIC), and the Global Ocean Ecosystem Dynamics Program (GLOBEC) by increasing the visibility of their data holdings. The purpose of the Gulf of Maine Ocean Data Partnership (GoMODP) is to "promote and coordinate the sharing, linking, electronic dissemination, and use of data on the Gulf of Maine region". The participants have decided that a "coordinated effort is needed to enable users throughout the Gulf of Maine region and beyond to discover and put to use the vast and growing quantities of data in their respective databases". GoMODP members have invited the GCMD to discuss potential collaborations associated with this effort. The presentation will focus on the use of the GCMD's metadata directory as a powerful tool for data discovery and sharing. An overview of the directory and its metadata authoring tools will be given.
DocML: A Digital Library of University Data.
ERIC Educational Resources Information Center
Papadakis, Ioannis; Karakoidas, Vassileios; Chrissikopoulos, Vassileios
2002-01-01
Describes DocML, a Web-based digital library of university data that is used to build a system capable of preserving and managing student assignments. Topics include requirements for a digital library of university data; metadata and XML; three-tier architecture; user interface; searching; browsing; content delivery; and administrative issues.…
Sustainable data and metadata management at the BD2K-LINCS Data Coordination and Integration Center
Stathias, Vasileios; Koleti, Amar; Vidović, Dušica; Cooper, Daniel J.; Jagodnik, Kathleen M.; Terryn, Raymond; Forlin, Michele; Chung, Caty; Torre, Denis; Ayad, Nagi; Medvedovic, Mario; Ma'ayan, Avi; Pillai, Ajay; Schürer, Stephan C.
2018-01-01
The NIH-funded LINCS Consortium is creating an extensive reference library of cell-based perturbation response signatures and sophisticated informatics tools incorporating a large number of perturbagens, model systems, and assays. To date, more than 350 datasets have been generated including transcriptomics, proteomics, epigenomics, cell phenotype and competitive binding profiling assays. The large volume and variety of data necessitate rigorous data standards and effective data management including modular data processing pipelines and end-user interfaces to facilitate accurate and reliable data exchange, curation, validation, standardization, aggregation, integration, and end user access. Deep metadata annotations and the use of qualified data standards enable integration with many external resources. Here we describe the end-to-end data processing and management at the DCIC to generate a high-quality and persistent product. Our data management and stewardship solutions enable a functioning Consortium and make LINCS a valuable scientific resource that aligns with big data initiatives such as the BD2K NIH Program and concords with emerging data science best practices including the findable, accessible, interoperable, and reusable (FAIR) principles. PMID:29917015
Data Curation for the Exploitation of Large Earth Observation Products Databases - The MEA system
NASA Astrophysics Data System (ADS)
Mantovani, Simone; Natali, Stefano; Barboni, Damiano; Cavicchi, Mario; Della Vecchia, Andrea
2014-05-01
National space agencies under the umbrella of the European Space Agency are working intensively on solutions for managing and exploiting Big Data and the related knowledge (metadata, software tools and services). The continuously increasing amount of long-term and historic data in EO facilities, in the form of online datasets and archives, the incoming satellite observation platforms that will generate an impressive amount of new data, and the new EU approach to data distribution policy make it necessary to address technologies for the long-term management of these data sets, including their consolidation, preservation, distribution, continuation and curation across multiple missions. The management of long EO data time series from continuing or historic missions - with more than 20 years of data already available today - requires technical solutions and technologies which differ considerably from those exploited by existing systems. Several tools, both open source and commercial, already provide technologies to handle data and metadata preparation, access and visualization via OGC standard interfaces. This study describes the Multi-sensor Evolution Analysis (MEA) system and the Data Curation concept as approached and implemented within the ASIM and EarthServer projects, funded by the European Space Agency and the European Commission, respectively.
Ismail, Mahmoud; Philbin, James
2015-04-01
The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. There are user scenarios that only need metadata manipulation, such as deidentification and study migration. Most picture archiving and communication systems use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes. This work promotes storing DICOM studies in MSD format to reduce the metadata processing time. A set of experiments are performed that update the metadata of a set of DICOM studies for deidentification and migration. The studies are stored in both the traditional single frame DICOM (SFD) format and the MSD format. The results show that it is faster to update studies' metadata in MSD format than in SFD format because the bulk data is separated in MSD and is not retrieved from the storage system. In addition, it is space efficient to store the deidentified studies in MSD format as it shares the same bulk data object with the original study. In summary, separation of metadata from pixel data using the MSD format provides fast metadata access and speeds up applications that process only the metadata.
NetCDF4/HDF5 and Linked Data in the Real World - Enriching Geoscientific Metadata without Bloat
NASA Astrophysics Data System (ADS)
Ip, Alex; Car, Nicholas; Druken, Kelsey; Poudjom-Djomani, Yvette; Butcher, Stirling; Evans, Ben; Wyborn, Lesley
2017-04-01
NetCDF4 has become the dominant generic format for many forms of geoscientific data, leveraging (and constraining) the versatile HDF5 container format, while providing metadata conventions for interoperability. However, the encapsulation of detailed metadata within each file can lead to metadata "bloat", and difficulty in maintaining consistency where metadata is replicated to multiple locations. Complex conceptual relationships are also difficult to represent in simple key-value netCDF metadata. Linked Data provides a practical mechanism to address these issues by associating the netCDF files and their internal variables with complex metadata stored in Semantic Web vocabularies and ontologies, while complying with and complementing existing metadata conventions. One of the stated objectives of the netCDF4/HDF5 formats is that they should be self-describing: containing metadata sufficient for cataloguing and using the data. However, this objective can be regarded as only partially met where details of conventions and definitions are maintained externally to the data files. For example, one of the most widely used netCDF community standards, the Climate and Forecasting (CF) Metadata Convention, maintains standard vocabularies for a broad range of disciplines across the geosciences, but this metadata is currently neither readily discoverable nor machine-readable. We have previously implemented useful Linked Data and netCDF tooling (ncskos) that associates netCDF files, and individual variables within those files, with concepts in vocabularies formulated using the Simple Knowledge Organization System (SKOS) ontology. NetCDF files contain Uniform Resource Identifier (URI) links to terms represented as SKOS Concepts, rather than plain-text representations of those terms, so we can use simple, standardised web queries to collect and use rich metadata for the terms from any Linked Data-presented SKOS vocabulary.
Geoscience Australia (GA) manages a large volume of diverse geoscientific data, much of which is being translated from proprietary formats to netCDF at NCI Australia. This data is made available through the NCI National Environmental Research Data Interoperability Platform (NERDIP) for programmatic access and interdisciplinary analysis. The netCDF files contain both scientific data variables (e.g. gravity, magnetic or radiometric values), but also domain-specific operational values (e.g. specific instrument parameters) best described fully in formal vocabularies. Our ncskos codebase provides access to multiple stores of detailed external metadata in a standardised fashion. Geophysical datasets are generated from a "survey" event, and GA maintains corporate databases of all surveys and their associated metadata. It is impractical to replicate the full source survey metadata into each netCDF dataset so, instead, we link the netCDF files to survey metadata using public Linked Data URIs. These URIs link to Survey class objects which we model as a subclass of Activity objects as defined by the PROV Ontology, and we provide URI resolution for them via a custom Linked Data API which draws current survey metadata from GA's in-house databases. We have demonstrated that Linked Data is a practical way to associate netCDF data with detailed, external metadata. This allows us to ensure that catalogued metadata is kept consistent with metadata points-of-truth, and we can infer complex conceptual relationships not possible with netCDF key-value attributes alone.
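The ncskos pattern described above, a variable attribute that carries a concept URI rather than free text, can be sketched as follows. The vocabulary here is a local mock of what a Linked Data endpoint might return; the URIs, labels, and attribute name are hypothetical, and real code would dereference the URI over HTTP.

```python
# Mock SKOS vocabulary keyed by concept URI (hypothetical content).
VOCAB = {
    "http://vocab.example.org/geophysics/gravity": {
        "prefLabel": "gravity anomaly",
        "altLabels": ["Bouguer anomaly"],
        "broader": "http://vocab.example.org/geophysics",
    },
}

def resolve_concept(variable_attrs):
    """Follow a variable's concept-URI attribute instead of trusting its free-text name."""
    uri = variable_attrs["skos__concept_uri"]
    concept = VOCAB.get(uri)
    if concept is None:
        raise KeyError(f"unresolvable concept URI: {uri}")
    return concept["prefLabel"]

# The short free-text long_name is ambiguous; the URI is the point of truth.
grav_attrs = {"long_name": "grav",
              "skos__concept_uri": "http://vocab.example.org/geophysics/gravity"}
```

Because the file stores only the URI, the rich metadata (preferred and alternate labels, broader concepts) stays at its point of truth and can change without rewriting the netCDF file, which is exactly the anti-bloat argument made above.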
SAS- Semantic Annotation Service for Geoscience resources on the web
NASA Astrophysics Data System (ADS)
Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.
2015-12-01
There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision of allowing pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework, as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems, in order to enable queries and analysis over annotations from a single environment (the web). SAS is one of the services provided by the Geosemantic framework, a decentralized semantic framework that supports integration between models and data and allows semantically heterogeneous resources to interact with minimal human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested into the knowledge base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have a web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models published on the web and to allow users to better search, access, and use existing resources based on standard vocabularies that are encoded and published using semantic technologies.
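At its core, the semantic annotation described above attaches (resource, predicate, object) triples, with terms drawn from standard vocabularies, to data and model resources. A minimal sketch follows; the class, identifiers, and vocabulary terms are illustrative assumptions, not SAS's actual API.

```python
class TripleStore:
    """A tiny in-memory triple store supporting pattern-match queries."""

    def __init__(self):
        self.triples = set()

    def annotate(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return triples matching every non-None field."""
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (predicate is None or t[1] == predicate)
                and (obj is None or t[2] == obj)]

store = TripleStore()
# Annotating a dataset and a model with the same standard vocabulary term
# makes them discoverable together, which is the integration payoff.
store.annotate("data:streamflow-2015", "hasStandardName", "water_volume_transport")
store.annotate("model:vic", "hasOutputVariable", "water_volume_transport")
```

Querying by the shared object term then links the dataset and the model that produces or consumes it, without either resource knowing about the other in advance.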
Development of an Integrated Biospecimen Database among the Regional Biobanks in Korea.
Park, Hyun Sang; Cho, Hune; Kim, Hwa Sun
2016-04-01
This study developed an integrated database for 15 regional biobanks that provides large quantities of high-quality bio-data to researchers to be used for the prevention of disease, for the development of personalized medicines, and in genetics studies. We collected raw data, managed independently by 15 regional biobanks, for database modeling and analyzed and defined the metadata of the items. We also built a three-step (high, middle, and low) classification system for classifying the item concepts based on the metadata. To generate clear meanings of the items, clinical items were defined using the Systematized Nomenclature of Medicine Clinical Terms, and specimen items were defined using the Logical Observation Identifiers Names and Codes. To optimize database performance, we set up a multi-column index based on the classification system and the international standard code. As a result of subdividing the 7,197,252 raw data items collected, we refined the metadata into 1,796 clinical items and 1,792 specimen items. The classification system consists of 15 high, 163 middle, and 3,588 low class items. International standard codes were linked to 69.9% of the clinical items and 71.7% of the specimen items. The database consists of 18 tables built on MySQL Server 5.6. As a result of the performance evaluation, the multi-column index reduced query times by as much as a factor of nine. The database developed was based on an international standard terminology system, providing an infrastructure that can integrate the 7,197,252 raw data items managed by the 15 regional biobanks. In particular, it resolved the inevitable interoperability issues in the exchange of information among the biobanks, and provided a solution to the synonym problem, which arises when the same concept is expressed in a variety of ways.
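The multi-column index idea above, a composite index over the three classification levels plus the standard code, can be sketched with the stdlib sqlite3 module (the paper used MySQL 5.6; the schema and values here are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE clinical_item (
    class_high TEXT, class_mid TEXT, class_low TEXT,
    snomed_code TEXT, value TEXT)""")
# Composite index mirroring the three-step classification plus the standard code,
# so equality filters on the leading columns become an index search, not a scan.
conn.execute("""CREATE INDEX idx_class_code
    ON clinical_item (class_high, class_mid, class_low, snomed_code)""")
conn.execute("INSERT INTO clinical_item VALUES "
             "('lab', 'chemistry', 'glucose', '33747003', '5.4')")

rows = conn.execute("""SELECT value FROM clinical_item
    WHERE class_high='lab' AND class_mid='chemistry' AND class_low='glucose'""").fetchall()
plan = conn.execute("""EXPLAIN QUERY PLAN SELECT value FROM clinical_item
    WHERE class_high='lab' AND class_mid='chemistry'""").fetchall()
```

Note the column order matters: queries constraining a prefix of the indexed columns (e.g. only `class_high` and `class_mid`) can still use the index, which is why the broad-to-narrow classification levels lead it.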
Seeking the Path to Metadata Nirvana
NASA Astrophysics Data System (ADS)
Graybeal, J.
2008-12-01
Scientists have always found reusing other scientists' data challenging. Computers did not fundamentally change the problem, but enabled more and larger instances of it. In fact, by removing human mediation and time delays from the data sharing process, computers emphasize the contextual information that must be exchanged in order to exchange and reuse data. This requirement for contextual information has two faces: "interoperability" when talking about systems, and "the metadata problem" when talking about data. As much as any single organization, the Marine Metadata Interoperability (MMI) project has been tagged with the mission "Solve the metadata problem." Of course, if that goal is achieved, then sustained, interoperable data systems for interdisciplinary observing networks can be easily built -- pesky metadata differences, like which protocol to use for data exchange, or what the data actually measures, will be a thing of the past. Alas, as you might imagine, there will always be complexities and incompatibilities that are not addressed, and data systems that are not interoperable, even within a science discipline. So should we throw up our hands and surrender to the inevitable? Not at all. Rather, we try to minimize metadata problems as much as we can. Here we are making steady progress, despite natural forces that pull in the other direction. Computer systems let us work with more complexity, build community knowledge and collaborations, and preserve and publish our progress and (dis-)agreements. Funding organizations, science communities, and technologists see the importance of interoperable systems and metadata, and direct resources toward them. With the new approaches and resources, projects like IPY and MMI can simultaneously define, display, and promote effective strategies for sustainable, interoperable data systems. This presentation will outline the role metadata plays in durable interoperable data systems, for better or worse.
It will describe times when "just choosing a standard" can work, and when it probably won't work. And it will point out signs that suggest a metadata storm is coming to your community project, and how you might avoid it. From these lessons we will seek a path to producing interoperable, interdisciplinary, metadata-enlightened environment observing systems.
NASA Astrophysics Data System (ADS)
Leibovici, D. G.; Pourabdollah, A.; Jackson, M.
2011-12-01
Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement, but the quality of the outcomes is equally important [1]. Metadata attached to the data and processes, and particularly metadata about their quality, is essential to assess the reliability of the scientific model that a workflow represents [2]. Tools for managing qualitative and quantitative measures of the quality associated with a workflow are therefore required by modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing one, for example, to define metadata profiles and to retrieve them via web service interfaces. However, these standards need a few extensions for workflows, particularly in the context of geoprocess metadata. We propose to fill this gap (i) through the provision of a metadata profile for the quality of processes, and (ii) through a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the metadata quality, stored in the XPDL file. The focus is (a) on the visual representations of the quality, summarizing the retrieved quality information either from the standardized metadata profiles of the components or from non-standard quality information, e.g., Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]).
An a priori validation of the future decision-making supported by the workflow's outputs is then provided using the meta-propagated qualities, obtained without running the workflow [6], together with a visualization pointing out, on the workflow graph itself, where better data or better processes would improve the workflow. [1] Leibovici, DG, Hobona, G, Stock, K, Jackson, M (2009) Qualifying geospatial workflow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5 [2] Leibovici, DG, Pourabdollah, A (2010a) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France [3] OGC (2011) www.opengeospatial.org [4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language. Workflow Management Coalition, Document WfMC-TC-1025, 2008 [5] Leibovici, DG, Pourabdollah, A, Jackson, M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria [6] Pourabdollah, A, Leibovici, DG, Jackson, M (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK
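The idea of estimating output quality without running the workflow can be sketched as score propagation through the workflow DAG. The propagation rule below (a node's quality = its process quality times the minimum of its inputs' qualities) is an illustrative assumption, not the meta-propagation calculus developed in [5, 6]; node names are hypothetical.

```python
def propagate_quality(graph, input_quality, process_quality):
    """Propagate quality scores in [0, 1] through a workflow DAG.

    graph: node -> list of upstream nodes (empty list = workflow input).
    Rule (assumed for illustration): quality(node) =
    process_quality[node] * min(quality of each upstream node).
    """
    memo = {}
    def q(node):
        if node not in memo:
            if not graph[node]:
                memo[node] = input_quality[node]
            else:
                memo[node] = process_quality[node] * min(q(u) for u in graph[node])
        return memo[node]
    return {node: q(node) for node in graph}

# Toy flood-mapping workflow: two input datasets feed a model, whose output
# feeds a final mapping step.
graph = {"dem": [], "rainfall": [],
         "runoff_model": ["dem", "rainfall"], "flood_map": ["runoff_model"]}
quality = propagate_quality(graph,
                            input_quality={"dem": 0.9, "rainfall": 0.7},
                            process_quality={"runoff_model": 0.95, "flood_map": 0.9})
```

Even this toy rule exposes the useful property: the estimate immediately shows which upstream dataset (here, rainfall) bounds the final output's quality, which is what the graph visualization above is meant to surface.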
New Tools to Document and Manage Data/Metadata: Example NGEE Arctic and ARM
NASA Astrophysics Data System (ADS)
Crow, M. C.; Devarakonda, R.; Killeffer, T.; Hook, L.; Boden, T.; Wullschleger, S.
2017-12-01
Tools used for documenting, archiving, cataloging, and searching data are critical pieces of informatics. This poster describes tools being used in several projects at Oak Ridge National Laboratory (ORNL), with a focus on the U.S. Department of Energy's Next Generation Ecosystem Experiment in the Arctic (NGEE Arctic) and Atmospheric Radiation Measurement (ARM) project, and their usage at different stages of the data lifecycle. The Online Metadata Editor (OME) is used for the documentation and archival stages while a Data Search tool supports indexing, cataloging, and searching. The NGEE Arctic OME Tool [1] provides a method by which researchers can upload their data and provide original metadata with each upload while adhering to standard metadata formats. The tool is built upon the Java Spring framework and converts user input to and from XML. Many aspects of the tool require use of a relational database, including encrypted user login, auto-fill functionality for predefined sites and plots, and file-reference storage and sorting. The Data Search Tool conveniently displays each data record in a thumbnail containing the title, source, and date range, and features a quick view of the metadata associated with that record, as well as a direct link to the data. The search box incorporates autocomplete for search terms, and sorted keyword filters are available on the side of the page, including a map for geo-searching. These tools are supported by the Mercury [2] consortium (funded by DOE, NASA, USGS, and ARM) and developed and managed at Oak Ridge National Laboratory. Mercury is a set of tools for collecting, searching, and retrieving metadata and data. Mercury collects metadata from contributing project servers, then indexes the metadata to make it searchable using Apache Solr, and provides access to retrieve it from the web page. Metadata standards that Mercury supports include: XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115.
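The indexing-and-search pipeline described above (harvest metadata, index it, serve keyword search with autocomplete) can be sketched with a tiny inverted index. In production Mercury delegates this to Apache Solr; the class, record ids, and fields below are illustrative assumptions.

```python
from collections import defaultdict

class MetadataIndex:
    """A toy stand-in for a Solr-style metadata index."""

    def __init__(self):
        self.inverted = defaultdict(set)   # term -> set of record ids
        self.records = {}

    def add(self, record_id, record):
        """Index selected metadata fields of a harvested record."""
        self.records[record_id] = record
        for field in ("title", "keywords"):
            for term in record.get(field, "").lower().split():
                self.inverted[term].add(record_id)

    def search(self, term):
        return sorted(self.inverted.get(term.lower(), set()))

    def autocomplete(self, prefix):
        """Suggest indexed terms starting with the typed prefix."""
        return sorted(t for t in self.inverted if t.startswith(prefix.lower()))

idx = MetadataIndex()
idx.add("ngee-001", {"title": "Soil temperature Barrow", "keywords": "permafrost arctic"})
idx.add("arm-042", {"title": "Radiation flux", "keywords": "arctic atmosphere"})
```

A real deployment adds tokenization rules, field weighting, and faceted filters, but the harvest-index-search shape is the same.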
Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras.
Jung, Jaehoon; Yoon, Inhye; Lee, Seungwon; Paik, Joonki
2016-06-24
Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i) generation of a three-dimensional (3D) human model; (ii) human object-based automatic scene calibration; and (iii) metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system.
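Once each camera is calibrated, the per-camera pixel measurements can be converted into camera-independent (normalized) metadata and compared across heterogeneous views. The sketch below assumes a single per-camera pixels-per-metre scale, a deliberate simplification of the paper's 3D-model-based calibration; all coordinates and scales are hypothetical.

```python
import math

def normalized_height(foot_px, head_px, px_per_metre):
    """Convert a foot-to-head pixel measurement into metres using a per-camera scale."""
    dx = head_px[0] - foot_px[0]
    dy = head_px[1] - foot_px[1]
    return math.hypot(dx, dy) / px_per_metre

def match_across_cameras(obs_a, obs_b, tol=0.05):
    """Treat two detections as the same object if normalized heights agree within tol metres."""
    return abs(normalized_height(*obs_a) - normalized_height(*obs_b)) <= tol

# Hypothetical detections of the same person from two heterogeneous cameras:
# (foot pixel, head pixel, pixels-per-metre scale from calibration).
cam1 = ((100, 400), (100, 220), 100.0)   # 180 px tall at 100 px/m -> 1.80 m
cam2 = ((50, 300), (50, 193), 60.0)      # 107 px tall at 60 px/m  -> ~1.78 m
```

The point of normalization is visible here: the raw pixel heights (180 vs 107) disagree wildly, but the metric heights agree within a small tolerance, so the metadata can be matched across cameras.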
Arctic Observing Network Data Management: Current Capabilities and Their Promise for the Future
NASA Astrophysics Data System (ADS)
Collins, J.; Fetterer, F.; Moore, J. A.
2008-12-01
CADIS (the Cooperative Arctic Data and Information Service) serves as the data management, discovery and delivery component of the Arctic Observing Network (AON). As an International Polar Year (IPY) initiative, AON comprises 34 land, atmosphere and ocean observation sites, and will acquire much of the data coming from the interagency Study of Environmental Arctic Change (SEARCH). CADIS is tasked with ensuring that these observational data are managed for long-term use by members of the entire Earth System Science community. Portions of CADIS are either in use by the community or available for testing. We now have an opportunity to evaluate the feedback received from our users, to identify any design shortcomings, and to identify those elements which serve their purpose well and will support future development. This presentation will focus on the nuts-and-bolts of the CADIS development to date, with an eye towards presenting lessons learned and best practices based on our experiences so far. The topics include:
- How did we assess our users' needs, and how are those contributions reflected in the end product and its capabilities?
- Why did we develop a CADIS metadata profile, and how does it allow CADIS to support preservation and scientific interoperability?
- How can we shield the user from metadata complexities (especially those associated with various standards) while still obtaining the metadata needed to support an effective data management system?
- How can we bridge the gap between the data storage formats considered convenient by researchers in the field, and those which are necessary to provide data interoperability?
- What challenges have been encountered in our efforts to provide access to federated data (data stored outside of the CADIS system)?
- What are the data browsing and visualization needs of the AON community, and which tools and technologies are most promising in terms of supporting those needs?
A live demonstration of the current capabilities of the CADIS system will be included as time and logistics allow. CADIS is a joint effort of the University Corporation for Atmospheric Research (UCAR), the National Snow and Ice Data Center (NSIDC), and the National Center for Atmospheric Research (NCAR).
Making Information Visible, Accessible, and Understandable: Meta-Data and Registries
2007-07-01
...the data created, the length of play time, album name, and the genre. Without resource metadata, portable digital music players would not be so... notion of a catalog card in a library. An example of metadata is the description of a music file specifying the creator, the artist that performed the song... describe structure and formatting, which are critical to interoperability and the management of databases. Going back to the portable music player example...
ERIC Educational Resources Information Center
Levy, David M.; Huttenlocher, Dan; Moll, Angela; Smith, MacKenzie; Hodge, Gail M.; Chandler, Adam; Foley, Dan; Hafez, Alaaeldin M.; Redalen, Aaron; Miller, Naomi
2000-01-01
Includes six articles focusing on the purpose of digital public libraries; encoding electronic documents through compression techniques; a distributed finding aid server; digital archiving practices in the framework of information life cycle management; converting metadata into MARC format and Dublin Core formats; and evaluating Web sites through…
A metadata template for ocean acidification data
NASA Astrophysics Data System (ADS)
Jiang, L.
2014-12-01
Metadata is structured information that describes, explains, and locates an information resource (e.g., data). It is often coarsely described as "data about data", and documents what was measured, by whom, when, where, how it was sampled and analyzed, and with what instruments. Metadata is essential to ensuring the survivability and accessibility of the data into the future. With the rapid expansion of biological-response ocean acidification (OA) studies, the lack of a common metadata template to document this type of data has become a significant gap in ocean acidification data management efforts. In this paper, we present a metadata template that can be applied to a broad spectrum of OA studies, including those studying the biological responses of organisms to ocean acidification. The "variable metadata section", which includes the variable name, observation type, whether the variable is a manipulation condition or a response variable, and the biological subject on which the variable is studied, forms the core of this metadata template. Additional metadata elements, such as principal investigators, temporal and spatial coverage, sampling platforms, and data citation, are essential components that complete the template. We explain the structure of the template and define many metadata elements that may be unfamiliar to researchers; for that reason, this paper can also serve as a user's manual for the template.
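A minimal sketch of such a template, with the "variable metadata section" at its core, might look like the following; all field names and example values here are illustrative guesses, not the template's actual terms:

```python
def variable_metadata(name, observation_type, role, biological_subject=None):
    """Core 'variable metadata section' entry: name, observation type,
    whether the variable is a manipulation condition or a response
    variable, and the biological subject studied (if any)."""
    assert role in ("manipulation condition", "response variable")
    return {
        "variable_name": name,
        "observation_type": observation_type,
        "role": role,
        "biological_subject": biological_subject,
    }

# Additional elements that complete the template (all hypothetical values).
template = {
    "principal_investigators": ["A. Researcher"],
    "temporal_coverage": {"start": "2012-06-01", "end": "2012-08-31"},
    "spatial_coverage": {"lat": [44.6, 44.7], "lon": [-124.1, -124.0]},
    "platform": "laboratory culture experiment",
    "data_citation": "Researcher et al. (2013), hypothetical dataset",
    "variables": [
        variable_metadata("pCO2", "measured", "manipulation condition"),
        variable_metadata("shell growth rate", "calculated",
                          "response variable", "Mytilus edulis"),
    ],
}
```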
Integrating Semantic Information in Metadata Descriptions for a Geoscience-wide Resource Inventory.
NASA Astrophysics Data System (ADS)
Zaslavsky, I.; Richard, S. M.; Gupta, A.; Valentine, D.; Whitenack, T.; Ozyurt, I. B.; Grethe, J. S.; Schachne, A.
2016-12-01
Integrating semantic information into legacy metadata catalogs is a challenging issue and so far has been mostly done on a limited scale. We present experience of CINERGI (Community Inventory of Earthcube Resources for Geoscience Interoperability), an NSF Earthcube Building Block project, in creating a large cross-disciplinary catalog of geoscience information resources to enable cross-domain discovery. The project developed a pipeline for automatically augmenting resource metadata, in particular generating keywords that describe metadata documents harvested from multiple geoscience information repositories or contributed by geoscientists through various channels including surveys and domain resource inventories. The pipeline examines available metadata descriptions using text parsing, vocabulary management and semantic annotation and graph navigation services of GeoSciGraph. GeoSciGraph, in turn, relies on a large cross-domain ontology of geoscience terms, which bridges several independently developed ontologies or taxonomies including SWEET, ENVO, YAGO, GeoSciML, GCMD, SWO, and CHEBI. The ontology content enables automatic extraction of keywords reflecting science domains, equipment used, geospatial features, measured properties, methods, processes, etc. We specifically focus on issues of cross-domain geoscience ontology creation, resolving several types of semantic conflicts among component ontologies or vocabularies, and constructing and managing facets for improved data discovery and navigation. The ontology and keyword generation rules are iteratively improved as pipeline results are presented to data managers for selective manual curation via a CINERGI Annotator user interface. 
We present lessons learned from applying CINERGI metadata augmentation pipeline to a number of federal agency and academic data registries, in the context of several use cases that require data discovery and integration across multiple earth science data catalogs of varying quality and completeness. The inventory is accessible at http://cinergi.sdsc.edu, and the CINERGI project web page is http://earthcube.org/group/cinergi
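The keyword-generation step of such a pipeline can be illustrated with a toy annotator: free-text metadata is scanned for terms known to a cross-domain vocabulary, and matches are attached to the record grouped by facet. The vocabulary and facet names below are invented stand-ins for the GeoSciGraph ontology services, not CINERGI's actual implementation:

```python
# Toy term-to-facet vocabulary (stand-in for a large geoscience ontology).
VOCABULARY = {
    "salinity": "measured property",
    "ctd": "equipment",
    "estuary": "geospatial feature",
    "erosion": "process",
}

def augment(record):
    """Attach facet-grouped keywords extracted from title and abstract."""
    text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
    keywords = {}
    for term, facet in VOCABULARY.items():
        if term in text:
            keywords.setdefault(facet, []).append(term)
    return {**record, "generated_keywords": keywords}
```

The real pipeline additionally parses phrases, resolves synonyms and semantic conflicts among component ontologies, and routes results to human curators.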
NASA Astrophysics Data System (ADS)
Plank, G.; Slater, D.; Torrisi, J.; Presser, R.; Williams, M.; Smith, K. D.
2012-12-01
The Nevada Seismological Laboratory (NSL) manages time-series data and high-throughput IP telemetry for the National Center for Nuclear Security (NCNS) Source Physics Experiment (SPE), underway on the Nevada National Security Site (NNSS). During active-source experiments, SPE's heterogeneous systems record over 350 channels of a variety of data types including seismic, infrasound, acoustic, and electro-magnetic. During the interim periods, broadband and short-period instruments record approximately 200 channels of continuous, high-sample-rate seismic data. Frequent changes in sensor and station configurations create a challenging meta-data environment. Meta-data account for complete operational histories, including sensor types, serial numbers, gains, sample rates, orientations, instrument responses, data-logger types, etc. To date, these catalogue 217 stations, over 40 different sensor types, and over 1000 unique recording configurations (epochs). Facilities for processing, backup, and distribution of time-series data currently span four Linux servers, 60 TB of disk capacity, and two data centers. Bandwidth, physical security, and redundant power and cooling systems for acquisition, processing, and backup servers are provided by NSL's Reno data center. The Nevada System of Higher Education (NSHE) System Computer Services (SCS) in Las Vegas provides similar facilities for the distribution server. NSL staff handle setup, maintenance, and security of all data management systems. SPE PIs have remote access to meta-data, raw data, and CSS3.0 compilations via SSL-based transfers such as rsync or secure-copy, as well as shell access for data browsing and limited processing. Meta-data are continuously updated and posted on the Las Vegas distribution server as station histories are better understood and errors are corrected. Raw time series and refined CSS3.0 data compilations with standardized formats are transferred to the Las Vegas data server as available.
For better data availability and station monitoring, SPE is beginning to leverage NSL's wide-area digital IP network with nine SPE stations and six Rock Valley area stations that stream continuous recordings in real time to the NSL Reno data center. These stations, in addition to eight regional legacy stations supported by National Security Technologies (NSTec), are integrated with NSL's regional monitoring network and constrain a high-quality local earthquake catalog for NNSS. The telemetered stations provide critical capabilities for SPE, and infrastructure for earthquake response on NNSS as well as southern Nevada and the Las Vegas area.
Documenting Climate Models and Their Simulations
Guilyardi, Eric; Balaji, V.; Lawrence, Bryan; ...
2013-05-01
The results of climate models are of increasing and widespread importance. No longer is climate model output of sole interest to climate scientists and researchers in the climate change impacts and adaptation fields. Now nonspecialists such as government officials, policy makers, and the general public all have an increasing need to access climate model output and understand its implications. For this host of users, accurate and complete metadata (i.e., information about how and why the data were produced) is required to document the climate modeling results. We describe a pilot community initiative to collect and make available documentation of climate models and their simulations. In an initial application, a metadata repository is being established to provide information of this kind for a major internationally coordinated modeling activity known as CMIP5 (Coupled Model Intercomparison Project, Phase 5). We expect that, for a wide range of stakeholders, this and similar community-managed metadata repositories will spur development of analysis tools that facilitate discovery and exploitation of Earth system simulations.
Huang, Min; Liu, Zhaoqing; Qiao, Liyan
2014-10-10
While the NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and an increase in the number of bits stored in each memory cell will inevitably degrade the reliability of NAND flash memory. In particular, it's critical to enhance metadata reliability, which occupies only a small portion of the storage space, but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits for the first time the property of the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme. PMID:25310473
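The allocation idea can be sketched as follows. The pairing of LSB and MSB pages assumed here (odd-numbered pages carry the MSB) is an illustrative simplification; real MLC pairing maps are device-specific, and this is not the paper's actual algorithm:

```python
def is_msb(page_index):
    # Assumed pairing: each odd page shares its cells with the preceding
    # even page and stores the more reliable most-significant bits.
    return page_index % 2 == 1

def allocate(pages_per_block, writes):
    """Place each (kind, payload) write on a page within one block:
    metadata goes to MSB pages, ordinary data to LSB pages."""
    msb = [p for p in range(pages_per_block) if is_msb(p)]
    lsb = [p for p in range(pages_per_block) if not is_msb(p)]
    placement = {}
    for kind, payload in writes:
        pool = msb if kind == "metadata" else lsb
        if not pool:
            raise RuntimeError("no free pages in this block for " + kind)
        placement[pool.pop(0)] = (kind, payload)
    return placement
```

The paper's address-mapping and garbage-collection strategies then ensure this asymmetric placement is preserved across the flash translation layer.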
NASA Astrophysics Data System (ADS)
Riddick, Andrew; Hughes, Andrew; Harpham, Quillon; Royse, Katherine; Singh, Anubha
2014-05-01
There has been increasing interest, both from academic and commercial organisations, over recent years in developing hydrologic and other environmental models in response to some of the major challenges facing the environment, for example environmental change and its effects, and ensuring water resource security. This has resulted in a significant investment in modelling by many organisations, both in terms of financial resources and intellectual capital. To capitalise on the effort of producing models, it is necessary for the models to be both discoverable and appropriately described; otherwise the effort in producing them will be wasted. However, whilst there are some recognised metadata standards relating to datasets, these may not completely address the needs of modellers, for example regarding input data. There also appears to be a lack of metadata schemes configured to encourage the discovery and re-use of the models themselves. The lack of an established standard for model metadata is considered to be a factor inhibiting the more widespread use of environmental models, particularly the use of linked model compositions which fuse hydrologic models with models from other environmental disciplines. This poster presents the results of a Natural Environment Research Council (NERC) funded scoping study to understand the requirements of modellers and other end users for metadata about data and models. A user consultation exercise using an on-line questionnaire was undertaken to capture the views of a wide spectrum of stakeholders on how they currently manage metadata for modelling. This provided strong confirmation of our original supposition that there is a lack of systems and facilities to capture metadata about models.
A number of specific gaps in current provision for data and model metadata were also identified, including a need for a standard means to record detailed information about the modelling environment and the model code used, to assist the selection of models for linked compositions. Existing best practice, including the use of current metadata standards (e.g. ISO 19110, ISO 19115 and ISO 19119) and the metadata components of WaterML were also evaluated. In addition to commonly used metadata attributes (e.g. spatial reference information) there was significant interest in recording a variety of additional metadata attributes. These included more detailed information about temporal data, and also providing estimates of data accuracy and uncertainty within metadata. This poster describes the key results of this study, including a number of gaps in the provision of metadata for modelling, and outlines how these might be addressed. Overall the scoping study has highlighted significant interest in addressing this issue within the environmental modelling community. There is therefore an impetus for on-going research, and we are seeking to take this forward through collaboration with other interested organisations. Progress towards an internationally recognised model metadata standard is suggested.
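One identified gap, a standard means of recording the modelling environment and model code alongside conventional dataset attributes, might be filled by a record such as this sketch (the field names are illustrative, not a proposed standard):

```python
from dataclasses import dataclass

@dataclass
class ModelMetadata:
    name: str
    code_version: str         # e.g. a release tag or commit hash
    language: str             # implementation language of the model code
    platform: str             # OS/compiler of the run environment
    spatial_reference: str    # conventional ISO 19115-style attribute
    temporal_resolution: str
    accuracy_notes: str = ""  # estimated accuracy / uncertainty of outputs
```

Recording the environment and code version in this way would support the selection of compatible models for linked compositions.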
GeoDataspaces: Simplifying Data Management Tasks with Globus
NASA Astrophysics Data System (ADS)
Malik, T.; Chard, K.; Tchoua, R. B.; Foster, I.
2014-12-01
Data and its management are central to the modern scientific enterprise. Typically, geoscientists rely on observations and model output data from several disparate sources (file systems, RDBMS, spreadsheets, remote data sources). Integrated data management solutions that provide intuitive semantics and uniform interfaces, irrespective of the kind of data source, are, however, lacking. Consequently, geoscientists are left to conduct low-level and time-consuming data management tasks individually and repeatedly for each data source, often resulting in handling errors. In this talk we will describe how the EarthCube GeoDataspace project is improving this situation for seismologists, hydrologists, and space scientists by simplifying some of the existing data management tasks that arise when developing computational models. We will demonstrate a GeoDataspace, bootstrapped with "geounits", which are self-contained metadata packages that provide a complete description of all data elements associated with a model run, including input/output and parameter files, the model executable, and any associated libraries. Geounits link raw and derived data, as well as associating provenance information describing how data were derived. We will discuss challenges in establishing geounits and describe machine learning and human annotation approaches that can be used for extracting ad hoc and unstructured scientific metadata hidden in binary formats and associating it with data resources and models. We will show how geounits can improve the search and discoverability of data associated with model runs. To support this model, we will describe efforts toward creating a scalable metadata catalog that helps to maintain, search, and discover geounits within the Globus network of accessible endpoints.
This talk will focus on the issue of creating comprehensive personal inventories of data assets for computational geoscientists, and describe a publishing mechanism, which can be used to feed into national, international, or thematic discovery portals.
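A geounit, as described above, might be sketched as a plain package that lists a run's files and links derived outputs back to their inputs; the function and field names below are guesses at what such a package could contain, not the project's actual format:

```python
def make_geounit(run_id, inputs, outputs, parameters, executable):
    """Self-contained metadata package for one model run."""
    return {
        "run_id": run_id,
        "inputs": list(inputs),          # input and parameter files
        "outputs": list(outputs),        # derived data
        "parameters": dict(parameters),
        "executable": executable,        # model binary / libraries
        # Provenance: how each derived file came to be.
        "provenance": [{"derived": o, "from": list(inputs)} for o in outputs],
    }

def runs_using(geounits, input_file):
    """Discovery: which model runs used a given input file?"""
    return [g["run_id"] for g in geounits if input_file in g["inputs"]]
```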
Development of an open metadata schema for prospective clinical research (openPCR) in China.
Xu, W; Guan, Z; Sun, J; Wang, Z; Geng, Y
2014-01-01
In China, deployment of electronic data capture (EDC) and clinical data management systems (CDMS) for clinical research (CR) is at a very early stage, and about 90% of clinical studies collect and submit clinical data manually. This work aims to build an open metadata schema for Prospective Clinical Research (openPCR) in China based on openEHR archetypes, in order to help Chinese researchers easily create specific data entry templates for registration, study design and clinical data collection. The Singapore Framework for Dublin Core Application Profiles (DCAP) is used to develop openPCR, following four steps: defining the core functional requirements and deducing the core metadata items; developing archetype models; defining metadata terms and creating archetype records; and developing the implementation syntax. The core functional requirements are divided into three categories: requirements for research registration, requirements for trial design, and requirements for case report forms (CRF). Seventy-four metadata items are identified and their Chinese authority names are created. The minimum metadata set of openPCR includes 3 documents, 6 sections, 26 top-level data groups, 32 lower data groups and 74 data elements. The top-level container in openPCR is composed of public document, internal document and clinical document archetypes. A hierarchical structure of openPCR is established according to the Data Structure of Electronic Health Record Architecture and Data Standard of China (Chinese EHR Standard). Metadata attributes are grouped into six parts: identification, definition, representation, relation, usage guides, and administration. OpenPCR is an open metadata schema based on research registration standards, standards of the Clinical Data Interchange Standards Consortium (CDISC) and Chinese healthcare-related standards, and is to be made publicly available throughout China.
It considers future integration of EHR and CR by adopting the data structure and data terms of the Chinese EHR Standard. Archetypes in openPCR are modular models and can be separated, recombined, and reused. The authors recommend that the method used to develop openPCR be considered by other countries when designing metadata schemas for clinical research. In the next steps, openPCR should be used in a number of CR projects to test its applicability and to continuously improve its coverage. In addition, a metadata schema for research protocols could be developed to structure and standardize protocols, and syntactic interoperability of openPCR with other related standards could be considered.
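The nesting described above (documents containing sections, data groups, and data elements) can be sketched with plain containers; the example names are illustrative, not openPCR's actual archetype terms:

```python
def node(kind, name, *children):
    """A node in the document > section > group > element hierarchy."""
    return {"kind": kind, "name": name, "children": list(children)}

# A tiny hypothetical clinical document.
crf = node("document", "clinical document",
           node("section", "vital signs",
                node("group", "blood pressure",
                     node("element", "systolic"),
                     node("element", "diastolic"))))

def count(tree, kind):
    """Count nodes of a given kind, in the spirit of openPCR's tallies
    ('26 top-level data groups, 74 data elements', etc.)."""
    here = 1 if tree["kind"] == kind else 0
    return here + sum(count(child, kind) for child in tree["children"])
```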
Integration of external metadata into the Earth System Grid Federation (ESGF)
NASA Astrophysics Data System (ADS)
Berger, Katharina; Levavasseur, Guillaume; Stockhause, Martina; Lautenschlager, Michael
2015-04-01
International projects with high data volumes usually disseminate their data in a federated data infrastructure, e.g. the Earth System Grid Federation (ESGF). The ESGF aims to make the geographically distributed data seamlessly discoverable and accessible. Additional data-related information is currently collected and stored in separate repositories by each data provider, and this scattered but useful information is not, or only partly, available to ESGF users. Examples of such additional information systems are ES-DOC/metafor for model and simulation information, IPSL's versioning information, CHARMe for user annotations, DKRZ's quality information, and data citation information. The ESGF Quality Control working team (esgf-qcwt) aims to integrate these valuable pieces of additional information into the ESGF in order to make them available to users and data archive managers by (i) integrating external information into the ESGF portal, (ii) integrating links to external information objects into the ESGF metadata index, e.g. by the use of PIDs (Persistent IDentifiers), and (iii) automating the collection of external information during the ESGF data publication process. For the sixth phase of CMIP (Coupled Model Intercomparison Project), the ESGF metadata index is to be enriched by additional information on data citation, file version, etc. This information will support users directly and can be automatically exploited by higher-level services (human and machine readability).
NASA Astrophysics Data System (ADS)
Servilla, M.; Brunt, J.
2011-12-01
Emerging in the 1980's as a U.S. National Science Foundation funded research network, the Long Term Ecological Research (LTER) Network began with six sites and with the goal of performing comparative data collection and analysis of major biotic regions of North America. Today, the LTER Network includes 26 sites located in North America, Antarctica, Puerto Rico, and French Polynesia and has contributed a corpus of over 7,000 data sets to the public domain. The diversity of LTER research has led to a wealth of scientific data derived from atmospheric, terrestrial, oceanographic, and anthropogenic studies. Such diversity, however, is a contributing factor to data being published with poor or inconsistent quality, or lacking descriptive documentation sufficient for understanding their origin or performing derivative studies. It is for these reasons that the LTER community, in collaboration with the LTER Network Office, has embarked on the development of the LTER Network Information System (NIS), an integrative data management approach to improve the process by which quality LTER data and metadata are assembled into a central archive, thereby enabling better discovery, analysis, and synthesis of derived data products. The mission of the LTER NIS is to promote advances in collaborative and synthetic ecological science at multiple temporal and spatial scales by providing the information management and technology infrastructure to increase:
- the availability and quality of data from LTER sites, through the use and support of standardized approaches to metadata management and access to data;
- the timeliness and number of LTER derived data products, by creating a suite of middleware programs and workflows that make it easy to create and maintain integrated data sets derived from LTER data; and
- the knowledge generated from the synthesis of LTER data, by creating standardized access and easy-to-use applications to discover, access, and use LTER data.
The LTER NIS will utilize the Provenance Aware Synthesis Tracking Architecture (PASTA), which will provide the LTER community a metadata-driven data-flow framework to automatically harvest data from LTER research sites and make it available through a well-defined software interface. We distinguish PASTA from the more generalized NIS by classifying framework components as critical and enabling cyberinfrastructure that, collectively, provide the services defined by the above mission. Data and metadata will have to pass a set of community-defined quality criteria before entry into PASTA, including the use of semantically informative metadata elements and the conformance of data to the structural descriptions provided by their metadata. As a result, consumers of data products from PASTA will be assured that metadata are complete and include provenance information where applicable, and that the data are of the highest quality. Development of the NIS is being performed through community participation. Advisory groups, called "Tiger Teams", are enlisted from the general LTER membership to provide input to the design of the NIS. Other LTER working groups contribute community-based software into the NIS; these include modules for controlled vocabularies, scientific units, and personnel. We anticipate a 2014 release of the LTER NIS.
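One of the entry criteria mentioned above, conformance of data to the structural description in its metadata, can be sketched as a simple check of a delimited table against declared attributes (a stand-in for the richer EML-based checks PASTA actually performs):

```python
def conforms(rows, attributes):
    """`attributes` is a list of (name, type); every row must supply one
    value per declared attribute, castable to the declared type."""
    for row in rows:
        if len(row) != len(attributes):
            return False            # wrong number of columns
        for value, (_name, typ) in zip(row, attributes):
            try:
                typ(value)
            except ValueError:
                return False        # value violates declared type
    return True
```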
Use of NLM medical subject headings with the MeSH2010 thesaurus in the PORTAL-DOORS system.
Taswell, Carl
2010-01-01
The NLM MeSH Thesaurus has been incorporated for use in the PORTAL-DOORS System (PDS) for resource metadata management on the semantic web. All 25588 descriptor records from the NLM 2010 MeSH Thesaurus have been exposed as web accessible resources by the PDS MeSH2010 Thesaurus implemented as a PDS PORTAL Registry operating as a RESTful web service. Examples of records from the PDS MeSH2010 PORTAL are demonstrated along with their use by records in other PDS PORTAL Registries that reference the concepts from the MeSH2010 Thesaurus. Use of this important biomedical terminology will greatly enhance the quality of metadata content of other PDS records thus improving cross-domain searches between different problem oriented domains and amongst different clinical specialty fields.
Aircraft scanner data availability via the version 0 Information Management System
NASA Technical Reports Server (NTRS)
Mah, G. R.
1995-01-01
As part of the Earth Observing System Data and Information System (EOSDIS) development, NASA and other government agencies have developed an operational prototype of the Information Management System (IMS). The IMS provides access to the data archived at the Distributed Active Archive Centers (DAAC's), allowing users to search through metadata describing the (image) data. Criteria based on sensor name or type, date and time, and geographic location are used to search the archive. Graphical representations of coverage and browse images are available to further refine a user's selection. Previously, the EROS Data Center (EDC) DAAC had identified the Advanced Solid-state Array Spectrometer (ASAS), Airborne Visible and Infrared Imaging Spectrometer (AVIRIS), NS-001, and Thermal Infrared Multispectral Scanner (TIMS) as precursor data sets similar to those the DAAC will handle in the Earth Observing System era. Currently, the EDC DAAC staff, in cooperation with NASA, has transcribed TIMS, NS-001, and Thematic Mapper Simulation (TMS) data from Ames Research Center, as well as TIMS data from Stennis Space Center. During the transcription process, the IMS metadata and browse images were created to populate the inventory at the EDC DAAC. These data sets are now available in the IMS and may be requested from any of the DAAC's via the IMS.
NASA Astrophysics Data System (ADS)
Fazliev, A.
2009-04-01
The information and knowledge layers of an information-computational system for water spectroscopy are described. Semantic metadata for all the tasks of the domain information model that form the basis of these layers have been studied. The principle of semantic metadata determination, and the mechanisms of its use during information systematization in molecular spectroscopy, are presented, as is the software developed for working with semantic metadata. Formation of the domain model in the framework of the Semantic Web is based on an explicit specification of its conceptualization, in other words, its ontologies. The conceptualization for molecular spectroscopy was described in Refs. 1 and 2, where two chains of tasks, a direct-task chain and an inverse-task chain, were selected as a zeroth approximation for describing the knowledge domain. The solution schemes of these tasks define the data layer of the knowledge-domain conceptualization. The properties of spectroscopy task solutions lead to a step-by-step extension of the molecular spectroscopy conceptualization; the information layer of the information system corresponds to this extension. An advantage of a molecular spectroscopy model designed as a chain of tasks is that data and metadata can be defined explicitly at each step of the solution of these chained tasks. The metadata structure (the properties of task solutions) in the knowledge domain also forms a chain, in which the input data and metadata of the previous task become metadata of the following tasks. The term metadata is used here in a narrow sense: metadata are the properties of spectroscopy task solutions. Semantic metadata, represented with the help of OWL [3], are formed automatically as individuals of classes (the A-box). The union of the T-box and the A-box is an ontology that can be processed by an inference engine.
In this work we analyze the formation of individuals of molecular spectroscopy applied ontologies, as well as the software used for their creation by means of the OWL DL language. The results are presented in the form of an information layer and a knowledge layer in the W@DIS information system 4. 1 FORMATION OF INDIVIDUALS OF WATER SPECTROSCOPY APPLIED ONTOLOGY The applied tasks ontology contains an explicit description of the input and output data of the physical tasks solved in the two chains of molecular spectroscopy tasks. Besides the physical concepts related to spectroscopy task solutions, an information source, a key concept of the knowledge-domain information model, is also used. Each solution of a knowledge-domain task is linked to an information source, which contains a reference to the published task solution, the molecule, and the task-solution properties. Each information source thus allows us to identify a particular knowledge-domain task solution contained in the information system. The classes of the water spectroscopy applied ontology are formed on the basis of the molecular spectroscopy concept taxonomy; they are defined by constraints on the properties of the selected conceptualization. Extension of the applied ontology in the W@DIS information system proceeds according to two scenarios. The formation of individuals (ontology facts or axioms) takes place when a task solution is uploaded into the information system. Ontology operations that involve the molecular spectroscopy taxonomy and individuals are performed solely by the user; for this purpose the Protege ontology editor was used. Software was designed and implemented for the formation, processing, and visualization of the individuals of knowledge-domain tasks. The method of individual formation determines the sequence of steps by which the created ontology individuals are generated. Task-solution properties (metadata) have qualitative and quantitative values.
Qualitative metadata describe the qualitative side of a task, such as the solution method or other information that can be explicitly specified by object properties of the OWL DL language. Quantitative metadata describe quantitative properties of a task solution, such as minimal and maximal data values or other information that can be obtained by programmed algorithmic operations; these metadata correspond to DatatypeProperty properties of the OWL specification language. Quantitative metadata can be obtained automatically during data upload into the information system. Since ObjectProperty values are objects, processing qualitative metadata requires logical constraints. For some of the tasks solved in the W@DIS ICS, qualitative metadata can be formed automatically (for example, in the spectral functions calculation task). The method used to translate qualitative metadata into quantitative metadata can be characterized as a coarsened representation of knowledge in the knowledge domain. A key point in the formation of the applied ontology of molecular spectroscopy tasks is the existence of two ways of obtaining data: the experimental method (metadata for experimental data contain a description of the equipment, experimental conditions, and so on) at the initial stage and inverse task solutions at the following stages; and the calculation method (metadata for calculated data are closely related to the metadata used to describe the physical and mathematical models of molecular spectroscopy). 2 SOFTWARE FOR ONTOLOGY OPERATION Data collection in the water spectroscopy information system is organized as a workflow that contains operations such as creating an information source, entering bibliographic data on publications, forming the schema of uploaded data, and so on. Metadata are generated in the information source as well. Two methods are used for their formation: automatic metadata generation and manual metadata generation (performed by the user).
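The qualitative/quantitative distinction above can be illustrated with a short sketch: quantitative (DatatypeProperty-like) metadata are literal values computed automatically from the uploaded data, while qualitative (ObjectProperty-like) metadata refer to other ontology individuals. The property names and IRIs below are invented for illustration and do not reproduce the actual W@DIS ontology.

```python
# Illustrative sketch (not the authors' code) of the two kinds of task-solution metadata.

def quantitative_metadata(values):
    """Literal-valued (DatatypeProperty-like) facts, derived automatically from the data."""
    return {"minValue": min(values), "maxValue": max(values), "count": len(values)}

# Qualitative (ObjectProperty-like) facts reference other individuals by IRI;
# they cannot be computed from the numbers alone and need logical constraints.
solution = {
    "solvedBy": "wadis:EffectiveHamiltonianMethod",  # object-valued, hypothetical IRI
    "hasInformationSource": "wadis:Source_0042",     # object-valued, hypothetical IRI
}
solution.update(quantitative_metadata([3.2, 7.7, 0.5, 12.4]))
```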
Software support for actions related to metadata formation is provided by the META+ module. The functions of the META+ module can be divided into two groups: the first group contains the functions needed by the software developer, while the second contains the functions needed by a user of the information system. The META+ module functions needed by the developer are: 1. creation of the taxonomy (T-boxes) of the applied ontology classes of knowledge-domain tasks; 2. creation of instances of task classes; 3. creation of data schemes of tasks in the form of an XML pattern based on XML syntax (the XML pattern is developed for the instance generator and created according to certain rules imposed on the software generator implementation); 4. implementation of algorithms for calculating metadata values; 5. creation of a request interface and additional knowledge-processing functions for the solution of these tasks; 6. unification of the created functions and interfaces into one information system. The following sequence is universal for generating the individuals of task classes that form chains. Special interfaces for managing user operations are provided to the software developer in the META+ module. There are also means for updating qualitative metadata values when data are re-uploaded to an information source.
The list of functions needed by the end user contains: - visualization and editing of data sets, taking their metadata into account, e.g. display of the unique number of bands in transitions for a certain data source; - export of OWL/RDF models from the information system to the environment in XML syntax; - visualization of instances of classes of the applied ontology of molecular spectroscopy tasks; - import of OWL/RDF models into the information system and their integration with the domain vocabulary; - formation of additional knowledge-domain knowledge for constructing ontological instances of task classes using HTML formats and their processing; - formation of additional knowledge-domain knowledge for constructing instances of task classes using software algorithms for data-set processing; - a semantic search function with an interface that formulates questions in the form of related triplets in order to obtain an adequate answer. 3 STRUCTURE OF META+ MODULE The META+ software module that provides the above functions contains the following components: - a knowledge base that stores the semantic metadata and taxonomies of the information system; - the software libraries POWL and RAP 5, created by third-party developers, which provide access to the ontological storage; - function classes and libraries that form the core of the module and perform the tasks of forming, storing, and visualizing class instances; - configuration files and module patterns that allow one to adjust and organize the operation of the different functional blocks; - scripts and patterns implemented according to the rules of the W@DIS information system development environment, including scripts for interaction with the environment by means of the software core of the information system.
These scripts provide web-oriented interactive communication; - patterns for visualizing the functionality realized by the scripts. The software core of the scientific information-computational system W@DIS is built with the MVC (Model-View-Controller) design pattern, which allows the logic of an application to be separated from its representation. It realizes the interaction of the three logical components, providing interactivity with the environment via the Web and performing its preprocessing. The functions of the "Controller" logical component are realized by scripts designed according to the rules imposed by the software core of the information system. Each script is a definite object-oriented class with an obligatory class method, called "start", that initiates the script. The functions that present the results of domain application operation (i.e., the "View" component) are sets of HTML patterns that allow one to visualize the results of domain applications with the help of additional constructions processed by the software core of the system. Besides interacting with the software core of the scientific information system, this module also works with the configuration files of the software core and its database. Such an organization provides closer integration with the software core and a deeper, more adequate connection to operating-system support. 4 CONCLUSION In this work the problems of semantic metadata creation in an information system oriented toward information representation in the area of molecular spectroscopy have been discussed. The method of forming semantic metadata and functions, as well as the realization and structure of the META+ module, have been described. The architecture of the META+ module is closely related to the existing software of the "Molecular spectroscopy" scientific information system. The module was implemented using modern approaches to the development of Web-oriented applications.
It uses the existing application interfaces. The developed software allows us to: - perform automatic metadata annotation of calculated task solutions directly in the information system; - perform automatic metadata annotation of task solutions when results obtained outside the information system are uploaded, forming an instance of the solved task on the basis of the entered data; - use ontological instances of task solutions to identify data in the information tasks of viewing, comparison, and search solved by the information system; - export applied task ontologies for operation on them by external means; - solve the task of semantic search according to a pattern and using a question-answer interface. 5 ACKNOWLEDGEMENT The authors are grateful to RFBR for financial support of the development of the distributed information system for molecular spectroscopy. REFERENCES 1. A. D. Bykov, A. Z. Fazliev, N. N. Filippov, A. V. Kozodoev, A. I. Privezentsev, L. N. Sinitsa, M. V. Tonkov, and M. Yu. Tretyakov, Distributed information system on atmospheric spectroscopy // Geophysical Research Abstracts, SRef-ID: 1607-7962/gra/EGU2007-A-01906, 2007, v. 9, p. 01906. 2. A. I. Privezentsev, A. Z. Fazliev, Applied task ontology for the systematization of molecular spectroscopy information resources // Proceedings of the 9th Russian scientific conference "Electronic Libraries: Advanced Methods and Technologies, Electronic Collections" (RCDL'2007), Pereslavl-Zalessky, 2007, part 1, pp. 201-210. 3. OWL Web Ontology Language Semantics and Abstract Syntax, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-owl-semantics-20040210/ 4. W@DIS information system, http://wadis.saga.iao.ru 5. RAP library, http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/.
A Common Metadata System for Marine Data Portals
NASA Astrophysics Data System (ADS)
Wosniok, C.; Breitbach, G.; Lehfeldt, R.
2012-04-01
Processing and allocation of marine datasets depend on the nature of the data resulting from field campaigns, continuous monitoring and numerical modeling. Two research and development projects in northern Germany manage different types of marine data. Due to different data characteristics and institutional frameworks, separate data portals are required. This paper describes the integration of distributed marine data in Germany. The Marine Data Infrastructure of Germany (MDI-DE) supports public authorities in the German coastal zone with the implementation of European directives like INSPIRE or the Marine Strategy Framework Directive. This is carried out by setting up standardized web services within a network of participating coastal agencies and by installing a common data portal (http://www.mdi-de.org), which integrates distributed marine data concerning coastal engineering, coastal water protection and nature conservation in an interoperable and harmonized manner for administrative and scientific purposes as well as for informing the general public. The Coastal Observation System for Northern and Arctic Seas (COSYNA) aims at developing and testing analysis systems for the operational synoptic description of the environmental status of the North Sea and of Arctic coastal waters. This is done by establishing a network of monitoring facilities and providing their data in near-real-time. In situ measurements with poles, ferry boxes and buoys, together with remote sensing measurements and the assimilation of these data into simulation results, enable COSYNA to provide pre-operational 'products' that go beyond the presently routine techniques in observation and modelling. Data allocation in near-real-time requires thorough data validation, which is processed on the fly before data are passed on to the COSYNA portal (http://kofserver2.hzg.de/codm/).
Both projects apply OGC standards such as the Web Map Service (WMS), Web Feature Service (WFS) and Sensor Observation Service (SOS), which ensures interoperability and extensibility. In addition, metadata, a crucial component for searching and finding information in large data infrastructures, are provided via the Catalogue Service for the Web (CS-W). MDI-DE and COSYNA rely on NOKIS, a metadata information system for marine metadata that reflects a metadata profile tailored to marine data according to the specifications of German coastal authorities. In spite of this common software base, interoperability between the two data collections requires constant alignment of the diverse data processed by the two portals. While monitoring data in the MDI-DE are currently rather campaign-based, COSYNA has to fit constantly evolving time series into metadata sets. With all data following the same metadata profile, we now reach full interoperability between the different data collections. The distributed marine information system provides options to search, find and visualise the harmonised results from continuous monitoring, field campaigns, numerical modeling and other data in one web client.
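As a rough illustration of the CS-W access path mentioned above, the sketch below builds a minimal CSW 2.0.2 GetRecords request body for a full-text query. It uses only the standard OGC namespaces; it makes no assumption about either portal's actual endpoint or profile.

```python
import xml.etree.ElementTree as ET

# Standard namespaces from the CSW 2.0.2 and OGC Filter specifications.
CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("csw", CSW)
ET.register_namespace("ogc", OGC)

def get_records_request(term):
    """Build a GetRecords body that full-text searches csw:AnyText for `term`."""
    root = ET.Element(f"{{{CSW}}}GetRecords",
                      {"service": "CSW", "version": "2.0.2", "resultType": "results"})
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "csw:Record"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "summary"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    flt = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    like = ET.SubElement(flt, f"{{{OGC}}}PropertyIsLike",
                         {"wildCard": "*", "singleChar": "?", "escapeChar": "\\"})
    ET.SubElement(like, f"{{{OGC}}}PropertyName").text = "csw:AnyText"
    ET.SubElement(like, f"{{{OGC}}}Literal").text = f"*{term}*"
    return ET.tostring(root, encoding="unicode")

xml_body = get_records_request("North Sea salinity")
```

Such a body would be POSTed to the catalogue endpoint; the response is a csw:GetRecordsResponse containing matching metadata records.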
Vempati, Uma D; Chung, Caty; Mader, Chris; Koleti, Amar; Datar, Nakul; Vidović, Dušica; Wrobel, David; Erickson, Sean; Muhlich, Jeremy L; Berriz, Gabriel; Benes, Cyril H; Subramanian, Aravind; Pillai, Ajay; Shamu, Caroline E; Schürer, Stephan C
2014-06-01
The National Institutes of Health Library of Integrated Network-based Cellular Signatures (LINCS) program is generating extensive multidimensional data sets, including biochemical, genome-wide transcriptional, and phenotypic cellular response signatures to a variety of small-molecule and genetic perturbations with the goal of creating a sustainable, widely applicable, and readily accessible systems biology knowledge resource. Integration and analysis of diverse LINCS data sets depend on the availability of sufficient metadata to describe the assays and screening results and on their syntactic, structural, and semantic consistency. Here we report metadata specifications for the most important molecular and cellular components and recommend them for adoption beyond the LINCS project. We focus on the minimum required information to model LINCS assays and results based on a number of use cases, and we recommend controlled terminologies and ontologies to annotate assays with syntactic consistency and semantic integrity. We also report specifications for a simple annotation format (SAF) to describe assays and screening results based on our metadata specifications with explicit controlled vocabularies. SAF specifically serves to programmatically access and exchange LINCS data as a prerequisite for a distributed information management infrastructure. We applied the metadata specifications to annotate large numbers of LINCS cell lines, proteins, and small molecules. The resources generated and presented here are freely available. © 2014 Society for Laboratory Automation and Screening.
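The idea of annotating assays with explicit controlled vocabularies, as the LINCS specifications recommend, can be sketched as a simple validation step. The field names and vocabulary entries below are illustrative placeholders, not the actual SAF specification.

```python
# Hypothetical controlled vocabularies for a SAF-style annotation row.
VOCAB = {
    "assay_type": {"biochemical", "transcriptional", "phenotypic"},
    "perturbation": {"small molecule", "shRNA", "cDNA"},
}

def validate(row):
    """Return the fields whose values fall outside the controlled vocabulary."""
    return [field for field, allowed in VOCAB.items()
            if field in row and row[field] not in allowed]

# "siRNA" is not in the (hypothetical) perturbation vocabulary, so it is flagged.
errors = validate({"assay_type": "transcriptional", "perturbation": "siRNA"})
```

Validating terms against a shared vocabulary at submission time is what makes downstream programmatic access and data exchange reliable across sites.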
Metadata-Driven SOA-Based Application for Facilitation of Real-Time Data Warehousing
NASA Astrophysics Data System (ADS)
Pintar, Damir; Vranić, Mihaela; Skočir, Zoran
Service-oriented architecture (SOA) has already been widely recognized as an effective paradigm for achieving integration of diverse information systems. SOA-based applications can cross boundaries of platforms, operating systems and proprietary data standards, commonly through the use of Web Services technology. On the other hand, metadata is also commonly referred to as a potential integration tool, given that standardized metadata objects can provide useful information about the specifics of unfamiliar information systems with which one wishes to communicate, an approach commonly called "model-based integration". This paper presents the results of research into the possible synergy between these two integration facilitators. This is accomplished with a vertical example of a metadata-driven SOA-based business process that provides ETL (Extraction, Transformation and Loading) and metadata services to a data warehousing system in need of real-time ETL support.
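The "metadata-driven" idea can be sketched minimally: instead of hard-coding the ETL step, a generic engine interprets a metadata description of source fields, target fields and transforms. All names below are hypothetical and stand in for what would, in the paper's setting, be delivered by metadata services over Web Services.

```python
# Registry of transform functions the engine knows how to apply.
TRANSFORMS = {"upper": str.upper, "to_float": float}

# Metadata object describing the mapping; in a SOA setting this would be
# fetched from a metadata service rather than written inline.
MAPPING = [
    {"source": "cust_name", "target": "customer", "transform": "upper"},
    {"source": "amt", "target": "amount", "transform": "to_float"},
]

def etl_row(row, mapping=MAPPING):
    """Apply the metadata-described transforms to one source row."""
    return {m["target"]: TRANSFORMS[m["transform"]](row[m["source"]])
            for m in mapping}

loaded = etl_row({"cust_name": "acme", "amt": "12.50"})
```

Changing the warehouse load then means editing the metadata object, not redeploying code, which is the practical payoff of the approach the abstract describes.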
The CMS Data Management System
NASA Astrophysics Data System (ADS)
Giffels, M.; Guo, Y.; Kuznetsov, V.; Magini, N.; Wildish, T.
2014-06-01
The data management elements in CMS are scalable, modular, and designed to work together. The main components are PhEDEx, the data transfer and location system; the Data Bookkeeping Service (DBS), a metadata catalog; and the Data Aggregation Service (DAS), designed to aggregate views and provide them to users and services. Tens of thousands of samples have been cataloged and petabytes of data have been moved since the run began. The modular system has allowed the optimal use of appropriate underlying technologies. In this contribution we discuss the use of both Oracle and NoSQL databases to implement the data management elements, as well as the individual architectures chosen. We discuss how the data management system functioned during the first run and what improvements are planned in preparation for 2015.
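This is not CMS code, but the aggregation role described for DAS can be illustrated as a toy join of per-dataset views from two separate catalogs on a shared dataset key; the dataset names and record fields are invented.

```python
# Hypothetical per-service views keyed by dataset path.
phedex_view = {"/MinBias/Run1": {"replicas": 3}, "/ZeroBias/Run1": {"replicas": 1}}
dbs_view = {"/MinBias/Run1": {"events": 1_200_000}}

def aggregate(*views):
    """Merge records from several catalogs into one combined view per dataset."""
    merged = {}
    for view in views:
        for dataset, record in view.items():
            merged.setdefault(dataset, {}).update(record)
    return merged

combined = aggregate(phedex_view, dbs_view)
```

The real service resolves far harder problems (query translation, caching, schema differences between Oracle- and NoSQL-backed services), but the user-facing result is this kind of unified per-dataset view.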
GRIIDC: A Data Repository for Gulf of Mexico Science
NASA Astrophysics Data System (ADS)
Ellis, S.; Gibeaut, J. C.
2017-12-01
The Gulf of Mexico Research Initiative Information & Data Cooperative (GRIIDC) system is a data management solution appropriate for any researcher sharing Gulf of Mexico and oil spill science data. Our mission is to ensure a data and information legacy that promotes continual scientific discovery and public awareness of the Gulf of Mexico ecosystem. GRIIDC developed an open-source software solution to manage data from the Gulf of Mexico Research Initiative (GoMRI). The GoMRI program has over 2500 researchers from diverse fields of study with a variety of attitudes, experiences, and capacities for data sharing. The success of this solution is apparent through new partnerships to share data generated by RESTORE Act Centers of Excellence Programs, the National Academies of Science, and others. The GRIIDC data management system integrates dataset management planning, metadata creation, persistent identification, and data discoverability into an easy-to-use web application. No specialized software or program installations are required to support dataset submission or discovery. Furthermore, no data transformations are needed to submit data to GRIIDC; common file formats such as Excel, CSV, and text are all acceptable for submission. To ensure data are properly documented using the GRIIDC implementation of the ISO 19115-2 metadata standard, researchers submit detailed descriptive information through a series of interactive forms, and no knowledge of metadata or XML formats is required. Once a dataset is documented and submitted, the GRIIDC team performs a review of the dataset package. This review ensures that files can be opened and contain data, and that data are completely and accurately described. This review does not include performing quality assurance or control of data points, as GRIIDC expects scientists to perform these steps during the course of their work.
Once approved, data are made public and searchable through the GRIIDC data discovery portal and the DataONE network.
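The automated portion of a package review like the one described above can be sketched simply: confirm each file exists and is non-empty, and that required descriptive fields are filled in. This is a hypothetical illustration; the field names are assumptions, not GRIIDC's actual checklist.

```python
import os
import tempfile

# Hypothetical required descriptive fields for a dataset package.
REQUIRED_FIELDS = {"title", "abstract", "point_of_contact"}

def review_package(file_paths, metadata):
    """Return a list of problems found in the package (empty list = passes)."""
    problems = []
    for path in file_paths:
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            problems.append(f"empty or missing file: {path}")
    problems += [f"missing metadata field: {field}"
                 for field in sorted(REQUIRED_FIELDS - metadata.keys())]
    return problems

# Build a small non-empty data file to review.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"station,temp\nA,21.5\n")
issues = review_package([tmp.name], {"title": "CTD casts", "abstract": "..."})
os.unlink(tmp.name)
```

Note that, as the abstract states, scientific QA/QC of the data values themselves stays with the researcher; the review only checks that the package is complete and described.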
GeneLab Analysis Working Group Kick-Off Meeting
NASA Technical Reports Server (NTRS)
Costes, Sylvain V.
2018-01-01
Goals for the GeneLab AWG: GL vision; review of the GeneLab AWG charter; timeline and milestones for 2018; logistics (monthly meeting, workshop, internship, ASGSR); introduction of team leads and the goals of each group; introduction of all members; Q/A. Topics include a three-tier client strategy to democratize data, and, for analysis: physiological changes, pathway enrichment, differential expression, normalization, processing metadata, reproducibility, and data federation/integration with heterogeneous external bioinformatics databases. The GLDS currently serves over 100 omics investigations to the biomedical community via open access. In order to expand the scope of metadata record searches via the GLDS, we designed a metadata warehouse that collects and updates metadata records from external systems housing similar data. To demonstrate the capabilities of federated search and retrieval of these data, we imported metadata records from three open-access data systems into the GLDS metadata warehouse: NCBI's Gene Expression Omnibus (GEO), EBI's PRoteomics IDEntifications (PRIDE) repository, and the Metagenomics Analysis server (MG-RAST). Each of these systems defines metadata for omics data sets differently. One solution to bridge such differences is to employ a common object model (COM) to which each system's representation of metadata can be mapped. Warehoused metadata records are then transformed at ETL time to this single, common representation. Queries generated via the GLDS are then executed against the warehouse, and matching records are shown in the COM representation (Fig. 1). While this approach is relatively straightforward to implement, the volume of data in the omics domain presents challenges in dealing with the latency and currency of records. Furthermore, until now there has been no coordinated, federated search for and retrieval of these kinds of data across other open-access systems that would let users conduct biological meta-investigations using data from a variety of sources.
Such meta-investigations are key to corroborating findings from many kinds of assays and translating them into systems biology knowledge and, eventually, therapeutics.
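The common-object-model mapping described above can be sketched as a per-source field map applied at ETL time. The source field names below are invented for illustration; they are not the actual GEO or PRIDE schemas.

```python
# Hypothetical field maps from each source's metadata schema to the COM.
FIELD_MAPS = {
    "GEO":   {"title": "series_title", "organism": "sample_organism"},
    "PRIDE": {"title": "project_title", "organism": "species"},
}

def to_com(source, record):
    """Map a source-specific metadata record onto the common representation."""
    fmap = FIELD_MAPS[source]
    mapped = {com_field: record.get(src_field)
              for com_field, src_field in fmap.items()}
    return {**mapped, "source": source}

com_record = to_com("PRIDE", {"project_title": "Mouse liver proteome",
                              "species": "Mus musculus"})
```

Once every warehoused record shares this shape, a single query can match records regardless of which repository they came from, which is exactly what the federated search relies on.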
NASA Astrophysics Data System (ADS)
Schaap, Dick M. A.; Maudire, Gilbert
2010-05-01
SeaDataNet is a leading infrastructure in Europe for marine & ocean data management. It is actively operating and further developing a pan-European infrastructure for managing, indexing and providing access to ocean and marine data sets and data products, acquired via research cruises and other observational activities, in situ and by remote sensing. The basis of SeaDataNet is the interconnection of 40 National Oceanographic Data Centres (NODCs) and marine data centres from 35 countries around the European seas into a distributed network of data resources with common standards for metadata, vocabularies, data transport formats, quality control methods and flags, and access. Most of the NODCs also operate and/or are developing national networks with other institutes in their countries to ensure national coverage and long-term stewardship of available data sets. The majority of data managed by SeaDataNet partners concerns physical oceanography, marine chemistry and hydrography, with a substantial volume of marine biology, geology and geophysics. These are partly owned by the partner institutes themselves and, for a major part, by other organizations in their countries. The SeaDataNet infrastructure is implemented with support of the EU via the EU FP6 SeaDataNet project to provide a pan-European data management system adapted both to the fragmented observation system and to users' need for integrated access to data, metadata, products and services. The SeaDataNet project has a duration of 5 years and started in 2006, but builds upon earlier data management infrastructure projects, undertaken over a period of 20 years by an expanding network of oceanographic data centres from the countries around all European seas. Its predecessor project, Sea-Search, had a strict focus on metadata. SeaDataNet maintains significant interest in the further development of the metadata infrastructure, extending its services with the provision of easy data access and generic data products.
Version 1 of its infrastructure upgrade was launched in April 2008, and work is now well underway to include all 40 data centres at the V1 level. It comprises the network of 40 interconnected data centres (NODCs) and a central SeaDataNet portal. V1 provides users with a unified and transparent overview of the metadata and controlled access to the large collections of data sets that are managed at these data centres. The SeaDataNet V1 infrastructure comprises the following middleware services: • Discovery services = Metadata directories and User interfaces • Vocabulary services = Common vocabularies and Governance • Security services = Authentication, Authorization & Accounting • Delivery services = Requesting and Downloading of data sets • Viewing services = Mapping of metadata • Monitoring services = Statistics on system usage and performance and Registration of data requests and transactions • Maintenance services = Entry and updating of metadata by data centres Good progress is also being made in extending the SeaDataNet infrastructure with V2 services: • Viewing services = Quick views and Visualisation of data and data products • Product services = Generic and standard products • Exchange services = transformation of SeaDataNet portal CDI output to INSPIRE compliance As a basis for the V1 services, common standards have been defined for metadata and data formats, common vocabularies, quality flags, and quality control methods, based on international standards such as ISO 19115, OGC, NetCDF (CF) and ODV, on best practices from IOC and ICES, and following INSPIRE developments. An important objective of the SeaDataNet V1 infrastructure is to provide transparent access to the distributed data sets via a unique user interface and download service. In the SeaDataNet V1 architecture the Common Data Index (CDI) V1 metadata service provides the link between discovery and delivery of data sets.
The CDI user interface enables users to gain a detailed insight into the availability and geographical distribution of marine data archived at the connected data centres. It provides sufficient information to allow the user to assess the data's relevance. Moreover, the CDI user interface provides the means for downloading data sets in common formats via a transaction mechanism. The SeaDataNet portal provides registered users access to these distributed data sets via the CDI V1 Directory and a shopping basket mechanism. This allows registered users to locate data of interest and submit their data requests. The requests are forwarded automatically from the portal to the relevant SeaDataNet data centres. This process is controlled via the Request Status Manager (RSM) Web Service at the portal and a Download Manager (DM) Java software module implemented at each of the data centres. The RSM also enables registered users to regularly check the status of their requests and to download data sets after access has been granted. Data centres can follow all transactions for their data sets online and can handle requests that require their consent. The actual delivery of data sets takes place between the user and the selected data centre. Very good progress is being made with connecting all SeaDataNet data centres and their data sets to the CDI V1 system. At present the CDI V1 system provides users functionality to discover and download more than 500,000 data sets, a number which is steadily increasing. The SeaDataNet architecture provides a coherent system of the various V1 services and allows inclusion of the V2 services. For the implementation, a range of technical components have been defined and developed. These make use of recent web technologies, and also comprise Java components, to provide multi-platform support and syntactic interoperability.
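The request life cycle described above (submit, grant or refuse, then download) can be pictured as a small state machine. The state names below are assumptions for illustration, not the RSM's actual vocabulary.

```python
# Hypothetical allowed transitions in a data-request life cycle.
TRANSITIONS = {
    "submitted": {"granted", "refused"},
    "granted": {"downloaded"},
}

class DataRequest:
    """Toy model of one shopping-basket request tracked by a status manager."""

    def __init__(self):
        self.state = "submitted"

    def advance(self, new_state):
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

req = DataRequest()
req.advance("granted")     # data centre consents to the request
req.advance("downloaded")  # user retrieves the data set
```

Enforcing transitions like this is what lets both the user and the data centre poll a single authoritative status for each request.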
To facilitate sharing of resources and interoperability, SeaDataNet has adopted SOAP Web services for various communication tasks. The SeaDataNet architecture has been designed as a multi-disciplinary system from the beginning; it is able to support a wide variety of data types and to serve several sector communities. SeaDataNet is willing to share its technologies and expertise, to spread and expand its approach, and to build bridges to other well-established infrastructures in the marine domain. It has therefore developed a strategy of seeking active cooperation on a national scale with other data-holding organisations via its NODC networks, and on an international scale with other European and international data management initiatives and networks, with the objective of achieving a wider coverage of data sources and overall interoperability between data infrastructures in the marine and ocean domains. Recent examples include the EU FP7 projects Geo-Seas for geological and geophysical data sets, UpgradeBlackSeaScene for a Black Sea data management infrastructure, CaspInfo for a Caspian Sea data management infrastructure, and the EU EMODNET pilot projects for hydrographic, chemical, and biological data sets. All these projects adopt the SeaDataNet standards and extend its services. Active cooperation also takes place with EuroGOOS and MyOcean in the domain of real-time and delayed-mode metocean monitoring data.
SeaDataNet Partners: IFREMER (France), MARIS (Netherlands), HCMR/HNODC (Greece), ULg (Belgium), OGS (Italy), NERC/BODC (UK), BSH/DOD (Germany), SMHI (Sweden), IEO (Spain), RIHMI/WDC (Russia), IOC (International), ENEA (Italy), INGV (Italy), METU (Turkey), CLS (France), AWI (Germany), IMR (Norway), NERI (Denmark), ICES (International), EC-DG JRC (International), MI (Ireland), IHPT (Portugal), RIKZ (Netherlands), RBINS/MUMM (Belgium), VLIZ (Belgium), MRI (Iceland), FIMR (Finland ), IMGW (Poland), MSI (Estonia), IAE/UL (Latvia), CMR (Lithuania), SIO/RAS (Russia), MHI/DMIST (Ukraine), IO/BAS (Bulgaria), NIMRD (Romania), TSU (Georgia), INRH (Morocco), IOF (Croatia), PUT (Albania), NIB (Slovenia), UoM (Malta), OC/UCY (Cyprus), IOLR (Israel), NCSR/NCMS (Lebanon), CNR-ISAC (Italy), ISMAL (Algeria), INSTM (Tunisia)
Ignizio, Drew A.; O'Donnell, Michael S.; Talbert, Colin B.
2014-01-01
Creating compliant metadata for scientific data products is mandated for all federal Geographic Information Systems professionals and is a best practice for members of the geospatial data community. However, the complexity of the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata (CSDGM), the limited availability of easy-to-use tools, and recent changes in the ESRI software environment continue to make metadata creation a challenge. Staff at the U.S. Geological Survey Fort Collins Science Center have developed a Python toolbox for ESRI ArcGIS Desktop to facilitate a semi-automated workflow for creating and updating metadata records in ESRI's 10.x software. The U.S. Geological Survey Metadata Wizard tool automatically populates several metadata elements: the spatial reference, spatial extent, geospatial presentation format, vector feature count or raster column/row count, native system/processing environment, and the metadata creation date. Once the software auto-populates these elements, users can easily add attribute definitions and other relevant information in a simple graphical user interface. The tool, which offers a simple design free of esoteric metadata language, has the potential to save many government and non-government organizations significant time and cost by facilitating the development of CSDGM-compliant metadata for ESRI software users. A working version of the tool is now available for ESRI ArcGIS Desktop versions 10.0, 10.1, and 10.2 (downloadable at http://www.sciencebase.gov/metadatawizard).
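The auto-population step described above can be sketched with a dataset reduced to a list of (x, y) points: the spatial extent and feature count fall out of the data itself, and the metadata date is stamped at creation. The element names echo the abstract and are not the actual FGDC tag names; real spatial-reference detection would come from the GIS layer's projection, which is omitted here.

```python
import datetime

def auto_populate(points):
    """Derive extent, feature count, and creation date from a point data set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "spatial_extent": {"west": min(xs), "east": max(xs),
                           "south": min(ys), "north": max(ys)},
        "feature_count": len(points),
        "metadata_date": datetime.date.today().isoformat(),
    }

record = auto_populate([(-105.1, 40.6), (-104.8, 40.4), (-105.0, 40.9)])
```

Auto-deriving these mechanical elements leaves the user with only the parts a machine cannot know, such as attribute definitions and the abstract, which is the tool's core time saving.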
Dinov, Ivo D; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H V; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D Stott; Toga, Arthur W
2008-05-28
The advancement of the computational biology field hinges on progress in three fundamental directions--the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources--data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. 
We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.
BCO-DMO: Enabling Access to Federally Funded Research Data
NASA Astrophysics Data System (ADS)
Kinkade, D.; Allison, M. D.; Chandler, C. L.; Groman, R. C.; Rauch, S.; Shepherd, A.; Gegg, S. R.; Wiebe, P. H.; Glover, D. M.
2013-12-01
In a February 2013 memo [1], the White House Office of Science and Technology Policy (OSTP) outlined principles and objectives to increase access by the public to federally funded research publications and data. Such access is intended to drive innovation by allowing private and commercial efforts to take full advantage of existing resources, thereby maximizing Federal research dollars and efforts. The Biological and Chemical Oceanography Data Management Office (BCO-DMO; bco-dmo.org) serves as a model resource for organizations seeking compliance with the OSTP policy. BCO-DMO works closely with scientific investigators to publish their data from research projects funded by the National Science Foundation (NSF), within the Biological and Chemical Oceanography Sections (OCE) and the Division of Polar Programs Antarctic Organisms & Ecosystems Program (PLR). BCO-DMO addresses many of the OSTP objectives for public access to digital scientific data: (1) marine biogeochemical and ecological data and metadata are disseminated via a public website and curated on intermediate time frames; (2) preservation needs are met by collaborating with appropriate national data facilities for data archiving; (3) the cost and administrative burden associated with data management are minimized by the use of one dedicated office providing hundreds of NSF investigators with support for data management plan development, data organization, metadata generation, and deposition of data and metadata into the BCO-DMO repository; (4) recognition of intellectual property is reinforced through the office's citation policy and the use of digital object identifiers (DOIs); (5) education and training in data stewardship and use of the BCO-DMO system are provided by office staff through a variety of venues. Oceanographic research data and metadata from thousands of datasets generated by hundreds of investigators are now available through BCO-DMO.
[1] White House Office of Science and Technology Policy, Memorandum for the Heads of Executive Departments and Agencies: Increasing Access to the Results of Federally Funded Scientific Research, February 23, 2013. http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
Software for Managing an Archive of Images
NASA Technical Reports Server (NTRS)
Hallai, Charles; Jones, Helene; Callac, Chris
2003-01-01
This is a revised draft by the innovators of the report on Software for Managing an Archive of Images. The SSC Multimedia Archive is an automated electronic system to manage images, acquired both by film and digital cameras, for the Public Affairs Office (PAO) at Stennis Space Center (SSC). Previously, the image archive was based on film photography and utilized a manual system that, by today's standards, had become inefficient and expensive. Now, the SSC Multimedia Archive, based on a server at SSC, contains both catalogs and images for pictures taken digitally and with traditional film-based cameras, along with metadata about each image.
NASA Astrophysics Data System (ADS)
Fuentes, Daniel; Pérez-Luque, Antonio J.; Bonet García, Francisco J.; Moreno-LLorca, Ricardo A.; Sánchez-Cano, Francisco M.; Suárez-Muñoz, María
2017-04-01
The Long Term Ecological Research (LTER) network aims to provide the scientific community, policy makers, and society with the knowledge and predictive understanding necessary to conserve, protect, and manage ecosystems. LTER is organized into networks ranging from the global to the national scale. At the top, the International Long Term Ecological Research (ILTER) Network coordinates among ecological researchers and LTER research networks at local, regional, and global scales. In Spain, the Spanish Long Term Ecological Research (LTER-Spain) network was built to foster collaboration and coordination between long-term ecological researchers and networks on a local scale. Currently composed of nine nodes, this network facilitates data exchange, documentation, and preservation, encouraging the development of cross-disciplinary work. However, most nodes have no specific information systems, tools, or qualified personnel to manage their data for continued conservation, and there are no harmonized methodologies for long-term monitoring protocols. Hence, the main challenge is to place the nodes in their correct positions in the network, providing the best tools to allow them to manage their data autonomously and to make it easier for them to access information and knowledge in the network. This work proposes a connected structure composed of four LTER nodes located in southern Spain. The structure is built following a hierarchical approach: nodes that create information, which is documented using metadata standards (such as the Ecological Metadata Language, EML), and other nodes that gather metadata and information. We also take into account the capacity of each node to manage its own data and the premise that data and metadata must be maintained where they are generated.
The current state of the nodes is as follows: two of them have their own information management systems (Sierra Nevada-Granada and the Doñana Long-Term Socio-ecological Research Platform) and another has no infrastructure to maintain its data (the Arid Iberian South East LTSER Platform). The last one (the Environmental Information Network of Andalusia, REDIAM) acts as the coordinator, providing physical and logical support to the other nodes, and also gathers and distributes the information "uphill" to the rest of the network (LTER Europe and ILTER). The development of the network has been divided into three stages. First, existing resources and data management requirements are identified in each node. Second, the necessary software tools and interoperable standards to manage and exchange the data are selected, installed, and configured in each participant. Finally, once the network has been set up completely, it is expected to expand across Spain with new nodes and to connect to other LTER and similar networks. This research has been funded by the ADAPTAMED (Protection of key ecosystem services by adaptive management of Climate Change endangered Mediterranean socioecosystems) LIFE EU project, the Sierra Nevada Global Change Observatory (LTER site), and eLTER (Integrated European Long Term Ecosystem & Socio-Ecological Research Infrastructure).
Achieving Sub-Second Search in the CMR
NASA Astrophysics Data System (ADS)
Gilman, J.; Baynes, K.; Pilone, D.; Mitchell, A. E.; Murphy, K. J.
2014-12-01
The Common Metadata Repository (CMR) is the next-generation Earth science metadata catalog for NASA's Earth observing data. It joins together the holdings of the EOS Clearing House (ECHO) and the Global Change Master Directory (GCMD), creating a unified, authoritative source for EOSDIS metadata. The CMR allows ingest in many different formats while providing consistent search behavior and retrieval in any supported format. Performance is a critical component of the CMR, ensuring improved data discovery and client interactivity. The CMR delivers sub-second search performance for any of the common query conditions (including spatial) across hundreds of millions of metadata granules. It also allows the addition of new metadata concepts such as visualizations, parameter metadata, and documentation. The CMR's goals presented many challenges. This talk will describe the CMR architecture, design, and innovations that were made to achieve its goals, including:
* Architectural features like immutability and backpressure.
* Data management techniques, such as caching and parallel loading, that give big performance gains.
* Open source and COTS tools like the Elasticsearch search engine.
* Adoption of Clojure, a functional programming language for the Java Virtual Machine.
* Development of a custom spatial search plugin for Elasticsearch, and why it was necessary.
* Introduction of a unified model for metadata that maps every supported metadata format to a consistent domain model.
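The caching mentioned above as one source of the CMR's performance gains can be illustrated with a minimal, hedged sketch. The catalog contents and the search function below are invented stand-ins, not CMR code:

```python
# A minimal illustration of result caching for repeated metadata queries.
# The catalog and query function are hypothetical stand-ins; the CMR's
# actual caching layers are more elaborate.
from functools import lru_cache

CATALOG = {
    ("MODIS", "2014"): ["granule-001", "granule-002"],
    ("VIIRS", "2014"): ["granule-003"],
}

@lru_cache(maxsize=1024)
def search(instrument, year):
    # Imagine an expensive index lookup here; lru_cache memoizes the result.
    return tuple(CATALOG.get((instrument, year), []))

first = search("MODIS", "2014")    # computed
second = search("MODIS", "2014")   # served from the cache
print(first, search.cache_info().hits)
```

Repeated identical queries are answered from memory, which is one of the simplest ways a catalog keeps hot queries under the sub-second threshold.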
Beck, Peter; Truskaller, Thomas; Rakovac, Ivo; Cadonna, Bruno; Pieber, Thomas R
2006-01-01
In this paper we describe the approach to build a web-based clinical data management infrastructure on top of an entity-attribute-value (EAV) database which provides for flexible definition and extension of clinical data sets as well as efficient data handling and high performance query execution. A "mixed" EAV implementation provides a flexible and configurable data repository and at the same time utilizes the performance advantages of conventional database tables for rarely changing data structures. A dynamically configurable data dictionary contains further information for data validation. The online user interface can also be assembled dynamically. A data transfer object which encapsulates data together with all required metadata is populated by the backend and directly used to dynamically render frontend forms and handle incoming data. The "mixed" EAV model enables flexible definition and modification of clinical data sets while reducing performance drawbacks of pure EAV implementations to a minimum. The system currently is in use in an electronic patient record with focus on flexibility and a quality management application (www.healthgate.at) with high performance requirements.
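The "mixed" EAV idea described above can be sketched in SQLite: stable, rarely changing fields live in a conventional table, while flexible clinical attributes go into an EAV table. The table and attribute names below are invented for illustration, not the paper's actual schema:

```python
import sqlite3

# A "mixed" EAV sketch: conventional columns for stable data, an EAV table
# for flexible clinical attributes (attribute names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (            -- conventional table: stable structure
        id INTEGER PRIMARY KEY,
        name TEXT,
        birth_year INTEGER
    );
    CREATE TABLE observation_eav (    -- EAV table: one row per (entity, attribute)
        patient_id INTEGER REFERENCES patient(id),
        attribute  TEXT,              -- e.g. 'hba1c', defined in a data dictionary
        value      TEXT
    );
""")
conn.execute("INSERT INTO patient VALUES (1, 'Jane Doe', 1970)")
conn.execute("INSERT INTO observation_eav VALUES (1, 'hba1c', '6.8')")

# Reassemble a record by joining the EAV rows back to the conventional table.
row = conn.execute("""
    SELECT p.name, e.attribute, e.value
    FROM patient p JOIN observation_eav e ON e.patient_id = p.id
""").fetchone()
print(row)
```

Adding a new clinical attribute is just an INSERT into the EAV table, while queries against the stable columns keep conventional-table performance.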
DOIDB: Reusing DataCite's search software as metadata portal for GFZ Data Services
NASA Astrophysics Data System (ADS)
Elger, K.; Ulbricht, D.; Bertelmann, R.
2016-12-01
GFZ Data Services is the central service point for the publication of research data at the Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences (GFZ). It provides data publishing services to scientists of GFZ, associated projects, and associated institutions. The publishing services aim to make research data and physical samples visible and citable by assigning persistent identifiers (DOI, IGSN) and by complementing existing IT infrastructure. To integrate several research domains, a modular software stack made of free software components has been created to manage data and metadata as well as to register persistent identifiers [1]. The pivotal component for the registration of DOIs is the DOIDB. It has been derived from three software components provided by DataCite [2] that moderate the registration of DOIs and the deposition of metadata, allow the dissemination of metadata, and provide a user interface to navigate and discover datasets. The DOIDB acts as a proxy to the DataCite infrastructure and, in addition to the DataCite metadata schema, allows metadata to be deposited and disseminated following the ISO 19139 and NASA GCMD DIF schemas. The search component has been modified to meet the requirements of a geosciences metadata portal. In particular, it has been altered to make use of Apache Solr's capability to index and query spatial coordinates. Furthermore, the user interface has been adjusted to provide a first impression of the data by showing a map, summary information, and subjects. DOIDB and its components are available on GitHub [3]. We present a software solution for the registration of DOIs that integrates existing data systems, keeps track of registered DOIs, and provides a metadata portal to discover datasets [4]. [1] Ulbricht, D.; Elger, K.; Bertelmann, R.; Klump, J. panMetaDocs, eSciDoc, and DOIDB—An Infrastructure for the Curation and Publication of File-Based Datasets for GFZ Data Services. ISPRS Int. J. Geo-Inf.
2016, 5, 25. http://doi.org/10.3390/ijgi5030025
[2] https://github.com/datacite
[3] https://github.com/ulbricht/search/tree/doidb , https://github.com/ulbricht/mds/tree/doidb , https://github.com/ulbricht/oaip/tree/doidb
[4] http://doidb.wdc-terra.org
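The spatial querying mentioned above can be sketched by building the request parameters for Apache Solr's standard {!bbox} spatial query parser. The field name coverage, the coordinates, and the search term are hypothetical, not the actual DOIDB schema:

```python
# Sketch of a Solr bounding-box filter query. The sfield name 'coverage'
# and the example values are invented for illustration.
from urllib.parse import urlencode

def solr_bbox_params(lat, lon, radius_km, text="*:*"):
    """Build query parameters for a Solr {!bbox} spatial filter."""
    return urlencode({
        "q": text,
        "fq": f"{{!bbox sfield=coverage pt={lat},{lon} d={radius_km}}}",
        "wt": "json",
    })

params = solr_bbox_params(52.38, 13.06, 50, text="seismic")
print(params)
```

The resulting query string would be appended to a Solr select endpoint; {!bbox} restricts hits to documents whose indexed point falls within the given distance of the center point.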
MMI's Metadata and Vocabulary Solutions: 10 Years and Growing
NASA Astrophysics Data System (ADS)
Graybeal, J.; Gayanilo, F.; Rueda-Velasquez, C. A.
2014-12-01
The Marine Metadata Interoperability project (http://marinemetadata.org) held its public opening at AGU's 2004 Fall Meeting. For 10 years since that debut, the MMI guidance and vocabulary sites have served over 100,000 visitors, with 525 community members and continuous Steering Committee leadership. Originally funded by the National Science Foundation, over the years multiple organizations have supported the MMI mission: "Our goal is to support collaborative research in the marine science domain, by simplifying the incredibly complex world of metadata into specific, straightforward guidance. MMI encourages scientists and data managers at all levels to apply good metadata practices from the start of a project, by providing the best guidance and resources for data management, and developing advanced metadata tools and services needed by the community." Now hosted by the Harte Research Institute at Texas A&M University at Corpus Christi, MMI continues to provide guidance and services to the community, and is planning for marine science and technology needs for the next 10 years. In this presentation we will highlight our major accomplishments, describe our recent achievements and imminent goals, and propose a vision for improving marine data interoperability for the next 10 years, including Ontology Registry and Repository (http://mmisw.org/orr) advancements and applications (http://mmisw.org/cfsn).
Evolving from bioinformatics in-the-small to bioinformatics in-the-large.
Parker, D Stott; Gorlick, Michael M; Lee, Christopher J
2003-01-01
We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small: constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples, and extract lessons from them showing how it has helped. These examples both suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.
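The make-style dependency management the authors highlight boils down to resolving a dependency graph into a build order. A tiny resolver can sketch this; the targets and graph below are invented bioinformatics artifacts, not the paper's examples:

```python
# Toy make-style dependency resolver: given a target and a dependency
# graph, return a build order that satisfies all dependencies.
def build_order(target, deps, done=None, order=None):
    """Depth-first walk that appends each target after its dependencies."""
    done = set() if done is None else done
    order = [] if order is None else order
    for dep in deps.get(target, []):
        if dep not in done:
            build_order(dep, deps, done, order)
    if target not in done:
        done.add(target)
        order.append(target)
    return order

# Hypothetical pipeline: raw reads and a genome produce an alignment,
# which (with the genome) produces variant calls.
deps = {
    "alignment.bam": ["reads.fastq", "genome.fa"],
    "variants.vcf": ["alignment.bam", "genome.fa"],
}
order = build_order("variants.vcf", deps)
print(order)  # ['reads.fastq', 'genome.fa', 'alignment.bam', 'variants.vcf']
```

make adds timestamps and rules on top of exactly this kind of traversal, which is why the authors find it such a useful model for in-the-large thinking.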
Increasing the international visibility of research data by a joint metadata schema
NASA Astrophysics Data System (ADS)
Svoboda, Nikolai; Zoarder, Muquit; Gärtner, Philipp; Hoffmann, Carsten; Heinrich, Uwe
2017-04-01
The BonaRes Project ("Soil as a sustainable resource for the bioeconomy") was launched in 2015 to promote sustainable soil management and to avoid fragmentation of efforts (Wollschläger et al., 2016). For this purpose, an IT infrastructure is being developed to upload, manage, store, and provide research data and its associated metadata. The research data provided by the BonaRes data centre are, in principle, not subject to any restrictions on reuse. For all research data, standardized metadata are the key enabler for the effective use of these data. Providing proper metadata is often viewed as an extra burden that consumes further work and resources. In our lecture we underline the benefits of structured and interoperable metadata, such as accessibility, discoverability, interpretability, and linking of data, and we weigh these advantages against the costs in time, personnel, and money. Building on this, we describe the metadata framework of BonaRes, which combines the OGC standards for the description, visualization, exchange, and discovery of geodata with the DataCite schema for the publication and citation of research data. This enables the generation of a DOI, a unique identifier that provides a permanent link to the citable research data. By using OGC standards, data and metadata become interoperable with the numerous research data provided via INSPIRE, and further services are enabled, such as CSW for harvesting, WMS for visualization, and WFS for downloading. We explain the mandatory fields that result from our approach and give a general overview of our metadata architecture implementation. Literature: Wollschläger, U.; Helming, K.; Heinrich, U.; Bartke, S.; Kögel-Knabner, I.; Russell, D.; Eberhardt, E. & Vogel, H.-J.: The BonaRes Centre - A virtual institute for soil research in the context of a sustainable bio-economy. Geophysical Research Abstracts, Vol. 18, EGU2016-9087, 2016.
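The DataCite side of such a framework centers on a small set of mandatory properties (identifier, creator, title, publisher, publication year, resource type). The sketch below builds such a record; namespaces and schema-version attributes are omitted for brevity, and all example values (including the 10.5072 test prefix) are invented:

```python
import xml.etree.ElementTree as ET

# Hedged sketch of a DataCite-style record with the schema's mandatory
# properties. Values are hypothetical; real records also carry namespace
# and schema-version attributes.
def datacite_record(doi, creator, title, publisher, year):
    root = ET.Element("resource")
    ET.SubElement(root, "identifier", identifierType="DOI").text = doi
    c = ET.SubElement(ET.SubElement(root, "creators"), "creator")
    ET.SubElement(c, "creatorName").text = creator
    ET.SubElement(ET.SubElement(root, "titles"), "title").text = title
    ET.SubElement(root, "publisher").text = publisher
    ET.SubElement(root, "publicationYear").text = str(year)
    ET.SubElement(root, "resourceType", resourceTypeGeneral="Dataset").text = "Dataset"
    return ET.tostring(root, encoding="unicode")

record_xml = datacite_record("10.5072/example-soil-01", "Doe, Jane",
                             "Soil moisture time series",
                             "BonaRes Data Centre", 2017)
print(record_xml)
```

Depositing such a record with a DOI registration service is what makes the dataset citable with a permanent link.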
Menzel, Julia; Weil, Philipp; Bittihn, Philip; Hornung, Daniel; Mathieu, Nadine; Demiroglu, Sara Y
2013-01-01
Sustainable data management in biomedical research requires documentation of metadata for all experiments and results. Scientists usually document research data and metadata in laboratory paper notebooks. An electronic laboratory notebook (ELN) can keep metadata linked to research data resulting in a better understanding of the research results, meaning a scientific benefit [1]. Besides other challenges [2], the biggest hurdles for introducing an ELN seem to be usability, file formats, and data entry mechanisms [3] and that many ELNs are assigned to specific research fields such as biology, chemistry, or physics [4]. We aimed to identify requirements for the introduction of ELN software in a biomedical collaborative research center [5] consisting of different scientific fields and to find software fulfilling most of these requirements.
NELS 2.0 - A general system for enterprise wide information management
NASA Technical Reports Server (NTRS)
Smith, Stephanie L.
1993-01-01
NELS, the NASA Electronic Library System, is an information management tool for creating distributed repositories of documents, drawings, and code for use and reuse by the aerospace community. The NELS retrieval engine can load metadata and source files of full-text objects, perform natural language queries to retrieve ranked objects, and create links to connect user interfaces. For flexibility, the NELS architecture has layered interfaces between the application program and the stored library information. The session manager provides the interface functions for development of NELS applications. The data manager is an interface between the session manager and the structured data system. The center of the structured data system is the Wide Area Information Server. This system architecture provides access to information across heterogeneous platforms in a distributed environment. There are presently three user interfaces that connect to the NELS engine: an X-Windows interface, an ASCII interface, and the Spatial Data Management System. This paper describes the design and operation of NELS as an information management tool and repository.
Embracing the Archives: How NPR Librarians Turned Their Collection into a Workflow Tool
ERIC Educational Resources Information Center
Sin, Lauren; Daugert, Katie
2013-01-01
Several years ago, National Public Radio (NPR) librarians began developing a new content management system (CMS). It was intended to offer desktop access for all NPR-produced content, including transcripts, audio, and metadata. Fast-forward to 2011, and their shiny, new database, Artemis, was ready for debut. Their next challenge: to teach a staff…
EPOS Data and Service Provision
NASA Astrophysics Data System (ADS)
Bailo, Daniele; Jeffery, Keith G.; Atakan, Kuvvet; Harrison, Matt
2017-04-01
EPOS is now in its implementation phase (IP) after a successful preparatory phase (PP). EPOS consists essentially of two components: the ICS (Integrated Core Services), representing the integrating ICT (Information and Communication Technology), and many TCS (Thematic Core Services), representing the scientific domains. The architecture developed, demonstrated, and agreed within the project during the PP is now being developed using co-design with the TCS teams and agile, spiral methods within the ICS team. The 'heart' of EPOS is the metadata catalog. This provides for the ICS a digital representation of the TCS assets (services, data, software, equipment, expertise…), thus facilitating access, interoperation, and (re-)use. A major part of the work has been interactions with the TCS. The original intention to harvest information from the TCS required (and still requires) discussions to understand fully the TCS organisational structures linked with rights, security, and privacy; their (meta)data syntax (structure) and semantics (meaning); their workflows and methods of working; and the services offered. To complicate matters further, the TCS are each at varying stages of development, and the ICS design has to accommodate pre-existing, developing, and expected future standards for metadata, data, software, and processes. Through information documents, questionnaires, and interviews/meetings, the EPOS ICS team has collected DDSS (Data, Data Products, Software and Services) information from the TCS. The ICS team developed a simplified metadata model for presentation to the TCS, and the ICS team will perform the mapping and conversion from this model to the internal detailed technical metadata model using CERIF (an EU recommendation to Member States, maintained, developed, and promoted by euroCRIS, www.eurocris.org).
At the time of writing, the final modifications of the EPOS metadata model are being made and the mappings to CERIF designed, prior to the main phase of (meta)data collection into the EPOS metadata catalog. In parallel, work proceeds on the user interface software, the APIs (Application Programming Interfaces) to the TCS services, the harvesting method and software, the AAAI (Authentication, Authorisation, Accounting Infrastructure), and the system manager. The next steps will involve interfaces to ICS-D (Distributed ICS, i.e. facilities and services for computing, data storage, detectors and instruments for data collection, etc.) to which requests, software, and data will be deployed and from which data will be generated. Associated with this will be the development of the workflow system, which will assist the end user in building a workflow to achieve the scientific objectives.
iRODS: A Distributed Data Management Cyberinfrastructure for Observatories
NASA Astrophysics Data System (ADS)
Rajasekar, A.; Moore, R.; Vernon, F.
2007-12-01
Large-scale and long-term preservation of both observational and synthesized data requires a system that virtualizes data management concepts. A methodology is needed that can work across long distances in space (distribution) and long-periods in time (preservation). The system needs to manage data stored on multiple types of storage systems including new systems that become available in the future. This concept is called infrastructure independence, and is typically implemented through virtualization mechanisms. Data grids are built upon concepts of data and trust virtualization. These concepts enable the management of collections of data that are distributed across multiple institutions, stored on multiple types of storage systems, and accessed by multiple types of clients. Data virtualization ensures that the name spaces used to identify files, users, and storage systems are persistent, even when files are migrated onto future technology. This is required to preserve authenticity, the link between the record and descriptive and provenance metadata. Trust virtualization ensures that access controls remain invariant as files are moved within the data grid. This is required to track the chain of custody of records over time. The Storage Resource Broker (http://www.sdsc.edu/srb) is one such data grid used in a wide variety of applications in earth and space sciences such as ROADNet (roadnet.ucsd.edu), SEEK (seek.ecoinformatics.org), GEON (www.geongrid.org) and NOAO (www.noao.edu). Recent extensions to data grids provide one more level of virtualization - policy or management virtualization. Management virtualization ensures that execution of management policies can be automated, and that rules can be created that verify assertions about the shared collections of data. When dealing with distributed large-scale data over long periods of time, the policies used to manage the data and provide assurances about the authenticity of the data become paramount. 
The integrated Rule-Oriented Data System (iRODS) (http://irods.sdsc.edu) provides the mechanisms needed not only to describe management policies, but also to track how the policies are applied and their execution results. The iRODS data grid maps management policies to rules that control the execution of remote micro-services. As an example, a rule can be created that automatically creates a replica whenever a file is added to a specific collection, or extracts its metadata automatically and registers it in a searchable catalog. For the replication operation, the persistent state information consists of the replica location, the creation date, the owner, the replica size, etc. The mechanism used by iRODS for providing policy virtualization is based on well-defined functions, called micro-services, which are chained into alternative workflows using rules. A rule engine, based on the event-condition-action paradigm, executes the rule-based workflows after an event. Rules can be deferred to a pre-determined time or executed on a periodic basis. As the data management policies evolve, the iRODS system can implement new rules, new micro-services, and new state information (metadata content) needed to manage the new policies. Each sub-collection can be managed using a different set of policies. The discussion of the concepts in rule-based policy virtualization and its application to long-term and large-scale data management for observatories such as ORION and NEON will be the basis of the paper.
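The event-condition-action mechanism described above can be sketched as a toy rule engine. The event names, condition, and "micro-services" below are invented illustrations; real iRODS rules are written in its own rule language against its micro-service library:

```python
# Toy event-condition-action rule engine, illustrating the style of policy
# virtualization described above. Events, paths, and actions are invented.
class RuleEngine:
    def __init__(self):
        self.rules = []   # list of (event, condition, actions)

    def on(self, event, condition, actions):
        self.rules.append((event, condition, actions))

    def fire(self, event, ctx):
        for ev, cond, actions in self.rules:
            if ev == event and cond(ctx):
                for action in actions:       # chained "micro-services"
                    action(ctx)

log = []
engine = RuleEngine()
engine.on(
    "put",                                         # event: file added
    lambda ctx: ctx["collection"] == "/obs/raw",   # condition on the context
    [lambda ctx: log.append(("replicate", ctx["path"])),
     lambda ctx: log.append(("extract_metadata", ctx["path"]))],
)
engine.fire("put", {"collection": "/obs/raw", "path": "/obs/raw/site1.dat"})
print(log)
```

Here a "put" into a watched collection triggers replication and metadata extraction, mirroring the replica-on-ingest example in the abstract.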
NASA Astrophysics Data System (ADS)
Hernández, B. E.; Bugbee, K.; le Roux, J.; Beaty, T.; Hansen, M.; Staton, P.; Sisco, A. W.
2017-12-01
Earth observation (EO) data collected as part of NASA's Earth Observing System Data and Information System (EOSDIS) are now searchable via the Common Metadata Repository (CMR). The Analysis and Review of CMR (ARC) team at Marshall Space Flight Center has been tasked with reviewing all NASA metadata records in the CMR (~7,000 records). Each collection-level record and its constituent granule-level metadata are reviewed both for completeness and for compliance with the CMR's set of metadata standards, as specified in the Unified Metadata Model (UMM). NASA's Distributed Active Archive Centers (DAACs) have been harmonizing priority metadata records within the context of the inter-agency federal Big Earth Data Initiative (BEDI), which seeks to improve the discoverability, accessibility, and usability of EO data. Thus, the first phase of this project constitutes reviewing BEDI metadata records, while the second phase will constitute reviewing the remaining non-BEDI records in the CMR. This presentation will discuss the ARC team's findings in terms of the overall quality of BEDI records across all DAACs as well as compliance with UMM standards. For instance, only a fifth of the collection-level metadata fields needed correction, compared to a quarter of the granule-level fields. It should be noted that the degree to which DAACs' metadata did not comply with the UMM standards may reflect multiple factors, such as recent changes in the UMM standards and the utilization of different metadata formats (e.g. DIF 10, ECHO 10, ISO 19115-1) across the DAACs. Insights, constructive criticism, and lessons learned from this metadata review process will be contributed by both ORNL and SEDAC. Further inquiry along such lines may lead to insights that improve the metadata curation process moving forward.
In terms of the broader implications for metadata compliance with the UMM standards, this research has shown that a large proportion of the prioritized collections have already been made compliant, although the process of improving metadata quality is ongoing and iterative. Further research is also warranted into whether or not the gains in metadata quality are also driving gains in data use.
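The kind of completeness check such a review performs can be sketched as a required-field validator. The field list below is illustrative only; it is not the actual UMM specification:

```python
# Sketch of a collection-level completeness check. The required-field list
# is a hypothetical subset, not the real UMM requirements.
REQUIRED = ["ShortName", "Version", "Abstract", "DataCenters", "TemporalExtents"]

def missing_fields(record):
    """Return required fields that are absent or empty in the record."""
    return [f for f in REQUIRED if not record.get(f)]

record = {"ShortName": "MOD09GA", "Version": "6", "Abstract": ""}
missing = missing_fields(record)
print(missing)  # ['Abstract', 'DataCenters', 'TemporalExtents']
```

Running such a check over every collection and granule record is what yields review statistics like the fraction of fields needing correction.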
OlyMPUS - The Ontology-based Metadata Portal for Unified Semantics
NASA Astrophysics Data System (ADS)
Huffer, E.; Gleason, J. L.
2015-12-01
The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support data consumers and data providers, enabling the latter to register their data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS leverages the semantics and reasoning capabilities of ODISEES to provide data producers with a semi-automated interface for producing the semantically rich metadata needed to support ODISEES' data discovery and access services. It integrates the ODISEES metadata search system with multiple NASA data delivery tools to enable data consumers to create customized data sets for download to their computers, or for NASA Advanced Supercomputing (NAS) facility registered users, directly to NAS storage resources for access by applications running on NAS supercomputers. A core function of NASA's Earth Science Division is research and analysis that uses the full spectrum of data products available in NASA archives. Scientists need to perform complex analyses that identify correlations and non-obvious relationships across all types of Earth System phenomena. Comprehensive analytics are hindered, however, by the fact that many Earth science data products are disparate and hard to synthesize. Variations in how data are collected, processed, gridded, and stored, create challenges for data interoperability and synthesis, which are exacerbated by the sheer volume of available data. Robust, semantically rich metadata can support tools for data discovery and facilitate machine-to-machine transactions with services such as data subsetting, regridding, and reformatting. Such capabilities are critical to enabling the research activities integral to NASA's strategic plans. 
However, as metadata requirements increase and competing standards emerge, metadata provisioning becomes increasingly burdensome to data producers. The OlyMPUS system helps data providers produce semantically rich metadata, making their data more accessible to data consumers, and helps data consumers quickly discover and download the right data for their research.
ERIC Educational Resources Information Center
Mulrooney, Timothy J.
2009-01-01
A Geographic Information System (GIS) serves as the tangible and intangible means by which spatially related phenomena can be created, analyzed and rendered. GIS metadata serves as the formal framework to catalog information about a GIS data set. Metadata is independent of the encoded spatial and attribute information. GIS metadata is a subset of…
LVFS: A Scalable Petabyte/Exabyte Data Storage System
NASA Astrophysics Data System (ADS)
Golpayegani, N.; Halem, M.; Masuoka, E. J.; Ye, G.; Devine, N. K.
2013-12-01
Managing petabytes of data with hundreds of millions of files is the first step necessary towards an effective big data computing and collaboration environment in a distributed system. We describe here the MODAPS LAADS Virtual File System (LVFS), a new storage architecture which replaces the previous MODAPS operational Level 1 Land Atmosphere Archive Distribution System (LAADS) NFS-based approach to storing and distributing datasets from several instruments, such as MODIS, MERIS, and VIIRS. LAADS is responsible for the distribution of over 4 petabytes of data and over 300 million files across more than 500 disks. We present here the first LVFS big data comparative performance results and new capabilities not previously possible with the LAADS system. We consider two aspects in addressing the inefficiencies of massive scales of data. The first is dealing in a reliable and resilient manner with the volume and quantity of files in such a dataset; the second is minimizing the discovery and lookup times for accessing files in such large datasets. There are several popular file systems that successfully deal with the first aspect of the problem. Their solution, in general, is through distribution, replication, and parallelism of the storage architecture. The Hadoop Distributed File System (HDFS), Parallel Virtual File System (PVFS), and Lustre are examples of such file systems that deal with petabyte data volumes. The second aspect deals with data discovery among billions of files, the largest bottleneck in reducing access time. The metadata of a file, generally represented in a directory layout, is stored in ways that are not readily scalable. This is true for HDFS, PVFS, and Lustre as well. Recent experimental file systems, such as Spyglass or Pantheon, have attempted to address this problem through a redesign of the metadata directory architecture. LVFS takes a radically different architectural approach by eliminating the need for a separate directory within the file system.
The LVFS system replaces the NFS disk mounting approach of LAADS and utilizes the already existing, highly optimized metadata database server, an approach applicable to most scientific big-data-intensive compute systems. Thus, LVFS ties the existing storage system to the existing metadata infrastructure, which we believe leads to a scalable exabyte virtual file system. The implemented design is not limited to LAADS but can be employed with most scientific data processing systems. By utilizing Filesystem in Userspace (FUSE), a kernel module available in many operating systems, LVFS was able to replace the NFS system while staying POSIX compliant. As a result, the LVFS system becomes scalable to exabyte sizes owing to the use of highly scalable database servers optimized for metadata storage. The flexibility of the LVFS design allows it to organize data on the fly in different ways, such as by region, date, instrument, or product, without the need for duplication, symbolic links, or any other replication methods. We propose here a strategic reference architecture that addresses the inefficiencies of scientific petabyte/exabyte file system access through the dynamic integration of the observing system's large metadata file.
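The abstract does not give LVFS internals, but the core idea of serving "directories" as metadata-database queries rather than on-disk directory trees can be sketched. The following is a hypothetical illustration (table schema, file names, and paths are invented), using SQLite in place of the operational metadata server:

```python
import sqlite3

# Illustrative LVFS-style lookup: physical file locations live in a metadata
# database, and each virtual "directory" is just a query, so the same archive
# can be browsed by instrument, date, or product without symlinks or copies.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE granules (
    name TEXT, instrument TEXT, date TEXT, product TEXT, location TEXT)""")
db.executemany(
    "INSERT INTO granules VALUES (?, ?, ?, ?, ?)",
    [("g1.hdf", "MODIS", "2013-06-01", "L1B", "/disk07/a/g1.hdf"),
     ("g2.hdf", "VIIRS", "2013-06-01", "L1B", "/disk12/b/g2.hdf"),
     ("g3.hdf", "MODIS", "2013-06-02", "L2",  "/disk03/c/g3.hdf")])

def listdir(view, key):
    """List a virtual directory, e.g. /by-instrument/MODIS or /by-date/2013-06-01."""
    column = {"by-instrument": "instrument", "by-date": "date"}[view]  # whitelist
    rows = db.execute(
        f"SELECT name, location FROM granules WHERE {column} = ?", (key,))
    return dict(rows.fetchall())

print(listdir("by-instrument", "MODIS"))
# {'g1.hdf': '/disk07/a/g1.hdf', 'g3.hdf': '/disk03/c/g3.hdf'}
```

In a FUSE implementation, a resolver like `listdir` would back the filesystem's readdir callback; the point is that adding a new organization (e.g. by product) costs only a query, not a re-layout of 300 million files.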
Liu, Z; Sun, J; Smith, M; Smith, L; Warr, R
2013-11-01
Computer-assisted diagnosis (CAD) of malignant melanoma (MM) has been advocated to help clinicians achieve a more objective and reliable assessment. However, conventional CAD systems examine only the features extracted from digital photographs of lesions; failure to incorporate patients' personal information constrains their applicability in clinical settings. Our objective was to develop a new CAD system to improve the performance of automatic diagnosis of melanoma which, for the first time, incorporates digital features of lesions together with important patient metadata into a learning process. Thirty-two features were extracted from digital photographs to characterize skin lesions. Patients' personal information, such as age, gender, and lesion site, and their combinations, was quantified as metadata. The integration of digital features and metadata was realized through an extended Laplacian eigenmap, a dimensionality-reduction method grouping lesions with similar digital features and metadata into the same classes. The diagnosis reached 82.1% sensitivity and 86.1% specificity when only multidimensional digital features were used, but improved to 95.2% sensitivity and 91.0% specificity after metadata were incorporated appropriately. The proposed system achieves a level of sensitivity comparable with that of experienced dermatologists aided by conventional dermoscopes. This demonstrates the potential of our method for assisting clinicians in diagnosing melanoma, and the benefit it could provide to patients and hospitals by greatly reducing unnecessary excisions of benign naevi. This paper proposes an enhanced CAD system incorporating clinical metadata into the learning process for automatic classification of melanoma. Results demonstrate that the additional metadata and the mechanism to incorporate them are useful for improving CAD of melanoma. © 2013 British Association of Dermatologists.
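The paper's "extended" Laplacian eigenmap is not specified in the abstract; a minimal sketch of the general idea, with invented dimensions and a simple metadata weighting, would look like the following (this is the standard eigenmap on a combined feature space, not the authors' exact formulation):

```python
import numpy as np

# Sketch: append quantified patient metadata (age, gender, lesion site) to the
# 32 image-derived features, then embed the combined space with a Laplacian
# eigenmap so lesions with similar features *and* metadata land close together.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(20, 32))        # 32 digital features per lesion
metadata    = rng.normal(size=(20, 3))         # quantified age, gender, site
X = np.hstack([image_feats, 2.0 * metadata])   # weight metadata's influence

# Gaussian affinity matrix and unnormalized graph Laplacian L = D - W
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / d2.mean())
L = np.diag(W.sum(1)) - W

# The smallest non-trivial eigenvectors give the low-dimensional embedding;
# a classifier (e.g. nearest-neighbor) would then operate on these coordinates.
vals, vecs = np.linalg.eigh(L)
embedding = vecs[:, 1:3]                        # 2-D coordinates per lesion
print(embedding.shape)                          # (20, 2)
```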
From petascale to exascale, the future of simulated climate data (Invited)
NASA Astrophysics Data System (ADS)
Lawrence, B.; Juckes, M. N.
2013-12-01
Coleridge ought to have said: data, data, everywhere, and all the data centres groan, data data everywhere, nor any I should clone. Except of course, he didn't say it, and we do clone data! While we've been dealing with terabytes of simulated datasets, downloading ("cloning") and analysing has been a plausible way forward. In doing so, we have set up systems that support four broad classes of activities: personal and institutional data analysis, federated data systems, and data portals. We use metadata to manage the migration of data between these (and their communities) and we have built software systems. However, our metadata and software solutions are fragile, often based on soft money and loose governance arrangements. We often download data with minimal provenance, and often many of us download the same data. In the not too distant future we can imagine exabytes of data being produced, and all these problems will get worse. Arguably we have no plausible methods of effectively exploiting such data - particularly if the analysis requires intercomparison. Yet of course, we know full well that intercomparison is at the heart of climate science. In this talk, we review the current status of simulation data management, with special emphasis on accessibility and usability. We talk about file formats, bundles of files, real and virtual, and simulation metadata. We introduce the InfraStructure for the European Network for Earth Simulation (IS-ENES) and its relationship with the Earth System Grid Federation (ESGF) as well as JASMIN, the UK Joint Analysis System. There will be a small digression on parallel data analysis - locally and distributed. We then progress to the near-term problems (and solutions) for climate data before scoping out the problems of the future, both for data handling and the models that produce the data. The way we think about data, computing, models, even ensemble design, may need to change.
Big Earth Data Initiative: Metadata Improvement: Case Studies
NASA Technical Reports Server (NTRS)
Kozimor, John; Habermann, Ted; Farley, John
2016-01-01
The Big Earth Data Initiative (BEDI) invests in standardizing and optimizing the collection, management, and delivery of the U.S. Government's civil Earth observation data to improve discovery, access and use, and understanding of Earth observations by the broader user community. Complete and consistent standard metadata helps address all three goals.
NASA Technical Reports Server (NTRS)
Graves, Sara J.
1994-01-01
Work on this project was focused on information management techniques for Marshall Space Flight Center's EOSDIS Version 0 Distributed Active Archive Center (DAAC). The centerpiece of this effort has been participation in EOSDIS catalog interoperability research, the result of which is a distributed Information Management System (IMS) allowing the user to query the inventories of all the DAACs from a single user interface. UAH has provided the MSFC DAAC database server for the distributed IMS, and has contributed to the definition and development of the browse image display capabilities in the system's user interface. Another important area of research has been the generation of value-based metadata through data mining. In addition, information management applications for local inventory and archive management, and for tracking data orders, were provided.
Workflow-Oriented Cyberinfrastructure for Sensor Data Analytics
NASA Astrophysics Data System (ADS)
Orcutt, J. A.; Rajasekar, A.; Moore, R. W.; Vernon, F.
2015-12-01
Sensor streams comprise an increasingly large part of Earth Science data. Analytics based on sensor data require an easy way to perform operations such as acquisition, conversion to physical units, metadata linking, sensor fusion, analysis and visualization on distributed sensor streams. Furthermore, embedding real-time sensor data into scientific workflows is of growing interest. We have implemented a scalable networked architecture that can be used to dynamically access packets of data in a stream from multiple sensors, and perform synthesis and analysis across a distributed network. Our system is based on the integrated Rule Oriented Data System (irods.org), which accesses sensor data from the Antelope Real Time Data System (brtt.com), and provides virtualized access to collections of data streams. We integrate real-time data streaming from different sources, collected for different purposes, on different time and spatial scales, and sensed by different methods. iRODS, noted for its policy-oriented data management, brings to sensor processing features and facilities such as single sign-on, third-party access control lists (ACLs), location transparency, logical resource naming, and server-side modeling capabilities, while reducing the burden on sensor network operators. Rich integrated metadata support also makes it straightforward to discover data streams of interest and maintain data provenance. The workflow support in iRODS readily integrates sensor processing into any analytical pipeline. The system is developed as part of the NSF-funded DataNet Federation Consortium (datafed.org). APIs for selecting, opening, reaping and closing sensor streams are provided, along with other helper functions to associate metadata and convert sensor packets into NetCDF and JSON formats. Near real-time data from seismic sensors, environmental sensors, LIDAR, and video streams are available through this interface.
A system for archiving sensor data and metadata in NetCDF format has been implemented and will be demonstrated at AGU.
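The abstract mentions helper functions that associate metadata with sensor packets and convert them to NetCDF or JSON. A hypothetical sketch of the JSON path (field names and station/channel codes here are invented for illustration, not the project's actual API):

```python
import json
from datetime import datetime, timezone

# Sketch: wrap a raw sensor packet together with its linked metadata and emit
# JSON, one of the two target formats named in the abstract (NetCDF is the other).
def packet_to_json(station, channel, samples, rate_hz, start):
    record = {
        "station": station,
        "channel": channel,
        "sample_rate_hz": rate_hz,
        "starttime": start.isoformat(),   # time-tag the packet
        "npts": len(samples),
        "data": samples,
    }
    return json.dumps(record)

doc = packet_to_json("ANMO", "BHZ", [1.2, 1.3, 1.1],
                     40.0, datetime(2015, 7, 1, tzinfo=timezone.utc))
print(json.loads(doc)["npts"])  # 3
```

Carrying the acquisition metadata inside every serialized packet is what keeps provenance intact once the stream leaves the real-time system.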
A New Look at Data Usage by Using Metadata Attributes as Indicators of Data Quality
NASA Astrophysics Data System (ADS)
Won, Y. I.; Wanchoo, L.; Behnke, J.
2016-12-01
NASA's Earth Observing System Data and Information System (EOSDIS) stores and distributes data from EOS satellites, as well as ancillary, airborne, in-situ, and socio-economic data. Twelve EOSDIS data centers support different scientific disciplines by providing products and services tailored to specific science communities. Although discipline-oriented, these data centers provide common data management functions of ingest, archive, and distribution, as well as documentation of their data and services on their websites. The Earth Science Data and Information System (ESDIS) Project collects metrics on these functions from the EOSDIS data centers on a daily basis through a tool called the ESDIS Metrics System (EMS); these metrics are used in this study. The implementation of the Earthdata Login - formerly known as the User Registration System (URS) - across the various NASA data centers provides the EMS additional information about users obtaining data products from EOSDIS data centers. These additional user attributes collected by the Earthdata Login, such as the user's primary area of study, can augment the understanding of data usage, which in turn can help the EOSDIS program better understand users' needs. This study will review the key metrics (users, distributed volume, and files) in multiple ways to gain an understanding of the significance of the metadata. Characterizing the usability of data by key metadata elements such as discipline and study area will assist in understanding how the users have evolved over time. The data usage pattern based on version numbers may also provide some insight into the level of data quality. In addition, the data metrics by various services such as the Open-source Project for a Network Data Access Protocol (OPeNDAP), Web Map Service (WMS), Web Coverage Service (WCS), and subsets, will address how these services have extended the usage of data.
Overall, this study will present the usage of data and metadata through metrics analyses and will assist data centers in better supporting the needs of their users.
Managing troubled data: Coastal data partnerships smooth data integration
Hale, S.S.; Hale, Miglarese A.; Bradley, M.P.; Belton, T.J.; Cooper, L.D.; Frame, M.T.; Friel, C.A.; Harwell, L.M.; King, R.E.; Michener, W.K.; Nicolson, D.T.; Peterjohn, B.G.
2003-01-01
Understanding the ecology, condition, and changes of coastal areas requires data from many sources. Broad-scale and long-term ecological questions, such as global climate change, biodiversity, and cumulative impacts of human activities, must be addressed with databases that integrate data from several different research and monitoring programs. Various barriers, including widely differing data formats, codes, directories, systems, and metadata used by individual programs, make such integration troublesome. Coastal data partnerships, by helping overcome technical, social, and organizational barriers, can lead to a better understanding of environmental issues, and may enable better management decisions. Characteristics of successful data partnerships include a common need for shared data, strong collaborative leadership, committed partners willing to invest in the partnership, and clear agreements on data standards and data policy. Emerging data and metadata standards that become widely accepted are crucial. New information technology is making it easier to exchange and integrate data. Data partnerships allow us to create broader databases than would be possible for any one organization to create by itself.
NASA Astrophysics Data System (ADS)
Moore, J. A.; Serreze, M. C.; Williams, S.; Ramamurthy, M. K.; Middleton, D.
2014-12-01
The U.S. National Science Foundation has been providing data management support to the Arctic research community through UCAR/NCAR since late 1995. Support began during the early planning phase of the Surface Heat Budget of the Arctic (SHEBA) Project and continues today with a major collaboration involving the NCAR Earth Observing Laboratory (EOL), the NCAR Computational Information Systems Laboratory (CISL), the UCAR Unidata Program, and the National Snow and Ice Data Center (NSIDC), in the Advanced Cooperative Arctic Data and Information System (ACADIS). These groups have managed thousands of datasets for hundreds of Principal Investigators. The datasets, including the metadata and documentation held in the archives, vary in size from less than 30 kilobytes to tens of gigabytes and represent dozens of research disciplines. The ACADIS holdings alone span more than 50 scientific disciplines as defined by the NASA/GCMD keywords. The data formats vary from simple ASCII text to proprietary complex binary and imagery. A lot has changed in the way data are collected, owing to improved data collection technologies, real-time processing, and wide-bandwidth communications. Data management best practices have also changed, especially with respect to metadata, flexible formatting, DOIs, and interoperability with other archives, to take advantage of new technologies, software, and related support capabilities. ACADIS has spent more than 7 years working on these issues and implementing an agile service approach. We have confronted and overcome some very interesting challenges during the past 20 years. Even with all those improvements, however, certain guiding principles for data managers remain robust and important after 20 years of experience.
These include the provision of evolving standards and complete metadata records to describe each dataset, international data exchange and easy access to the archived data, and the inclusion of comprehensive documentation to foster the long-term reuse potential of the data. The authors will provide details on the handling of these specific issues and also consider some other more subtle situations that continue to require serious consideration and problem solving.
Developing Data Citations from Digital Object Identifier Metadata
NASA Technical Reports Server (NTRS)
James, Nathan; Wanchoo, Lalit
2015-01-01
NASA's Earth Science Data and Information System (ESDIS) Project has been processing information for the registration of Digital Object Identifiers (DOIs) for the last five years; an automated system has been in operation for the last two of those years. The ESDIS DOI registration system has registered over 2000 DOIs, with over 1000 DOIs held in reserve until all required information has been collected. By working toward the goal of assigning DOIs to the 8000+ data collections under its management, ESDIS has taken the first step toward facilitating the use of data citations with those products. Jeanne Behnke, ESDIS Deputy Project Manager, has reviewed and approved the poster.
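A data citation can be assembled mechanically once the DOI metadata fields are registered. The template below is an illustrative assumption (the field names and ordering are not taken from the ESDIS system), showing the kind of transformation a citation service performs:

```python
# Sketch: build a data citation string from the metadata fields typically
# registered with a dataset DOI. Template and example values are invented.
def format_citation(creator, year, title, version, publisher, doi):
    return (f"{creator} ({year}). {title}, Version {version}. "
            f"{publisher}. https://doi.org/{doi}")

cite = format_citation("GES DISC", 2015, "Example Level-3 Product",
                       "006", "NASA GES DISC", "10.5067/EXAMPLE")
print(cite)
```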
Making metadata usable in a multi-national research setting.
Ellul, Claire; Foord, Joanna; Mooney, John
2013-11-01
SECOA (Solutions for Environmental Contrasts in Coastal Areas) is a multi-national research project examining the effects of human mobility on urban settlements in fragile coastal environments. This paper describes the setting up of a SECOA metadata repository for non-specialist researchers such as environmental scientists and tourism experts. Conflicting usability requirements of two groups - metadata creators and metadata users - are identified along with associated limitations of current metadata standards. A description is given of a configurable metadata system designed to grow as the project evolves. This work is of relevance for similar projects such as INSPIRE. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.
EARS : Repositioning data management near data acquisition.
NASA Astrophysics Data System (ADS)
Sinquin, Jean-Marc; Sorribas, Jordi; Diviacco, Paolo; Vandenberghe, Thomas; Munoz, Raquel; Garcia, Oscar
2016-04-01
The EU FP7 projects Eurofleets and Eurofleets2 form a Europe-wide alliance of marine research centers that aim to share their research vessels, to improve information sharing on planned, current, and completed cruises and on the details of ocean-going research vessels and specialized equipment, and to durably improve the cost-effectiveness of cruises. Within this context, logging information on how, when, and where anything happens on board the vessel is crucial for data users at a later stage. This is an essential first step in the process of data quality control, as it can assist in understanding anomalies and unexpected trends recorded in the acquired data sets. In this way the completeness of the metadata is improved, as it is recorded accurately at the origin of the measurement. The collection of this crucial information has been done in very different ways, using different procedures, formats, and pieces of software across the European research fleet. At the time the Eurofleets project started, every institution and country had adopted different strategies and approaches, which complicated the task of users who need to log general-purpose information and events on board whenever they access a different platform, losing the opportunity to produce this valuable metadata on board. Among the many goals of the Eurofleets project, a very important task is the development of an "event log software" called EARS (Eurofleets Automatic Reporting System) that enables scientists and operators to record what happens during a survey. EARS will allow users to fill, in a standardized way, the gap that currently exists in metadata description, which only very seldom links data with its history. Events generated automatically by acquisition instruments will also be handled, enhancing the granularity and precision of the event annotation.
The adoption of a common procedure to log survey events and a common terminology to describe them is crucial to providing a friendly and successful on-board metadata creation procedure for the whole European fleet. The possibility of automatically reporting metadata and general-purpose data will simplify the work of scientists and data managers with regard to data transmission. Improved accuracy and completeness of metadata are expected when events are recorded at acquisition time. This will also enhance multiple uses of the data, as it allows verification of the different requirements existing in different disciplines.
Managing Sustainable Data Infrastructures: The Gestalt of EOSDIS
NASA Astrophysics Data System (ADS)
Behnke, J.; Lindsay, F. E.; Lowe, D. R.; Mitchell, A. E.; Lynnes, C.
2016-12-01
NASA's Earth Observing System Data and Information System (EOSDIS) has been a central component of the NASA Earth observation program since the 1990's. The data collected by NASA's remote sensing instruments represent a significant public investment in research. EOSDIS provides free and open access to this data to a worldwide public research community. From the very beginning, EOSDIS was conceived as a system built on partnerships between NASA Centers, US agencies, and academia. EOSDIS manages a wide range of Earth science discipline data that include cryosphere, land cover change, polar processes, field campaigns, ocean surface, digital elevation, atmosphere dynamics and composition, and inter-disciplinary research, among many others. Over the years, EOSDIS has evolved to support increasingly complex and diverse NASA Earth Science data collections. EOSDIS epitomizes a System of Systems, whose many varied and distributed parts are integrated into a single, highly functional organized science data system. A distributed architecture was adopted to ensure discipline-specific support for the science data, while also leveraging standards and establishing policies and tools to enable interdisciplinary research and analysis across multiple scientific instruments. The EOSDIS is composed of system elements such as geographically distributed archive centers used to manage the stewardship of data. The infrastructure consists of underlying capabilities and connections that enable the primary system elements to function together. For example, one key infrastructure component is the common metadata repository, which enables discovery of all data within the EOSDIS system. EOSDIS employs processes and standards to ensure partners can work together effectively, and provide coherent services to users.
While the separation into domain-specific science archives helps to manage the wide variety of missions and datasets, the common services and practices serve to knit the overall system together into a coherent whole, with sharing of data, metadata, information and software making EOSDIS more than the simple sum of its parts. This paper will describe those parts and how the whole system works together to deliver Earth science data to millions of users.
Charting the Course: Life Cycle Management of Mars Mission Digital Information
NASA Technical Reports Server (NTRS)
Reiz, Julie M.
2003-01-01
This viewgraph presentation reviews the life cycle management of MER Project information. This process was an essential key to the successful launch of the MER Project rovers. Incorporating digital information archive requirements early in the project life cycle resulted in: design of an information system that included archive metadata; reduced risk of information loss through in-process appraisal; easier transfer of project information to an institutional online archive; and project appreciation for preserving information for reuse by future projects.
Final Report for File System Support for Burst Buffers on HPC Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, W.; Mohror, K.
Distributed burst buffers are a promising storage architecture for handling I/O workloads for exascale computing. As they are being deployed on more supercomputers, a file system that efficiently manages these burst buffers for fast I/O operations carries great consequence. Over the past year, the FSU team has undertaken several efforts to design, prototype, and evaluate distributed file systems for burst buffers on HPC systems. These include MetaKV, a key-value store for metadata management of distributed burst buffers; a user-level file system with multiple backends; and a specialized file system for large datasets of deep neural networks. Our progress on these respective efforts is elaborated further in this report.
NASA Astrophysics Data System (ADS)
Nakatani, T.; Inamura, Y.; Moriyama, K.; Ito, T.; Muto, S.; Otomo, T.
Neutron scattering can be a powerful probe in the investigation of many phenomena in the materials and life sciences. The Materials and Life Science Experimental Facility (MLF) at the Japan Proton Accelerator Research Complex (J-PARC) is a leading center of experimental neutron science and boasts one of the most intense pulsed neutron sources in the world. The MLF currently has 18 experimental instruments in operation that support a wide variety of users from across a range of research fields. The instruments include optical elements, sample environment apparatus and detector systems that are controlled and monitored electronically throughout an experiment. Signals from these components and those from the neutron source are converted into a digital format by the data acquisition (DAQ) electronics and recorded as time-tagged event data in the DAQ computers using "DAQ-Middleware". Operating in event mode, the DAQ system produces extremely large data files (~GB) under various measurement conditions. Simultaneously, the measurement metadata indicating each measurement condition is recorded in XML format by the MLF control software framework "IROHA". These measurement event data and metadata are collected in the MLF common storage and cataloged by the MLF Experimental Database (MLF EXP-DB), based on a commercial XML database. The system provides a web interface for users to manage and remotely analyze experimental data.
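The abstract describes per-measurement metadata recorded as XML by IROHA; the actual schema is not given, so the element names below are invented to illustrate the pattern of recording each measurement condition as a small XML document:

```python
import xml.etree.ElementTree as ET

# Sketch: record one measurement condition as an XML document, in the spirit
# of the IROHA framework at the MLF (tag and attribute names are hypothetical).
run = ET.Element("measurement", {"run": "001234", "instrument": "BL01"})
ET.SubElement(run, "sample").text = "V2O5 powder"
ET.SubElement(run, "temperature", {"unit": "K"}).text = "150"
ET.SubElement(run, "protons_on_target", {"unit": "C"}).text = "8.2e-3"

xml_doc = ET.tostring(run, encoding="unicode")
print(xml_doc)
```

Keeping the condition metadata as structured XML is what lets an XML database such as the MLF EXP-DB index and query it alongside the event-mode data files.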
NASA Astrophysics Data System (ADS)
Stall, S.
2016-12-01
The story of a sample starts with a proposal, a data management plan, and funded research. The sample is created, given a unique identifier (IGSN) and properly cared for during its journey to an appropriate storage location. Through its metadata, and publication information, the sample can become well known and shared with other researchers. Ultimately, a valuable sample can tell its entire story through its IGSN, associated ORCIDs, associated publication DOIs, and DOIs of data generated from sample analysis. This journey, or workflow, is in many ways still manual. Tools exist to generate IGSNs for the sample and subsamples. Publishers are committed to making IGSNs machine readable in their journals, but the connection back to the IGSN management system, specifically the System for Earth Sample Registration (SESAR) is not fully complete. Through encouragement of publishers, like AGU, and improved data management practices, such as those promoted by AGU's Data Management Assessment program, the complete lifecycle of a sample can and will be told through the journey it takes from creation, documentation (metadata), analysis, subsamples, publication, and sharing. Publishers and data facilities are using efforts like the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) to "implement and promote common policies and procedures for the publication and citation of data across Earth Science journals", including IGSNs. As our community improves its data management practices and publishers adopt and enforce machine readable use of unique sample identifiers, the ability to tell the entire story of a sample is close at hand. Better Data Management results in Better Science.
Moving Toward Real Time Data Handling: Data Management at the IRIS DMC
NASA Astrophysics Data System (ADS)
Ahern, T. K.; Benson, R. B.
2001-12-01
The IRIS Data Management Center at the University of Washington has become a major archive and distribution center for a wide variety of seismological data. With a mass storage system with a 360-terabyte capacity, the center is well positioned to manage the data flow, both inbound and outbound, from all anticipated seismic sources for the foreseeable future. As data flow in and out of the IRIS DMC at an increasing rate, new methods to deal with data using purely automated techniques are being developed. The on-line and self-service data repositories of SPYDER and FARM are collections of seismograms for all larger events. The WWW tool WILBER and the client application WEED are examples of tools that provide convenient access to the 1/2 terabyte of SPYDER and FARM data. The Buffer of Uniform Data (BUD) system provides access to continuous data available in real time from GSN, FDSN, US regional networks, and other globally distributed stations. Continuous data that have received quality control are always available from the archive of continuous data. This presentation will review current and future data access techniques supported at IRIS. One of the most difficult tasks at the DMC is the management of the metadata that describes all the stations, sensors, and data holdings. Demonstrations of tools that provide access to the metadata will be presented. This presentation will focus on the new techniques of data management now being developed at the IRIS DMC. We believe that these techniques are generally applicable to other types of geophysical data management as well.
Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry
NASA Astrophysics Data System (ADS)
Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.
2015-09-01
Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires knowledge of the scanning parameters and patient information included in a DICOM file, as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and by assigning a specific value to every ROI. The result is stored in DICOM format for data and trend analysis. The developed GUI is easy and fast to use and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.
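The authors' tool is Matlab-based; as a rough, language-agnostic sketch of its core step (collecting the same tags from every slice in one pass and formatting them as a spreadsheet-style table), here is a Python fragment using only the standard library. The per-slice DICOM headers are mocked as plain dicts and the tag names shown are illustrative; a real implementation would parse them from the DICOM files.

```python
import csv
import io

# Hypothetical stand-ins for per-slice DICOM headers; a real tool would
# read these tags (slice location, kVp, exposure, ...) from the files.
slices = [
    {"SliceLocation": 0.0, "KVP": 120, "Exposure": 200},
    {"SliceLocation": 2.5, "KVP": 120, "Exposure": 180},
]

def export_metadata(slices, fieldnames):
    """Collect the same tags from every slice in a single pass and
    format them as a table (CSV here; Excel in the paper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for s in slices:
        writer.writerow({k: s.get(k, "") for k in fieldnames})
    return buf.getvalue()

table = export_metadata(slices, ["SliceLocation", "KVP", "Exposure"])
```

The same loop structure would apply regardless of whether the output target is CSV, Excel, or a database table.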
Evaluating and Evolving Metadata in Multiple Dialects
NASA Astrophysics Data System (ADS)
Kozimor, J.; Habermann, T.; Powers, L. A.; Gordon, S.
2016-12-01
Despite many long-term homogenization efforts, communities continue to develop focused metadata standards along with related recommendations and (typically) XML representations (aka dialects) for sharing metadata content. Different representations easily become obstacles to sharing information because each representation generally requires a set of tools and skills that are designed, built, and maintained specifically for that representation. In contrast, community recommendations are generally described, at least initially, at a more conceptual level and are more easily shared. For example, most communities agree that dataset titles should be included in metadata records although they write the titles in different ways. This situation has led to the development of metadata repositories that can ingest and output metadata in multiple dialects. As an operational example, the NASA Common Metadata Repository (CMR) includes three different metadata dialects (DIF, ECHO, and ISO 19115-2). These systems raise a new question for metadata providers: if I have a choice of metadata dialects, which should I use and how do I make that decision? We have developed a collection of metadata evaluation tools that can be used to evaluate metadata records in many dialects for completeness with respect to recommendations from many organizations and communities. We have applied these tools to over 8000 collection and granule metadata records in four different dialects. This large collection of identical content in multiple dialects enables us to address questions about metadata and dialect evolution and to answer those questions quantitatively. We will describe those tools and results from evaluating the NASA CMR metadata collection.
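The evaluation idea described above can be sketched as scoring a record for completeness against a community recommendation, with a per-dialect mapping translating each recommended concept to that dialect's element path. The element names and paths below are invented for illustration, not the real DIF or ISO 19115-2 tags:

```python
import xml.etree.ElementTree as ET

# Illustrative dialect mappings: the same concept ("title") lives at a
# different path in each XML representation (paths are made up).
DIALECT_PATHS = {
    "dif": {"title": "Entry_Title", "abstract": "Summary"},
    "iso": {"title": "identificationInfo/title",
            "abstract": "identificationInfo/abstract"},
}

def completeness(record_xml, dialect, recommended):
    """Fraction of recommended concepts present (non-empty) in a record."""
    root = ET.fromstring(record_xml)
    paths = DIALECT_PATHS[dialect]
    found = 0
    for concept in recommended:
        el = root.find(paths[concept])
        if el is not None and (el.text or "").strip():
            found += 1
    return found / len(recommended)

# A record with a title but an empty abstract scores 50% complete.
dif_record = "<DIF><Entry_Title>Sea Ice Extent</Entry_Title><Summary></Summary></DIF>"
score = completeness(dif_record, "dif", ["title", "abstract"])
```

Because the recommendation is expressed at the concept level, the same scoring function applies across dialects once a mapping exists for each.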
POLARIS: Helping Managers Get Answers Fast!
NASA Technical Reports Server (NTRS)
Corcoran, Patricia M.; Webster, Jeffery
2007-01-01
This viewgraph presentation reviews the Project Online Library and Resource Information System (POLARIS). It is a NASA-wide, web-based system providing access to information related to Program and Project Management. It will provide a one-stop shop for access to: a searchable, sortable database of all requirements for all product lines; project life cycle diagrams with reviews; project review definitions with product review information from NPR 7123.1, NASA Systems Engineering Processes and Requirements; templates and examples of products; project standard WBSs with dictionaries and requirements for implementation and approval; information from NASA's Metadata Manager (MdM): attributes of Missions, Themes, Programs & Projects; the NPR 7120.5 waiver form and instructions; and much more. The presentation reviews the plans and timelines for future revisions and modifications.
Computational Unification: a Vision for Connecting Researchers
NASA Astrophysics Data System (ADS)
Troy, R. M.; Kingrey, O. J.
2002-12-01
Computational Unification of science, once only a vision, is becoming a reality. This technology is based upon a scientifically defensible, general solution for Earth Science data management and processing. The computational unification of science offers a real opportunity to foster inter and intra-discipline cooperation, and the end of 're-inventing the wheel'. As we move forward using computers as tools, it is past time to move from computationally isolating, "one-off" or discipline-specific solutions into a unified framework where research can be more easily shared, especially with researchers in other disciplines. The author will discuss how distributed meta-data, distributed processing and distributed data objects are structured to constitute a working interdisciplinary system, including how these resources lead to scientific defensibility through known lineage of all data products. Illustration of how scientific processes are encapsulated and executed illuminates how previously written processes and functions are integrated into the system efficiently and with minimal effort. Meta-data basics will illustrate how intricate relationships may easily be represented and used to good advantage. Retrieval techniques will be discussed including trade-offs of using meta-data versus embedded data, how the two may be integrated, and how simplifying assumptions may or may not help. This system is based upon the experience of the Sequoia 2000 and BigSur research projects at the University of California, Berkeley, whose goals were to find an alternative to the Hughes EOS-DIS system and is presently offered by Science Tools corporation, of which the author is a principal.
ARIADNE: a Tracking System for Relationships in LHCb Metadata
NASA Astrophysics Data System (ADS)
Shapoval, I.; Clemencic, M.; Cattaneo, M.
2014-06-01
The data processing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and databases states to architecture specificators and software/data deployment locations. For instance, there is an important relationship between the LHCb Conditions Database (CondDB), which provides versioned, time dependent geometry and conditions data, and the LHCb software, which is the data processing applications (used for simulation, high level triggering, reconstruction and analysis of physics data). The evolution of CondDB and of the LHCb applications is a weakly-homomorphic process. It means that relationships between a CondDB state and LHCb application state may not be preserved across different database and application generations. These issues may lead to various kinds of problems in the LHCb production, varying from unexpected application crashes to incorrect data processing results. In this paper we present Ariadne - a generic metadata relationships tracking system based on the novel NoSQL Neo4j graph database. Its aim is to track and analyze many thousands of evolving relationships for cases such as the one described above, and several others, which would otherwise remain unmanaged and potentially harmful. The highlights of the paper include the system's implementation and management details, infrastructure needed for running it, security issues, first experience of usage in the LHCb production and potential of the system to be applied to a wider set of LHCb tasks.
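The abstract does not give implementation details beyond the use of Neo4j; a minimal in-memory stand-in for the core idea (typed, directed relationships between evolving metadata entities, queryable by relationship type) might look like this, with all node and relationship names invented:

```python
class MetadataGraph:
    """Toy stand-in for a graph database: nodes are metadata entities
    (e.g. a CondDB tag or an application version) and edges are typed,
    directed relationships between them."""

    def __init__(self):
        self.edges = []  # (source, relation, target) triples

    def relate(self, source, relation, target):
        self.edges.append((source, relation, target))

    def neighbors(self, source, relation):
        """All targets reachable from `source` via one `relation` edge."""
        return [t for s, r, t in self.edges if s == source and r == relation]

g = MetadataGraph()
# Which CondDB states is this application release known to work with?
g.relate("LHCb-app-v45", "COMPATIBLE_WITH", "CondDB-tag-2013a")
g.relate("LHCb-app-v45", "COMPATIBLE_WITH", "CondDB-tag-2013b")
g.relate("LHCb-app-v46", "COMPATIBLE_WITH", "CondDB-tag-2013b")
```

A real graph database adds persistence, indexing, and a query language on top of exactly this kind of typed-edge traversal.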
Beyond 10 Years of Evolving the IGSN Architecture: What's Next?
NASA Astrophysics Data System (ADS)
Lehnert, K.; Arko, R. A.
2016-12-01
The IGSN was developed as part of a US NSF-funded project, started in 2004, to establish a registry for sample metadata, the System for Earth Sample Registration (SESAR). The initial version of the system provided a centralized solution for users to submit information about their samples and obtain IGSNs and bar codes. A new distributed architecture for the IGSN was designed at a workshop in 2011 that aimed to advance the global implementation of the IGSN. The workshop led to the founding of an international non-profit organization, the IGSN e.V., which adopted the governance model of the DataCite consortium as a non-profit membership organization and its architecture with a central registry and a network of distributed Allocating Agents that provide registration services to users. Recent progress came at a workshop in 2015, where stakeholders from both the geoscience and life science disciplines drafted a standard IGSN metadata schema for describing samples with an essential set of properties about a sample's origin and classification, creating a "birth certificate" for the sample. Consensus was reached that the IGSN should also be used to identify sampling features and collections of samples. The IGSN e.V. global network has steadily grown, with members now on four continents and five Allocating Agents operational in the US, Australia, and Europe. A Central Catalog has been established at the IGSN Management Office that harvests "birth certificate" metadata records from Allocating Agents via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and publishes them as a Linked Open Data graph using the Resource Description Framework (RDF) and the SPARQL query language for reuse by Semantic Web clients. Next developments will include a web-based validation service that allows journal editors to check the validity of IGSNs and compliance with metadata requirements, and the use of community-recommended vocabularies for specific disciplines.
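Harvesting via OAI-PMH, as the Central Catalog does, amounts to requesting a ListRecords response from each Allocating Agent and pulling identifiers and metadata out of the returned XML. A minimal Python sketch against an abbreviated, hand-written response (the IGSN values are invented):

```python
import xml.etree.ElementTree as ET

# Abbreviated OAI-PMH ListRecords response; real IGSN records carry a
# full "birth certificate" metadata payload inside each <metadata>.
RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>igsn:IEXXX0001</identifier></header></record>
    <record><header><identifier>igsn:IEXXX0002</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvested_identifiers(xml_text):
    """Extract the sample identifiers from a ListRecords response,
    as a central catalog would when harvesting an Allocating Agent."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", NS)]

ids = harvested_identifiers(RESPONSE)
```

A production harvester would additionally follow OAI-PMH resumption tokens to page through large result sets.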
The Digital Sample: Metadata, Unique Identification, and Links to Data and Publications
NASA Astrophysics Data System (ADS)
Lehnert, K. A.; Vinayagamoorthy, S.; Djapic, B.; Klump, J.
2006-12-01
A significant part of digital data in the Geosciences refers to physical samples of Earth materials, from igneous rocks to sediment cores to water or gas samples. The application and long-term utility of these sample-based data in research is critically dependent on (a) the availability of information (metadata) about the samples, such as geographical location and time of sampling, or sampling method, (b) links between the different data types available for individual samples that are dispersed in the literature and in digital data repositories, and (c) access to the samples themselves. Major obstacles include incomplete documentation of samples in publications, use of ambiguous sample names, and the lack of a central catalog that allows one to find a sample's archiving location. The International Geo Sample Number IGSN, managed by the System for Earth Sample Registration SESAR, provides solutions for these problems. The IGSN is a unique persistent identifier for samples and other GeoObjects that can be obtained by submitting sample metadata to SESAR (www.geosamples.org). If data in a publication are referenced to an IGSN (rather than an ambiguous sample name), sample metadata can readily be extracted from the SESAR database, which is evolving into a Global Sample Catalog that also allows users to locate the owner or curator of a sample. Use of the IGSN in digital data systems allows linkages to be built between distributed data. SESAR is contributing to the development of sample metadata standards. SESAR will integrate the IGSN in persistent, resolvable identifiers based on the handle.net service to advance direct linkages between the digital representation of samples in SESAR (sample profiles) and their related data in the literature and in web-accessible digital data repositories. Technologies outlined by Klump et al. 
(this session) such as the automatic creation of ontologies by text mining applications will be explored for harvesting identifiers of publications and datasets that contain information about a specific sample in order to establish comprehensive data profiles for samples.
Storing, Browsing, Querying, and Sharing Data: the THREDDS Data Repository (TDR)
NASA Astrophysics Data System (ADS)
Wilson, A.; Lindholm, D.; Baltzer, T.
2005-12-01
The Unidata Internet Data Distribution (IDD) network delivers gigabytes of data per day in near real time to sites across the U.S. and beyond. The THREDDS Data Server (TDS) supports public browsing of metadata and data access via OPeNDAP enabled URLs for datasets such as these. With such large quantities of data, sites generally employ a simple data management policy, keeping the data for a relatively short term on the order of hours to perhaps a week or two. In order to save interesting data in longer term storage and make it available for sharing, a user must move the data herself. In this case the user is responsible for determining where space is available, executing the data movement, generating any desired metadata, and setting access control to enable sharing. This task sequence is generally based on execution of a sequence of low level operating system specific commands with significant user involvement. The LEAD (Linked Environments for Atmospheric Discovery) project is building a cyberinfrastructure to support research and education in mesoscale meteorology. LEAD orchestrations require large, robust, and reliable storage with speedy access to stage data and store both intermediate and final results. These requirements suggest storage solutions that involve distributed storage, replication, and interfacing to archival storage systems such as mass storage systems and tape or removable disks. LEAD requirements also include metadata generation and access in order to support querying. In support of both THREDDS and LEAD requirements, Unidata is designing and prototyping the THREDDS Data Repository (TDR), a framework for a modular data repository to support distributed data storage and retrieval using a variety of back end storage media and interchangeable software components. 
The TDR interface will provide high level abstractions for long term storage, controlled, fast and reliable access, and data movement capabilities via a variety of technologies such as OPeNDAP and gridftp. The modular structure will allow substitution of software components so that both simple and complex storage media can be integrated into the repository. It will also allow integration of different varieties of supporting software. For example, if replication is desired, replica management could be handled via a simple hash table or a complex solution such as Replica Locater Service (RLS). In order to ensure that metadata is available for all the data in the repository, the TDR will also generate THREDDS metadata when necessary. Users will be able to establish levels of access control to their metadata and data. Coupled with a THREDDS Data Server, both browsing via THREDDS catalogs and querying capabilities will be supported. This presentation will describe the motivating factors, current status, and future plans of the TDR. References: IDD: http://www.unidata.ucar.edu/content/software/idd/index.html THREDDS: http://www.unidata.ucar.edu/content/projects/THREDDS/tech/server/ServerStatus.html LEAD: http://lead.ou.edu/ RLS: http://www.isi.edu/~annc/papers/chervenakRLSjournal05.pdf
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kogalovskii, M.R.
This paper presents a review of problems related to statistical database systems, which are widespread in various fields of activity. Statistical databases (SDB) are databases that consist of data used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored data compression techniques, and statistical data representation means. Also examined is whether present Database Management Systems (DBMS) satisfy SDB requirements. Some current research directions in SDB systems are considered.
Improvements to the Ontology-based Metadata Portal for Unified Semantics (OlyMPUS)
NASA Astrophysics Data System (ADS)
Linsinbigler, M. A.; Gleason, J. L.; Huffer, E.
2016-12-01
The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support Earth Science data consumers and data providers, enabling the latter to register data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS complements the ODISEES' data discovery system with an intelligent tool to enable data producers to auto-generate semantically enhanced metadata and upload it to the metadata repository that drives ODISEES. Like ODISEES, the OlyMPUS metadata provisioning tool leverages robust semantics, a NoSQL database and query engine, an automated reasoning engine that performs first- and second-order deductive inferencing, and uses a controlled vocabulary to support data interoperability and automated analytics. The ODISEES data discovery portal leverages this metadata to provide a seamless data discovery and access experience for data consumers who are interested in comparing and contrasting the multiple Earth science data products available across NASA data centers. Olympus will support scientists' services and tools for performing complex analyses and identifying correlations and non-obvious relationships across all types of Earth System phenomena using the full spectrum of NASA Earth Science data available. By providing an intelligent discovery portal that supplies users - both human users and machines - with detailed information about data products, their contents and their structure, ODISEES will reduce the level of effort required to identify and prepare large volumes of data for analysis. This poster will explain how OlyMPUS leverages deductive reasoning and other technologies to create an integrated environment for generating and exploiting semantically rich metadata.
A Meta-Relational Approach for the Definition and Management of Hybrid Learning Objects
ERIC Educational Resources Information Center
Navarro, Antonio; Fernandez-Pampillon, Ana Ma.; Fernandez-Chamizo, Carmen; Fernandez-Valmayor, Alfredo
2013-01-01
Electronic learning objects (LOs) are commonly conceived of as digital units of information used for teaching and learning. To facilitate their classification for pedagogical planning and retrieval purposes, LOs are complemented with metadata (e.g., the author). These metadata are usually restricted by a set of predetermined tags to which the…
Phosphorus retention data and metadata
Phosphorus retention in wetlands data and metadata. This dataset is associated with the following publication: Lane, C., and B. Autrey. Phosphorus retention of forested and emergent marsh depressional wetlands in differing land uses in Florida, USA. Wetlands Ecology and Management, 24(1): 45-60, 2016.
PanMetaDocs - A tool for collecting and managing the long tail of "small science data"
NASA Astrophysics Data System (ADS)
Klump, J.; Ulbricht, D.
2011-12-01
In the early days of thinking about cyberinfrastructure the focus was on "big science data". Today, the challenge is no longer to store several terabytes of data, but to manage data objects in a way that facilitates their re-use. Key to re-use by a data consumer is proper documentation of the data. Data consumers also need discovery metadata to find the data they need, and descriptive metadata to be able to use the data they retrieve. Thus, data documentation faces the challenge of describing these objects extensively and completely while keeping the items easily accessible at a sustainable cost. However, data curation and documentation do not rank high in the everyday work of a scientist as a data producer. Data producers are often frustrated by being asked to provide metadata on their data over and over again, information that seemed very obvious from the context of their work. A challenge to data archives is the wide variety of metadata schemata in use, which creates a number of maintenance and design challenges of its own. PanMetaDocs addresses these issues by allowing an uploaded file to be described by more than one metadata object. PanMetaDocs, which was developed from PanMetaWorks, is a PHP-based web application that allows data to be described with any XML-based metadata schema. Its user interface is browser based and was developed to collect metadata and data in collaborative scientific projects situated at one or more institutions. The metadata fields can be filled with static or dynamic content to reduce the number of fields that require manual entries to a minimum and to make use of contextual information in a project setting. In the development of PanMetaDocs the business logic of PanMetaWorks is reused, except for the authentication and data management functions, which are delegated to the eSciDoc framework. 
The eSciDoc repository framework is designed as a service oriented architecture that can be controlled through a REST interface to create version controlled items with metadata records in XML format. PanMetaDocs utilizes the eSciDoc items model to add multiple metadata records that describe uploaded files in different metadata schemata. While datasets are collected and described, shared to collaborate with other scientists and finally published, data objects are transferred from a shared data curation domain into a persistent data curation domain. Through an RSS interface for recent datasets PanMetaWorks allows project members to be informed about data uploaded by other project members. The implementation of the OAI-PMH interface can be used to syndicate data catalogs to research data portals, such as the panFMP data portal framework. Once data objects are uploaded to the eSciDoc infrastructure it is possible to drop the software instance that was used for collecting the data, while the compiled data and metadata are accessible for other authorized applications through the institution's eSciDoc middleware. This approach of "expendable data curation tools" allows for a significant reduction in costs for software maintenance as expensive data capture applications do not need to be maintained indefinitely to ensure long term access to the stored data.
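The item model described above, one uploaded file carrying several metadata records in different schemata, can be pictured with a toy in-memory structure. The schema names and field contents below are illustrative, not the actual eSciDoc item representation:

```python
# Hypothetical in-memory model of an eSciDoc-style item: one data file
# described by multiple metadata records, each in a different schema.
item = {
    "file": "profile_2011_03.csv",
    "metadata": {
        "iso19115": {"title": "CTD profile, station 12"},
        "datacite": {"creator": "Example, A."},
    },
}

def get_record(item, schema):
    """Return the item's metadata record for one schema, if present.
    A harvester (e.g. via OAI-PMH) would request exactly one schema."""
    return item["metadata"].get(schema)
```

Keeping each schema's record separate is what lets different catalogs and portals harvest the same item in the dialect they understand.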
Heidelberger, Philip; Steinmacher-Burow, Burkhard
2015-01-06
According to one embodiment, a method for implementing an array-based queue in memory of a memory system that includes a controller includes configuring, in the memory, metadata of the array-based queue. The configuring comprises defining, in metadata, an array start location in the memory for the array-based queue, defining, in the metadata, an array size for the array-based queue, defining, in the metadata, a queue top for the array-based queue and defining, in the metadata, a queue bottom for the array-based queue. The method also includes the controller serving a request for an operation on the queue, the request providing the location in the memory of the metadata of the queue.
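A minimal sketch of the embodiment described, with the queue metadata (array size, queue top, queue bottom) held as explicit state and wrap-around indexing over the array. All names are illustrative, not taken from the patent:

```python
class ArrayQueue:
    """Circular array-based queue whose state lives in explicit
    "metadata" fields, mirroring the embodiment described above
    (the array start location is implicitly index 0 here)."""

    def __init__(self, size):
        self.array = [None] * size
        self.size = size      # metadata: array size
        self.top = 0          # metadata: index where the next enqueue writes
        self.bottom = 0       # metadata: index where the next dequeue reads
        self.count = 0        # number of items currently queued

    def enqueue(self, item):
        if self.count == self.size:
            raise OverflowError("queue full")
        self.array[self.top] = item
        self.top = (self.top + 1) % self.size  # wrap around the array
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue empty")
        item = self.array[self.bottom]
        self.bottom = (self.bottom + 1) % self.size
        self.count -= 1
        return item
```

In the patent, a memory controller serves these operations given only the memory location of the metadata; the sketch keeps the same division between metadata and array storage.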
WGISS-45 International Directory Network (IDN) Report
NASA Technical Reports Server (NTRS)
Morahan, Michael
2018-01-01
The objective of this presentation is to provide IDN (International Directory Network) updates on features and activities to the Committee on Earth Observation Satellites (CEOS) Working Group on Information Systems and Services (WGISS) and the provider community. The following topics will be discussed during the presentation: Transition of Providers' DIF-9 (Directory Interchange Format-9) Metadata Records to DIF-10 in the Common Metadata Repository (CMR); GCMD (Global Change Master Directory) Keyword Update; DIF-10 and UMM-C (Unified Metadata Model-Collections) Schema Changes; Metadata Validation of Provider Metadata; docBUILDER for Submitting IDN Metadata to the CMR (i.e. Registration); and Mapping WGClimate Essential Climate Variable (ECV) Inventory to IDN Records.
Bio::Phylo - phyloinformatic analysis using Perl.
Vos, Rutger A; Caravas, Jason; Hartmann, Klaas; Jensen, Mark A; Miller, Chase
2011-02-27
Phyloinformatic analyses involve large amounts of data and metadata of complex structure. Collecting, processing, analyzing, visualizing and summarizing these data and metadata should be done in steps that can be automated and reproduced. This requires flexible, modular toolkits that can represent, manipulate and persist phylogenetic data and metadata as objects with programmable interfaces. This paper presents Bio::Phylo, a Perl5 toolkit for phyloinformatic analysis. It implements classes and methods that are compatible with the well-known BioPerl toolkit, but is independent from it (making it easy to install) and features a richer API and a data model that is better able to manage the complex relationships between different fundamental data and metadata objects in phylogenetics. It supports commonly used file formats for phylogenetic data including the novel NeXML standard, which allows rich annotations of phylogenetic data to be stored and shared. Bio::Phylo can interact with BioPerl, thereby giving access to the file formats that BioPerl supports. Many methods for data simulation, transformation and manipulation, the analysis of tree shape, and tree visualization are provided. Bio::Phylo is composed of 59 richly documented Perl5 modules. It has been deployed successfully on a variety of computer architectures (including various Linux distributions, Mac OS X versions, Windows, Cygwin and UNIX-like systems). It is available as open source (GPL) software from http://search.cpan.org/dist/Bio-Phylo. PMID:21352572
Mashup of Geo and Space Science Data Provided via Relational Databases in the Semantic Web
NASA Astrophysics Data System (ADS)
Ritschel, B.; Seelus, C.; Neher, G.; Iyemori, T.; Koyama, Y.; Yatagai, A. I.; Murayama, Y.; King, T. A.; Hughes, J. S.; Fung, S. F.; Galkin, I. A.; Hapgood, M. A.; Belehaki, A.
2014-12-01
The use of RDBMS for the storage and management of geo and space science data and/or metadata is very common. Although the information stored in tables is based on a data model and therefore well organized and structured, a direct mashup with RDF-based data stored in triple stores is not possible. One solution to the problem consists in transforming the whole content into RDF structures and storing it in triple stores. Another interesting way is the use of a specific system/service, such as D2RQ, for access to relational database content as virtual, read-only RDF graphs. The Semantic Web based "proof of concept" GFZ ISDC uses the triple store Virtuoso for the storage of general context information/metadata for geo and space science satellite and ground station data. There is information about projects, platforms, instruments, persons, product types, etc. available, but no detailed metadata about the data granules themselves. Such important information, e.g. the start or end time or the detailed spatial coverage of a single measurement, is stored only in the RDBMS tables of the ISDC catalog system. In order to provide seamless access to all available information about the granules/data products, a mashup of the different data resources (triple store and RDBMS) is necessary. This paper describes the use of D2RQ for a Semantic Web/SPARQL-based mashup of the relational databases used for the ISDC data server, but also for access to IUGONET and/or ESPAS and further geo and space science data resources. Abbreviations: RDBMS, Relational Database Management System; RDF, Resource Description Framework; SPARQL, SPARQL Protocol and RDF Query Language; D2RQ, a platform for accessing relational databases as virtual RDF graphs; GFZ ISDC, German Research Centre for Geosciences Information System and Data Center; IUGONET, Inter-university Upper Atmosphere Global Observation Network (Japanese project); ESPAS, near-Earth space data infrastructure for e-science (European Union funded project).
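The D2RQ approach can be pictured as a mapping that exposes each database row as virtual RDF triples, which a SPARQL-style basic graph pattern then matches without the rows ever being copied into a triple store. A toy Python sketch with invented URIs and column names:

```python
# Rows as they might sit in an RDBMS catalog table (values invented).
rows = [
    {"granule_id": "G1", "start_time": "2014-01-01T00:00:00Z"},
    {"granule_id": "G2", "start_time": "2014-01-02T00:00:00Z"},
]

def rows_to_triples(rows):
    """D2RQ-style mapping: expose each table row as RDF triples on the
    fly, rather than materializing them in a triple store."""
    triples = []
    for row in rows:
        subj = "isdc:granule/" + row["granule_id"]
        triples.append((subj, "isdc:startTime", row["start_time"]))
    return triples

def match(triples, predicate):
    """Tiny stand-in for a SPARQL basic graph pattern (?s predicate ?o)."""
    return [(s, o) for s, p, o in triples if p == predicate]

virtual_graph = rows_to_triples(rows)
starts = match(virtual_graph, "isdc:startTime")
```

The real system rewrites SPARQL into SQL instead of generating all triples up front, but the observable behavior is the same: relational content answering graph queries.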
EMERALD: Coping with the Explosion of Seismic Data
NASA Astrophysics Data System (ADS)
West, J. D.; Fouch, M. J.; Arrowsmith, R.
2009-12-01
The geosciences are currently generating an unparalleled quantity of new public broadband seismic data with the establishment of large-scale seismic arrays such as the EarthScope USArray, which are enabling new and transformative scientific discoveries of the structure and dynamics of the Earth’s interior. Much of this explosion of data is a direct result of the formation of the IRIS consortium, which has enabled an unparalleled level of open exchange of seismic instrumentation, data, and methods. The production of these massive volumes of data has generated new and serious data management challenges for the seismological community. A significant challenge is the maintenance and updating of seismic metadata, which includes information such as station location, sensor orientation, instrument response, and clock timing data. This key information changes at unknown intervals, and the changes are not generally communicated to data users who have already downloaded and processed data. Another basic challenge is the ability to handle massive seismic datasets when waveform file volumes exceed the fundamental limitations of a computer’s operating system. A third, long-standing challenge is the difficulty of exchanging seismic processing codes between researchers; each scientist typically develops his or her own unique directory structure and file naming convention, requiring that codes developed by another researcher be rewritten before they can be used. To address these challenges, we are developing EMERALD (Explore, Manage, Edit, Reduce, & Analyze Large Datasets). The overarching goal of the EMERALD project is to enable more efficient and effective use of seismic datasets ranging from just a few hundred to millions of waveforms with a complete database-driven system, leading to higher quality seismic datasets for scientific analysis and enabling faster, more efficient scientific research. 
We will present a preliminary (beta) version of EMERALD, an integrated, extensible, standalone database server system based on the open-source PostgreSQL database engine. The system is designed for fast and easy processing of seismic datasets, and provides the necessary tools to manage very large datasets and all associated metadata. EMERALD provides methods for efficient preprocessing of seismic records; large record sets can be easily and quickly searched, reviewed, revised, reprocessed, and exported. EMERALD can retrieve and store station metadata and alert the user to metadata changes. The system provides many methods for visualizing data, analyzing dataset statistics, and tracking the processing history of individual datasets. EMERALD allows development and sharing of visualization and processing methods using any of 12 programming languages. EMERALD is designed to integrate existing software tools; the system provides wrapper functionality for existing widely-used programs such as GMT, SOD, and TauP. Users can interact with EMERALD via a web browser interface, or they can directly access their data from a variety of database-enabled external tools. Data can be imported and exported from the system in a variety of file formats, or can be directly requested and downloaded from the IRIS DMC from within EMERALD.
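The metadata-change alerting that EMERALD provides can be illustrated with a minimal sketch; this is not EMERALD code, and the station fields and values below are hypothetical:

```python
# Hypothetical sketch: compare a stored station-metadata record against a
# freshly retrieved one and report every field whose value has changed.

def diff_station_metadata(stored, retrieved):
    """Return {field: (old_value, new_value)} for every changed field."""
    changes = {}
    for field in sorted(set(stored) | set(retrieved)):
        old, new = stored.get(field), retrieved.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes

stored = {"latitude": 34.9455, "longitude": -106.4572,
          "sensor_azimuth": 0.0, "response_file": "RESP.IU.ANMO.v1"}
retrieved = {"latitude": 34.9455, "longitude": -106.4572,
             "sensor_azimuth": 3.0, "response_file": "RESP.IU.ANMO.v2"}

for field, (old, new) in diff_station_metadata(stored, retrieved).items():
    print(f"{field}: {old} -> {new}")
```

A real system would retrieve the current record from the data center and persist the diff alongside the dataset's processing history.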
Design and deployment of a large brain-image database for clinical and nonclinical research
NASA Astrophysics Data System (ADS)
Yang, Guo Liang; Lim, Choie Cheio Tchoyoson; Banukumar, Narayanaswami; Aziz, Aamer; Hui, Francis; Nowinski, Wieslaw L.
2004-04-01
An efficient database is an essential component of organizing diverse information on image metadata and patient information for research in medical imaging. This paper describes the design, development and deployment of a large database system serving as a brain image repository that can be used across different platforms in a variety of medical research. It forms the infrastructure that links hospitals and institutions together and shares data among them. The database contains patient-, pathology-, image-, research- and management-specific data. The functionalities of the database system include image uploading, storage, indexing, downloading and sharing, as well as database querying and management, with security and data anonymization concerns well taken care of. The database has a multi-tier client-server architecture comprising a Relational Database Management System, Security Layer, Application Layer and User Interface. An image source adapter has been developed to handle most of the popular image formats. The database has a web-browser-based user interface and is easy to use. We have used the Java programming language for its platform independence and vast function libraries. The brain image database can sort data according to clinically relevant information, which can be used effectively in research from the clinicians' point of view. The database is suitable for validating algorithms on large populations of cases. Medical images for processing can be identified and organized based on information in image metadata. Clinical research in various pathologies can thus be performed with greater efficiency, and large image repositories can be managed more effectively. A prototype of the system has been installed in a few hospitals and is working to the satisfaction of the clinicians.

Deck, John; Gaither, Michelle R; Ewing, Rodney; Bird, Christopher E; Davies, Neil; Meyer, Christopher; Riginos, Cynthia; Toonen, Robert J; Crandall, Eric D
2017-08-01
The Genomic Observatories Metadatabase (GeOMe, http://www.geome-db.org/) is an open access repository for geographic and ecological metadata associated with biosamples and genetic data. Whereas public databases have served as vital repositories for nucleotide sequences, they do not accession all the metadata required for ecological or evolutionary analyses. GeOMe fills this need, providing a user-friendly, web-based interface for both data contributors and data recipients. The interface allows data contributors to create a customized yet standard-compliant spreadsheet that captures the temporal and geospatial context of each biosample. These metadata are then validated and permanently linked to archived genetic data stored in the National Center for Biotechnology Information's (NCBI's) Sequence Read Archive (SRA) via unique persistent identifiers. By linking ecologically and evolutionarily relevant metadata with publicly archived sequence data in a structured manner, GeOMe sets a gold standard for data management in biodiversity science.
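The validation step GeOMe applies to contributor spreadsheets can be sketched as follows; this is an illustrative simplification, not GeOMe's actual rule set, and the field names are assumptions:

```python
import re

# Hypothetical required fields for one biosample row of a contributor
# spreadsheet; the real GeOMe templates define many more.
REQUIRED = ("sample_id", "latitude", "longitude", "collection_date")

def validate_biosample(row):
    """Return a list of validation errors for one biosample metadata row
    (a dict parsed from a spreadsheet); an empty list means the row passes."""
    errors = [f"missing field: {f}" for f in REQUIRED if f not in row]
    if not errors:
        if not -90 <= row["latitude"] <= 90:
            errors.append("latitude out of range")
        if not -180 <= row["longitude"] <= 180:
            errors.append("longitude out of range")
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", row["collection_date"]):
            errors.append("collection_date not ISO 8601 (YYYY-MM-DD)")
    return errors

row = {"sample_id": "MBIO-0412", "latitude": 21.43,
       "longitude": -157.79, "collection_date": "2015-06-02"}
print(validate_biosample(row))
```

Only rows that validate would then be minted persistent identifiers and linked to the archived sequence records.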
Automatic meta-data collection of STP observation data
NASA Astrophysics Data System (ADS)
Ishikura, S.; Kimura, E.; Murata, K.; Kubo, T.; Shinohara, I.
2006-12-01
For geoscience and STP (Solar-Terrestrial Physics) studies, various observations have been made by satellites and ground-based observatories. These data are saved and managed at many organizations, but there is no common procedure or rule for providing and/or sharing the data files. Researchers have found it difficult to search for and analyze such different types of data distributed over the Internet. To support such cross-over analyses of observation data, we have developed the STARS (Solar-Terrestrial data Analysis and Reference System). The STARS consists of a client application (STARS-app), the meta-database (STARS-DB), the portal Web service (STARS-WS) and the download agent Web service (STARS DLAgent-WS). The STARS-DB includes directory information, access permissions, protocol information for retrieving data files, mission/team/data hierarchy information and user information. Users of the STARS are able to download observation data files without knowing the locations of the files by using the STARS-DB. We have implemented the Portal-WS to retrieve metadata from the meta-database. One reason we use a Web service is to overcome firewall restrictions, which have been getting stricter in recent years and make it difficult for the STARS client application to access the STARS-DB by sending SQL queries directly. Using the Web service, we succeeded in placing the STARS-DB behind the Portal-WS and preventing it from being exposed on the Internet. The STARS accesses the Portal-WS by sending SOAP (Simple Object Access Protocol) requests over HTTP, and metadata are received as a SOAP response. The STARS DLAgent-WS provides clients with data files downloaded from data sites. The data files are provided over a variety of protocols (e.g., FTP, HTTP, FTPS and SFTP), selected individually at each site.
The clients send a SOAP request containing download request messages and receive observation data files as a SOAP response with a DIME attachment. By introducing the DLAgent-WS, we overcame the problem that each data site has its own independent data management policy. Another important issue is how to collect the metadata of observation data files. So far, STARS-DB managers have added new records to the meta-database and updated them manually; maintaining the meta-database has been laborious because observation data are generated every day and the quantity of data files increases explosively. We have therefore attempted to automate the collection of metadata. In this research, we adopted RSS 1.0 (RDF Site Summary) as a format for exchanging metadata in the STP fields. RSS is an RDF vocabulary that provides a multipurpose, extensible metadata description and is suitable for syndication of metadata. Most of the data in the present study are described in the CDF (Common Data Format), a self-describing data format. We convert metadata extracted from the CDF data files into RSS files. The program that generates the RSS files is executed on each data site server once a day, and the RSS files provide information about new data files. The RSS files are collected by an RSS collection server once a day, and the metadata are stored in the STARS-DB.
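The RSS-based metadata exchange described above can be sketched with the Python standard library; the element names follow RSS 1.0 and Dublin Core conventions, but the file URL, title, and date below are made up:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

def make_rss_item(url, title, date):
    """Build one RSS 1.0 <item> describing a newly generated data file.
    In the STARS workflow the values would be extracted from CDF global
    attributes; here they are passed in directly."""
    item = ET.Element(f"{{{RSS}}}item", {f"{{{RDF}}}about": url})
    ET.SubElement(item, f"{{{RSS}}}title").text = title
    ET.SubElement(item, f"{{{RSS}}}link").text = url
    ET.SubElement(item, f"{{{DC}}}date").text = date
    return item

item = make_rss_item("ftp://example.org/ace_mag_20061201_v01.cdf",
                     "ACE magnetometer data 2006-12-01", "2006-12-01")
print(ET.tostring(item, encoding="unicode"))
```

A daily job on each data site would emit a feed of such items, and the collection server would parse the feeds and upsert the records into the STARS-DB.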
Information System through ANIS at CeSAM
NASA Astrophysics Data System (ADS)
Moreau, C.; Agneray, F.; Gimenez, S.
2015-09-01
ANIS (AstroNomical Information System) is a generic web tool developed at CeSAM to facilitate and standardize the publication of astronomical data of various kinds through private and/or public dedicated information systems. The architecture of ANIS is composed of a database server containing the project data; a web user interface template that provides high-level services (searching, extracting and displaying imaging and spectroscopic data using a combination of criteria, an object list, a SQL query module, or a cone search interface); a framework composed of several packages; and a metadata database managed by a web administration entity. The process for implementing a new ANIS instance at CeSAM is easy and fast: the scientific project submits its data or provides secure access to them, the CeSAM team installs the new instance (the web interface template and the metadata database), and the project administrator configures the instance with the web ANIS-administration entity. Currently, CeSAM offers through ANIS web access to VO-compliant information systems for several projects (HeDaM, HST-COSMOS, CFHTLS-ZPhots, ExoDAT, ...).
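A cone search of the kind ANIS exposes can be sketched as a simple great-circle filter; this is a generic simplification, not the ANIS implementation, and the catalog fields are hypothetical:

```python
from math import radians, degrees, sin, cos, acos

def cone_search(catalog, ra0, dec0, radius_deg):
    """Return catalog rows within radius_deg of (ra0, dec0); coordinates
    in degrees, angular separation via the spherical law of cosines."""
    hits = []
    for row in catalog:
        ra, dec = radians(row["ra"]), radians(row["dec"])
        cos_sep = (sin(radians(dec0)) * sin(dec) +
                   cos(radians(dec0)) * cos(dec) * cos(ra - radians(ra0)))
        sep = degrees(acos(max(-1.0, min(1.0, cos_sep))))  # clamp rounding
        if sep <= radius_deg:
            hits.append(row)
    return hits

catalog = [{"name": "A", "ra": 150.1, "dec": 2.2},
           {"name": "B", "ra": 201.0, "dec": -43.0}]
print([h["name"] for h in cone_search(catalog, 150.0, 2.0, 1.0)])  # ['A']
```

A production service would run this as an indexed spatial query in the database rather than a linear scan.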
Government information resource catalog and its service system realization
NASA Astrophysics Data System (ADS)
Gui, Sheng; Li, Lin; Wang, Hong; Peng, Zifeng
2007-06-01
During the process of informatization, a great deal of information resources is produced. To manage these information resources and use them to serve business management, government decision-making and public life, it is necessary to establish a transparent and dynamic information resource catalog and its service system. This paper takes land-house management information resources as an example. Addressing the characteristics of this kind of information, the paper classifies, identifies and describes land-house information with a uniform specification and method, and establishes a land-house information resource catalog classification system, metadata standard, identification standard and land-house thematic thesaurus, so that in the Internet environment users can conveniently search for and obtain the information they are interested in. The goals are to achieve speedy positioning, querying, exploring and acquisition of various types of land-house management information over the network, and to satisfy the needs of sharing, exchanging, applying and maintaining land-house management information resources.
A Lifecycle Approach to Brokered Data Management for Hydrologic Modeling Data Using Open Standards.
NASA Astrophysics Data System (ADS)
Blodgett, D. L.; Booth, N.; Kunicki, T.; Walker, J.
2012-12-01
The U.S. Geological Survey Center for Integrated Data Analytics has formalized an information-management architecture to facilitate hydrologic modeling and subsequent decision support throughout a project's lifecycle. The architecture is based on open standards and open-source software to lower the adoption barrier and to build on existing, community-supported software. The components of this system have been developed and evaluated to support the data management activities of the interagency Great Lakes Restoration Initiative, the Department of the Interior's Climate Science Centers, and the WaterSMART National Water Census. Much of the research and development of this system has been in cooperation with international interoperability experiments conducted within the Open Geospatial Consortium. Community-developed standards and software, implemented to meet the unique requirements of specific disciplines, are used as a system of interoperable, discipline-specific data types and interfaces. This approach has allowed adoption of existing software that satisfies the majority of system requirements. Four major features of the system are: 1) assistance in creating model parameters and forcings from large enterprise data sources; 2) conversion of model results and calibrated parameters to standard formats, making them available via standard web services; 3) tracking of a model's processes, inputs, and outputs as a cohesive metadata record, allowing provenance tracking via reference to web services; and 4) generalized decision support tools that rely on a suite of standard data types and interfaces rather than particular manually curated model-derived datasets. Recent progress in data and web service standards related to sensor- and/or model-derived station time series, dynamic web processing, and metadata management is central to this system's function and will be presented briefly, along with a functional overview of the applications that make up the system.
As the separate pieces of this system progress, they will be combined and generalized to form a sort of social network for nationally consistent hydrologic modeling.
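Feature 3, tracking a model run's processes, inputs, and outputs as one cohesive metadata record that references web services, might look roughly like the following; the record structure and service URLs are illustrative assumptions, not the actual USGS schema:

```python
import json

def model_provenance_record(model_id, inputs, outputs, process):
    """Assemble a model run's processes, inputs, and outputs into one
    cohesive metadata record; inputs/outputs are web-service URLs so
    provenance can be traced by reference rather than by copy."""
    return {
        "model_id": model_id,
        "process": process,
        "inputs": [{"href": u, "role": "forcing"} for u in inputs],
        "outputs": [{"href": u, "role": "result"} for u in outputs],
    }

record = model_provenance_record(
    "watershed-model-042",                       # hypothetical identifier
    ["https://example.gov/sos/precip/site-9"],   # hypothetical service URLs
    ["https://example.gov/sos/streamflow/042"],
    "calibration run",
)
print(json.dumps(record, indent=2))
```

Because the record points at services instead of embedding data, a decision-support tool can re-resolve the exact inputs that produced a given result.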
Long-term Science Data Curation Using a Digital Object Model and Open-Source Frameworks
NASA Astrophysics Data System (ADS)
Pan, J.; Lenhardt, W.; Wilson, B. E.; Palanisamy, G.; Cook, R. B.
2010-12-01
Scientific digital content, including Earth Science observations and model output, has become more heterogeneous in format and more distributed across the Internet. In addition, data and metadata are becoming necessarily linked internally and externally on the Web. As a result, such content has become more difficult for providers to manage and preserve and for users to locate, understand, and consume. Specifically, it is increasingly hard to deliver relevant metadata and data processing lineage information along with the actual content consistently. Readme files, data quality information, production provenance, and other descriptive metadata are often separated at the storage level as well as in the data search and retrieval interfaces available to a user. Critical archival metadata, such as audit trails and integrity checks, are often even more difficult for users to access, if they exist at all. We investigate the use of several open-source software frameworks to address these challenges. We use the Fedora Commons framework and its digital object abstraction as the repository, the Drupal CMS as the user interface, and the Islandora module as the connector from Drupal to the Fedora repository. With the digital object model, descriptive and provenance metadata can be formally associated with data content, as can external references and other arbitrary auxiliary information. Changes to an object are formally audited, and digital contents are versioned and have checksums automatically computed. Further, relationships among objects are formally expressed as RDF triples. Data replication, recovery, and metadata export are supported with standard protocols such as OAI-PMH.
We provide a tentative comparative analysis of the chosen software stack with the Open Archival Information System (OAIS) reference model, along with our initial results with the existing terrestrial ecology data collections at NASA’s ORNL Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC).
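The digital object behaviors described, automatic checksums, versioning, and RDF-style relationships, can be sketched at toy scale; this is not the Fedora Commons API, and the identifiers are hypothetical:

```python
import hashlib

class DigitalObject:
    """Toy analogue of a repository digital object: each content version
    gets an automatic checksum, and relationships to other objects are
    kept as RDF-style (subject, predicate, object) triples."""

    def __init__(self, pid):
        self.pid = pid
        self.versions = []   # newest last
        self.triples = []

    def add_version(self, content: bytes):
        self.versions.append(
            {"content": content,
             "sha256": hashlib.sha256(content).hexdigest()})

    def relate(self, predicate, other_pid):
        self.triples.append((self.pid, predicate, other_pid))

obj = DigitalObject("obj:dataset-17")
obj.add_version(b"soil respiration v1")
obj.relate("isMemberOfCollection", "obj:collection-terrestrial-ecology")
print(len(obj.versions), obj.triples[0])
```

A repository built this way can verify integrity by recomputing checksums and can export the triples for standard RDF querying.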
Taming Big Data Variety in the Earth Observing System Data and Information System
NASA Technical Reports Server (NTRS)
Lynnes, Christopher; Walter, Jeff
2015-01-01
Although the volume of the remote sensing data managed by the Earth Observing System Data and Information System is formidable, an oft-overlooked challenge is the variety of the data. The diversity in satellite instruments, science disciplines and user communities drives cost as much as, or more than, the data volume. Several strategies are used to tame this variety: allocation of data to distinct centers of expertise; a common metadata repository for discovery; data format standards and conventions; and services that further abstract the variations in data.
Handling Metadata in a Neurophysiology Laboratory
Zehl, Lyuba; Jaillet, Florent; Stoewer, Adrian; Grewe, Jan; Sobolev, Andrey; Wachtler, Thomas; Brochier, Thomas G.; Riehle, Alexa; Denker, Michael; Grün, Sonja
2016-01-01
To date, non-reproducibility of neurophysiological research is a matter of intense discussion in the scientific community. A crucial component to enhance reproducibility is to comprehensively collect and store metadata, that is, all information about the experiment, the data, and the applied preprocessing steps on the data, such that they can be accessed and shared in a consistent and simple manner. However, the complexity of experiments, the highly specialized analysis workflows and a lack of knowledge on how to make use of supporting software tools often overburden researchers to perform such a detailed documentation. For this reason, the collected metadata are often incomplete, incomprehensible for outsiders or ambiguous. Based on our research experience in dealing with diverse datasets, we here provide conceptual and technical guidance to overcome the challenges associated with the collection, organization, and storage of metadata in a neurophysiology laboratory. Through the concrete example of managing the metadata of a complex experiment that yields multi-channel recordings from monkeys performing a behavioral motor task, we practically demonstrate the implementation of these approaches and solutions with the intention that they may be generalized to other projects. Moreover, we detail five use cases that demonstrate the resulting benefits of constructing a well-organized metadata collection when processing or analyzing the recorded data, in particular when these are shared between laboratories in a modern scientific collaboration. Finally, we suggest an adaptable workflow to accumulate, structure and store metadata from different sources using, by way of example, the odML metadata framework. PMID:27486397
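The hierarchical section/property organization that odML provides can be approximated with plain dictionaries; this sketch mimics the structure only and does not use the real odml library's API, and the experiment values are made up:

```python
def section(name, properties=None, subsections=()):
    """Tiny stand-in for an odML section: a named bag of properties plus
    nested subsections."""
    return {"name": name, "properties": properties or {},
            "sections": list(subsections)}

experiment = section(
    "Experiment",
    {"task": "delayed reach-to-grasp", "species": "macaque"},
    [section("Recording", {"channels": 96, "sampling_rate_hz": 30000}),
     section("Preprocessing", {"bandpass_hz": [0.3, 250]})])

def flatten(sec, prefix=""):
    """Yield (dotted.path, value) pairs, the way one might export the
    hierarchy to a flat spreadsheet."""
    path = f"{prefix}{sec['name']}"
    for key, value in sec["properties"].items():
        yield f"{path}.{key}", value
    for sub in sec["sections"]:
        yield from flatten(sub, path + ".")

print(dict(flatten(experiment))["Experiment.Recording.channels"])
```

The point of the hierarchy is that metadata from different sources (setup, protocol, preprocessing) keep their provenance while remaining queryable through one structure.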
Metazen – metadata capture for metagenomes
2014-01-01
Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusions: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility. PMID:25780508
Metazen - metadata capture for metagenomes.
Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker
2014-01-01
Chase, Katherine J.; Bock, Andrew R.; Sando, Roy
2017-01-05
This report provides an overview of current (2016) U.S. Geological Survey policies and practices related to publishing data on ScienceBase, and an example interactive mapping application to display those data. ScienceBase is an integrated data sharing platform managed by the U.S. Geological Survey. This report describes resources that U.S. Geological Survey scientists can use for writing data management plans, formatting data, and creating metadata, as well as for data and metadata review, uploading data and metadata to ScienceBase, and sharing metadata through the U.S. Geological Survey Science Data Catalog. Because data publishing policies and practices are evolving, scientists should consult the resources cited in this report for definitive policy information. An example is also provided in which, using the content of a published ScienceBase data release associated with an interpretive product, a simple user interface is constructed to demonstrate how the open-source R programming language and environment can interact with the properties and objects of a ScienceBase item and generate interactive maps.
A metadata approach for clinical data management in translational genomics studies in breast cancer.
Papatheodorou, Irene; Crichton, Charles; Morris, Lorna; Maccallum, Peter; Davies, Jim; Brenton, James D; Caldas, Carlos
2009-11-30
In molecular profiling studies of cancer patients, experimental and clinical data are combined in order to understand the clinical heterogeneity of the disease: clinical information for each subject needs to be linked to tumour samples, macromolecules extracted, and experimental results. This may involve the integration of clinical data sets from several different sources: these data sets may employ different data definitions, and some may be incomplete. In this work we employ semantic web techniques developed within the CancerGrid project, in particular the use of metadata elements and logic-based inference, to annotate, integrate, and query heterogeneous clinical information. We show how this integration can be achieved automatically, following the declaration of appropriate metadata elements for each clinical data set; we demonstrate the practicality of this approach through application to experimental results and clinical data from five hospitals in the UK and Canada, undertaken as part of the METABRIC project (Molecular Taxonomy of Breast Cancer International Consortium). We describe a metadata approach for managing similarities and differences in clinical datasets in a standardized way that uses Common Data Elements (CDEs). We apply and evaluate the approach by integrating the five different clinical datasets of METABRIC.
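The CDE-based integration idea can be sketched as a field-name mapping applied per source; the column and element names below are illustrative, not CancerGrid's actual CDEs:

```python
# Each hospital dataset uses its own column names; a per-source mapping
# onto shared Common Data Element (CDE) names lets records be pooled and
# queried uniformly even when some datasets are incomplete.
CDE_MAPS = {
    "hospital_a": {"age_at_dx": "age_at_diagnosis", "er": "er_status"},
    "hospital_b": {"AgeDiag": "age_at_diagnosis", "ER_IHC": "er_status"},
}

def to_cde(source, record):
    """Rename a site-specific record's fields to CDE names, keeping only
    mapped fields so differing schemas integrate cleanly."""
    mapping = CDE_MAPS[source]
    return {cde: record[col] for col, cde in mapping.items() if col in record}

pooled = [to_cde("hospital_a", {"age_at_dx": 54, "er": "positive"}),
          to_cde("hospital_b", {"AgeDiag": 61, "ER_IHC": "negative"})]
print([r["age_at_diagnosis"] for r in pooled])  # [54, 61]
```

In the actual system the mappings are declared as metadata and interpreted with logic-based inference rather than hard-coded, but the effect on the pooled records is the same.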
Small file aggregation in a parallel computing system
Faibish, Sorin; Bent, John M.; Tzelnic, Percy; Grider, Gary; Zhang, Jingwang
2014-09-02
Techniques are provided for small file aggregation in a parallel computing system. An exemplary method for storing a plurality of files generated by a plurality of processes in a parallel computing system comprises aggregating the plurality of files into a single aggregated file; and generating metadata for the single aggregated file. The metadata comprises an offset and a length of each of the plurality of files in the single aggregated file. The metadata can be used to unpack one or more of the files from the single aggregated file.
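The offset/length index described in the abstract can be illustrated at toy scale; this is a sketch of the idea, not the patented implementation:

```python
def aggregate(files):
    """Pack {name: bytes} into one aggregated blob plus metadata giving
    each small file's offset and length within the blob."""
    blob, meta, offset = b"", {}, 0
    for name, data in files.items():
        meta[name] = {"offset": offset, "length": len(data)}
        blob += data
        offset += len(data)
    return blob, meta

def unpack(blob, meta, name):
    """Recover one small file from the aggregated blob via its metadata."""
    entry = meta[name]
    return blob[entry["offset"]:entry["offset"] + entry["length"]]

# Hypothetical per-process outputs from a parallel job:
blob, meta = aggregate({"rank0.out": b"alpha", "rank1.out": b"beta"})
print(unpack(blob, meta, "rank1.out"))  # b'beta'
```

Writing one large object instead of many small files is what relieves pressure on the parallel file system's metadata server; the index makes the individual files recoverable.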
Improving Metadata Compliance for Earth Science Data Records
NASA Astrophysics Data System (ADS)
Armstrong, E. M.; Chang, O.; Foster, D.
2014-12-01
One of the recurring challenges of creating earth science data records is to ensure a consistent level of metadata compliance at the granule level, where important details of contents, provenance, producer, and data references are necessary to obtain a sufficient level of understanding. These details are important not just for individual data consumers but also for autonomous software systems. Two of the most popular metadata standards at the granule level are the Climate and Forecast (CF) Metadata Conventions and the Attribute Conventions for Dataset Discovery (ACDD). Many data producers have implemented one or both of these models, including the Group for High Resolution Sea Surface Temperature (GHRSST) for their global SST products and the Ocean Biology Processing Group for NASA ocean color and SST products. While both the CF and ACDD models support various levels of metadata richness, the attributes they actually require are quite small in number. Metadata at the granule level become much more useful when recommended or optional attributes are implemented that document spatial and temporal ranges, lineage and provenance, sources, keywords, references, etc. In this presentation we report on a new open source tool to check the compliance of netCDF and HDF5 granules with the CF and ACDD metadata models. The tool, written in Python, was originally implemented to support metadata compliance for netCDF records as part of NOAA's Integrated Ocean Observing System. It outputs standardized scoring for metadata compliance with both CF and ACDD, produces an objective summary weight, and can be applied to remote records via OPeNDAP calls. Originally a command-line tool, it has been extended to provide a user-friendly web interface. Reports on metadata testing are grouped in hierarchies that make it easier to track flaws and inconsistencies in the record.
We have also extended the tool to support explicit metadata structures and semantic syntax for the GHRSST project, which can be easily adapted to other satellite missions as well. Overall, we hope this tool will provide the community with a useful mechanism for improving metadata quality and consistency at the granule level by providing objective scoring and assessment, and will encourage data producers to improve metadata quality and quantity.
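The scoring approach can be illustrated with a simplified sketch over a granule's global attributes; the attribute lists below are abbreviated stand-ins for the real CF/ACDD requirements, and the weights are assumptions, not the tool's actual scheme:

```python
# Abbreviated, illustrative attribute lists (ACDD defines many more).
HIGHLY_RECOMMENDED = ("title", "summary", "keywords")
RECOMMENDED = ("time_coverage_start", "time_coverage_end",
               "geospatial_lat_min", "geospatial_lat_max", "license")

def score_granule(attrs):
    """Score a granule's global-attribute dict: highly recommended
    attributes weigh 3, recommended ones weigh 1. Returns
    (score, max_score, missing_attribute_names)."""
    weighted = [(n, 3) for n in HIGHLY_RECOMMENDED] + \
               [(n, 1) for n in RECOMMENDED]
    score = sum(w for n, w in weighted if n in attrs)
    missing = [n for n, _ in weighted if n not in attrs]
    return score, sum(w for _, w in weighted), missing

attrs = {"title": "Global L4 SST analysis",
         "summary": "Daily gap-free SST field",
         "license": "Freely distributed"}
score, max_score, missing = score_granule(attrs)
print(f"{score}/{max_score}, missing: {missing}")
```

A real checker reads these attributes from the netCDF/HDF5 header (locally or via OPeNDAP) and also validates variable-level CF conventions, which this sketch omits.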
Improving Lab Sample Management - POS/MCEARD
"Scientists face increasing challenges in managing their laboratory samples, including long-term storage of legacy samples, tracking multiple aliquots of samples for many experiments, and linking metadata to these samples. Other factors complicating sample management include the...
Ertürk, Korhan Levent; Şengül, Gökhan
2012-01-01
We developed 3D simulation software of human organs/tissues, a database to store the related data, a data management system to manage the created data, and a metadata system for the management of the data. This approach provides two benefits: first, the developed system does not require keeping the patient's/subject's medical images on the system, reducing storage usage. The system also provides 3D simulation and modification options, giving clinicians the necessary tools for visualization and modification operations. The developed system is tested in a case study in which a 3D human brain model is created and simulated from 2D MRI images of a human brain, and we extended the 3D model to include the spreading cortical depression (SCD) wave front, an electrical phenomenon that is believed to cause migraine. PMID:23258956
Requirements and Information Metadata System
2007-03-01
the Sept. 11, 2001, attacks; (2) poor management decisions early in the project; (3) inadequate project oversight, and (4) a lack of sound IT...they are natural."19 Other analysts argue that intelligence failures are not so inevitable and not always successful. For example, Ariel Levite...Inevitable," 31, World Politics (1978): 61-80. 20 Ariel Levite, Intelligence and Strategic Surprises (New York: Columbia University Press, 1987
Towards Precise Metadata-set for Discovering 3D Geospatial Models in Geo-portals
NASA Astrophysics Data System (ADS)
Zamyadi, A.; Pouliot, J.; Bédard, Y.
2013-09-01
Accessing 3D geospatial models, eventually at no cost and for unrestricted use, is certainly an important issue as they become popular among participatory communities, consultants, and officials. Various geo-portals, mainly established for 2D resources, have tried to provide access to existing 3D resources such as digital elevation models, LIDAR, or classic topographic data. Describing the content of data, metadata is a key component of data discovery in geo-portals. An inventory of seven online geo-portals and commercial catalogues shows that the metadata referring to 3D information differ greatly from one geo-portal to another, and even for similar 3D resources within the same geo-portal. The inventory considered 971 data resources related to elevation. 51% of them were from three geo-portals running at Canadian federal and municipal levels whose metadata resources did not consider 3D models by any definition. Among the remaining 49%, which do refer to 3D models, different definitions of terms and metadata were found, resulting in confusion and misinterpretation. The overall assessment of these geo-portals clearly shows that the provided metadata do not integrate specific and common information about 3D geospatial models. Accordingly, the main objective of this research is to improve 3D geospatial model discovery in geo-portals by adding a specific metadata-set. Based on the knowledge and current practices in 3D modeling and 3D data acquisition and management, a set of metadata is proposed to increase its suitability for 3D geospatial models. This metadata-set enables the definition of genuine classes, fields, and code-lists for a 3D metadata profile. The main structure of the proposal contains 21 metadata classes, classified in three packages: General and Complementary, on contextual and structural information, and Availability, on the transition from storage to delivery format.
The proposed metadata-set is compared with the Canadian Geospatial Data Infrastructure (CGDI) metadata, an implementation of the North American Profile of ISO 19115. The comparison analyzes the two metadata-sets against three simulated scenarios about discovering needed 3D geospatial datasets. Regarding specific metadata about 3D geospatial models, the proposed metadata-set has six additional classes covering geometric dimension, level of detail, geometric modeling, topology, and appearance information. In addition, classes on data acquisition, preparation, and modeling, and on physical availability, have been specialized for 3D geospatial models.
Revision of IRIS/IDA Seismic Station Metadata
NASA Astrophysics Data System (ADS)
Xu, W.; Davis, P.; Auerbach, D.; Klimczak, E.
2017-12-01
Trustworthy data quality assurance has always been one of the goals of seismic network operators and data management centers. This task is considerably complex and evolving due to the huge quantities as well as the rapidly changing characteristics and complexities of seismic data. Published metadata usually reflect instrument response characteristics and their accuracies, which include zero-frequency sensitivity for both seismometer and data logger as well as other, frequency-dependent elements. In this work, we focus mainly on studying the variation of seismometer sensitivity with time for IRIS/IDA seismic recording systems, with the goal of improving the metadata accuracy for the history of the network. There are several ways to measure the accuracy of seismometer sensitivity for seismic stations in service. An effective practice developed recently is to collocate a reference seismometer in proximity to verify the in-situ sensors' calibration. For those stations with a secondary broadband seismometer, IRIS's MUSTANG metric computation system introduced a transfer-function metric reflecting the two sensors' gain ratios in the microseism frequency band. In addition, a simulation approach based on M2 tidal measurements has been proposed and proven effective. In this work, we compare and analyze the results from the three different methods, and conclude that the collocated-sensor method is the most stable and reliable, with the minimum uncertainties throughout. However, for epochs with neither a collocated sensor nor a secondary seismometer, we rely on the results from the tide method. For data since 1992 at IDA stations, we computed over 600 revised seismometer sensitivities covering all IRIS/IDA network calibration epochs. We expect further revision procedures to help guarantee that the metadata of these stations accurately reflect the data.
NASA Astrophysics Data System (ADS)
Strasser, C.; Borda, S.; Cruse, P.; Kunze, J.
2013-12-01
There are many barriers to data management and sharing among earth and environmental scientists; among the most significant are a lack of knowledge about best practices for data management, metadata standards, or appropriate data repositories for archiving and sharing data. Last year we developed an open source web application, DataUp, to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. With funding from the NSF via a supplemental grant to the DataONE project, we are working to improve upon DataUp. Our main goal for DataUp 2.0 is to ensure organizations and repositories are able to adopt and adapt DataUp to meet their unique needs, including connecting to analytical tools, adding new metadata schema, and expanding the list of connected data repositories. DataUp is a collaborative project between the California Digital Library, DataONE, the San Diego Supercomputing Center, and Microsoft Research Connections.
Process Architecture for Managing Digital Object Identifiers
NASA Astrophysics Data System (ADS)
Wanchoo, L.; James, N.; Stolte, E.
2014-12-01
In 2010, NASA's Earth Science Data and Information System (ESDIS) Project implemented a process for registering Digital Object Identifiers (DOIs) for data products distributed by the Earth Observing System Data and Information System (EOSDIS). For the first three years, ESDIS evolved the process with the data provider community, developing procedures for creating and assigning DOIs and guidelines for the landing page. To accomplish this, ESDIS established two DOI User Working Groups: one to review the DOI process, whose recommendations were submitted to ESDIS in February 2014; and another recently tasked to review and further develop DOI landing page guidelines for ESDIS approval by the end of 2014. ESDIS has recently upgraded the DOI system from a manually driven system to one that largely automates the DOI process. The new automated features include: a) reviewing the DOI metadata, b) assigning an opaque DOI name if the data provider chooses, and c) reserving, registering, and updating the DOIs. The flexibility of reserving a DOI allows data providers to embed and test the DOI in the data product metadata before formally registering it with EZID. The DOI update process allows changing any DOI metadata except the DOI name, unless the name has not yet been registered. Currently, ESDIS has processed a total of 557 DOIs, of which 379 are registered with EZID and 178 are reserved with ESDIS. The DOI incorporates several metadata elements that effectively identify the data product and its source of availability. Of these elements, the Uniform Resource Locator (URL) attribute has the very important function of identifying the landing page, which describes the data product. ESDIS, in consultation with data providers in the Earth Science community, is currently developing landing page guidelines that specify the key descriptive elements to be included on each data product's landing page.
This poster will describe in detail the unique automated process and underlying system implemented by ESDIS for registering DOIs, as well as some of the lessons learned from the development of the process. In addition, this paper will summarize the recommendations made by the DOI Process and DOI Landing Page User Working Groups, and the procedures developed for implementing those recommendations.
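The reserve/register/update rules described in this abstract amount to a small lifecycle: a reserved DOI can still be renamed and tested, but once registered its name is frozen while other metadata may still change. A minimal sketch of that lifecycle (class and field names are illustrative, not the actual ESDIS or EZID interfaces):

```python
class DOIRecord:
    """Illustrative DOI lifecycle: reserved -> registered. The DOI name
    becomes immutable once registered, matching the process described."""

    def __init__(self, name, metadata):
        self.name = name
        self.metadata = dict(metadata)
        self.state = "reserved"  # reserved DOIs can still be tested/renamed

    def register(self):
        self.state = "registered"

    def update(self, **changes):
        # Any metadata field may change; the DOI name may only change
        # while the record is still reserved (not yet registered).
        if "name" in changes:
            if self.state == "registered":
                raise ValueError("DOI name cannot change after registration")
            self.name = changes.pop("name")
        self.metadata.update(changes)


doi = DOIRecord("10.5067/EXAMPLE", {"url": "https://landing.example/page"})
doi.update(name="10.5067/EXAMPLE-V2")          # allowed: still reserved
doi.register()
doi.update(url="https://landing.example/new")  # metadata updates still allowed
```

After `register()`, a further `update(name=...)` raises, which is the invariant the abstract describes.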
Automated Transformation of CDISC ODM to OpenClinica.
Gessner, Sophia; Storck, Michael; Hegselmann, Stefan; Dugas, Martin; Soto-Rey, Iñaki
2017-01-01
Due to the increasing use of electronic data capture systems in clinical research, interest in saving resources by automatically generating and reusing case report forms in clinical studies is growing. OpenClinica, an open-source electronic data capture system, enables the reuse of metadata through its own Excel import template, but this hampers the reuse of metadata defined in other standard formats. One such standard format is the Operational Data Model (ODM) for metadata, administrative and clinical data in clinical studies. This work proposes a mapping from the Operational Data Model to OpenClinica and describes the implementation of a converter that automatically generates OpenClinica-conformant case report forms from metadata in the Operational Data Model.
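The core of such a converter is a mapping from ODM item definitions to rows of the OpenClinica import template. A minimal sketch under stated assumptions: the ODM fragment is simplified and un-namespaced, and the `ITEM_NAME`/`DATA_TYPE` column names and type mapping are illustrative stand-ins for the real template:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical ODM fragment; real ODM files are namespaced and
# far richer, but the mapping idea is the same: one ItemDef -> one CRF row.
ODM = """<ODM><Study><MetaDataVersion>
  <ItemDef OID="I.AGE" Name="Age" DataType="integer"/>
  <ItemDef OID="I.SEX" Name="Sex" DataType="text"/>
</MetaDataVersion></Study></ODM>"""

# Assumed type mapping between ODM data types and template codes.
TYPE_MAP = {"integer": "INT", "text": "ST", "float": "REAL", "date": "DATE"}


def odm_to_items(odm_xml):
    """Extract ItemDef elements and map them to template-style rows."""
    root = ET.fromstring(odm_xml)
    rows = []
    for item in root.iter("ItemDef"):
        rows.append({
            "ITEM_NAME": item.get("Name"),
            "DATA_TYPE": TYPE_MAP.get(item.get("DataType"), "ST"),
        })
    return rows


rows = odm_to_items(ODM)
```

A real converter would additionally carry over OIDs, code lists, and form/group structure, but the ItemDef traversal above is the central step.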
ERIC Educational Resources Information Center
Solomou, Georgia; Pierrakeas, Christos; Kameas, Achilles
2015-01-01
The ability to effectively administrate educational resources in terms of accessibility, reusability and interoperability lies in the adoption of an appropriate metadata schema, able of adequately describing them. A considerable number of different educational metadata schemas can be found in literature, with the IEEE LOM being the most widely…
Creating FGDC and NBII metadata with Metavist 2005.
David J. Rugg
2004-01-01
This report documents a computer program for creating metadata compliant with the Federal Geographic Data Committee (FGDC) 1998 metadata standard or the National Biological Information Infrastructure (NBII) 1999 Biological Data Profile for the FGDC standard. The software runs under the Microsoft Windows 2000 and XP operating systems, and requires the presence of...
Representing Hydrologic Models as HydroShare Resources to Facilitate Model Sharing and Collaboration
NASA Astrophysics Data System (ADS)
Castronova, A. M.; Goodall, J. L.; Mbewe, P.
2013-12-01
The CUAHSI HydroShare project is a collaborative effort that aims to provide software for sharing data and models within the hydrologic science community. One of the early focuses of this work has been establishing metadata standards for describing models and model-related data as HydroShare resources. By leveraging this metadata definition, a prototype extension has been developed to create model resources that can be shared within the community using the HydroShare system. The extension uses a general model metadata definition to create resource objects, and was designed so that model-specific parsing routines can extract and populate metadata fields from model input and output files. The long-term goal is to establish a library of supported models where, for each model, the system can extract key metadata fields automatically, thereby establishing standardized model metadata that will serve as the foundation for model sharing and collaboration within HydroShare. The Soil and Water Assessment Tool (SWAT) is used to demonstrate this concept through a case study application.
ATLAS Data Management Accounting with Hadoop Pig and HBase
NASA Astrophysics Data System (ADS)
Lassnig, Mario; Garonne, Vincent; Dimitrov, Gancho; Canali, Luca
2012-12-01
The ATLAS Distributed Data Management system requires accounting of its contents at the metadata layer. This presents a hard problem due to the large scale of the system, the high dimensionality of attributes, and the high rate of concurrent modifications of data. The system must efficiently account for more than 90 PB of disk and tape that store upwards of 500 million files across 100 sites globally. In this work a generic accounting system is presented, which is able to scale to the requirements of ATLAS. The design and architecture are presented, and the implementation is discussed. An emphasis is placed on design choices such that the underlying data models are generally applicable to different kinds of accounting, reporting and monitoring.
Calculating a checksum with inactive networking components in a computing system
Aho, Michael E; Chen, Dong; Eisley, Noel A; Gooding, Thomas M; Heidelberger, Philip; Tauferner, Andrew T
2014-12-16
Calculating a checksum utilizing inactive networking components in a computing system, including: identifying, by a checksum distribution manager, an inactive networking component, wherein the inactive networking component includes a checksum calculation engine for computing a checksum; sending, to the inactive networking component by the checksum distribution manager, metadata describing a block of data to be transmitted by an active networking component; calculating, by the inactive networking component, a checksum for the block of data; transmitting, to the checksum distribution manager from the inactive networking component, the checksum for the block of data; and sending, by the active networking component, a data communications message that includes the block of data and the checksum for the block of data.
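The claimed flow offloads checksum computation to an idle component so the active one only transmits the block plus the result. A toy sketch of that division of labor (class names are invented, and CRC-32 merely stands in for whatever engine the patent envisions):

```python
import zlib


class NetworkComponent:
    """Hypothetical networking component with a checksum calculation engine."""

    def __init__(self, name, active):
        self.name, self.active = name, active

    def compute_checksum(self, data):
        # Stand-in for the component's checksum calculation engine.
        return zlib.crc32(data)


def send_with_offloaded_checksum(components, data):
    """Sketch of the claimed flow: the distribution manager hands the block's
    metadata to an inactive component, which computes the checksum; the
    active component then sends the block together with that checksum."""
    idle = next(c for c in components if not c.active)
    active = next(c for c in components if c.active)
    checksum = idle.compute_checksum(data)  # offloaded calculation
    return {"sender": active.name, "data": data, "checksum": checksum}


msg = send_with_offloaded_checksum(
    [NetworkComponent("nic0", True), NetworkComponent("nic1", False)],
    b"block of data",
)
```

In the patent, only metadata describing the block (not the block itself) travels to the idle component; the sketch passes the data directly for brevity.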
EOS ODL Metadata On-line Viewer
NASA Astrophysics Data System (ADS)
Yang, J.; Rabi, M.; Bane, B.; Ullman, R.
2002-12-01
We have recently developed and deployed an EOS ODL metadata on-line viewer. The EOS ODL metadata viewer is a web server that takes: 1) an EOS metadata file in Object Description Language (ODL), and 2) parameters, such as which metadata to view and what style of display to use, and returns an HTML or XML document displaying the requested metadata in the requested style. This tool was developed to address widespread complaints by the science community that the EOS Data and Information System (EOSDIS) metadata files in ODL are difficult to read, by allowing users to upload and view an ODL metadata file in different styles using a web browser. Users can choose to view all the metadata or only part of it, such as Collection metadata, Granule metadata, or Unsupported Metadata. Choices of display styles include 1) Web: a mouseable display with tabs and turn-down menus, 2) Outline: formatted and colored text, suitable for printing, 3) Generic: simple indented text, a direct representation of the underlying ODL metadata, and 4) None: no stylesheet is applied and the XML generated by the converter is returned directly. Not all display styles are implemented for all the metadata choices. For example, the Web style is only implemented for Collection and Granule metadata groups with known attribute fields, but not for Unsupported, Other, and All metadata. The overall strategy of the ODL viewer is to transform an ODL metadata file into viewable HTML in two steps. The first step converts the ODL metadata file to XML using a Java-based parser/translator called ODL2XML. The second step transforms the XML to HTML using stylesheets. Both operations are done on the server side. This allows considerable flexibility in the final result, and is very portable across platforms. Perl CGI behind the Apache web server is used to run the Java ODL2XML, and then run the results through an XSLT processor.
The EOS ODL viewer can be accessed from either a PC or a Mac using Internet Explorer 5.0+ or Netscape 4.7+.
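The two-step pipeline (ODL to XML, then XML to a styled view) can be sketched compactly. This is not the actual ODL2XML translator, which is a Java parser, and the ODL fragment and rendering are deliberately tiny; it only illustrates the GROUP/END_GROUP nesting and the server-side transform idea:

```python
import xml.etree.ElementTree as ET

# Tiny ODL-like fragment (illustrative; real EOS ODL is far richer).
ODL = """GROUP = Granule
  ShortName = "MOD021KM"
  Version = 5
END_GROUP = Granule"""


def odl_to_xml(text):
    """Step 1: translate ODL's GROUP nesting into an XML tree."""
    root = ET.Element("odl")
    stack = [root]
    for line in text.splitlines():
        key, _, value = (p.strip() for p in line.strip().partition("="))
        if key == "GROUP":
            stack.append(ET.SubElement(stack[-1], value))
        elif key == "END_GROUP":
            stack.pop()
        elif key:
            ET.SubElement(stack[-1], key).text = value.strip('"')
    return root


def xml_to_html(root):
    """Step 2: render the XML for display (the real viewer uses XSLT)."""
    rows = "".join(f"<li>{e.tag}: {e.text}</li>"
                   for e in root.iter() if e.text and e.text.strip())
    return f"<ul>{rows}</ul>"


html = xml_to_html(odl_to_xml(ODL))
```

Keeping both steps server-side, as the abstract notes, means any browser sees only plain HTML regardless of platform.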
Scientific Workflows + Provenance = Better (Meta-)Data Management
NASA Astrophysics Data System (ADS)
Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.
2013-12-01
The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that describe workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information (timestamps for each workflow step executed, the inputs and outputs used, etc.). Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data.
The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata. DataONE is a federation of member nodes that store data and metadata for discovery and access. By enriching metadata with provenance information, search and reuse of data is enhanced, and the 'social life' of data (being the product of many workflow runs, different people, etc.) is revealed. We are currently prototyping a provenance repository (PBase) to demonstrate what can be achieved with advanced provenance queries. The ProvExplorer and ProPub tools support advanced ad-hoc querying and visualization of provenance as well as customized provenance publications (e.g., to address privacy issues, or to focus provenance to relevant details). In a parallel line of work, we are exploring ways to add provenance support to widely-used scripting platforms (e.g. R and Python) and then expose that information via D-PROV.
Managing Sustainable Data Infrastructures: The Gestalt of EOSDIS
NASA Technical Reports Server (NTRS)
Behnke, Jeanne; Lowe, Dawn; Lindsay, Francis; Lynnes, Chris; Mitchell, Andrew
2016-01-01
EOSDIS epitomizes a System of Systems, whose many varied and distributed parts are integrated into a single, highly functional, organized science data system. A distributed architecture was adopted to ensure discipline-specific support for the science data, while also leveraging standards and establishing policies and tools to enable interdisciplinary research and analysis across multiple scientific instruments. EOSDIS is composed of system elements such as geographically distributed archive centers used to manage the stewardship of data. The infrastructure consists of the underlying capabilities and connections that enable the primary system elements to function together. For example, one key infrastructure component is the common metadata repository, which enables discovery of all data within the EOSDIS system. EOSDIS employs processes and standards to ensure partners can work together effectively and provide coherent services to users.
Hume, Samuel; Chow, Anthony; Evans, Julie; Malfait, Frederik; Chason, Julie; Wold, J. Darcy; Kubick, Wayne; Becnel, Lauren B.
2018-01-01
The Clinical Data Interchange Standards Consortium (CDISC) is a global non-profit standards development organization that creates consensus-based standards for clinical and translational research. Several of these standards are now required by regulators for electronic submissions of regulated clinical trials' data and by government funding agencies. These standards are free and open, available for download on the CDISC Website as PDFs. While these documents are human-readable, they are not amenable to ready use by electronic systems. CDISC launched the CDISC Shared Health And Research Electronic library (SHARE) to provide the standards metadata in machine-readable formats to facilitate the automated management and implementation of the standards. This paper describes how CDISC SHARE's standards can facilitate collecting, aggregating and analyzing standardized data from early design to end analysis, and its role as a central resource providing information systems with metadata that drives process automation, including study setup and data pipelining. PMID:29888049
Seqcrawler: biological data indexing and browsing platform.
Sallou, Olivier; Bretaudeau, Anthony; Roult, Aurelien
2012-07-24
Seqcrawler takes its roots in software like SRS or Lucegene. It provides an indexing platform to ease the search of data and meta-data in biological banks, and it can scale to face the current flow of data. While many biological bank search tools are available on the Internet, mainly provided by large organizations to search their own data, there is a lack of free and open-source solutions for browsing one's own set of data with a flexible query system that can scale from a single computer to a cloud system. A personal index platform will help labs and bioinformaticians search their meta-data, and also build a larger information system with custom subsets of data. The software is scalable from a single computer to a cloud-based infrastructure. It has been successfully tested in a private cloud with 3 index shards (pieces of the index) hosting ~400 million sequence records (the whole of GenBank, UniProt, PDB and others) for a total size of 600 GB in a fault-tolerant, high-availability architecture. It has also been successfully integrated with software that adds extra meta-data from BLAST results to enhance users' result analysis. Seqcrawler provides a complete open-source search-and-store solution for labs or platforms needing to manage large amounts of data/meta-data with a flexible and customizable web interface. All components (search engine, visualization and data storage), though independent, share a common and coherent data system that can be queried through a simple HTTP interface. The solution scales easily and can also provide a high-availability infrastructure.
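A sharded index like the one described answers a query by fanning it out to every shard and merging the hits. A toy illustration of that pattern (the shard contents and bank names are invented; the real system queries its shards over HTTP against a search engine rather than in-memory dictionaries):

```python
# Two hypothetical index shards mapping sequence identifiers to meta-data.
SHARDS = [
    {"P12345": {"bank": "UniProt"}, "NM_0001": {"bank": "GenBank"}},
    {"1ABC": {"bank": "PDB"}},
]


def search(term):
    """Fan the query out to every shard and merge matching entries.
    A production system would issue these lookups concurrently and
    retry against replicas for high availability."""
    hits = {}
    for shard in SHARDS:
        for ident, meta in shard.items():
            if term.lower() in ident.lower():
                hits[ident] = meta
    return hits


hits = search("p123")
```

Because shards are independent, adding capacity is a matter of adding shards, which is what lets such a platform grow from one machine to a cloud deployment.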
Deploying the ATLAS Metadata Interface (AMI) on the cloud with Jenkins
NASA Astrophysics Data System (ADS)
Lambert, F.; Odier, J.; Fulachier, J.; ATLAS Collaboration
2017-10-01
The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. AMI is used by the ATLAS production system, therefore the service must guarantee a high level of availability. We describe our monitoring and administration systems, and the Jenkins-based strategy used to dynamically test and deploy cloud OpenStack nodes on demand.
Metazen – metadata capture for metagenomes
Bischof, Jared; Harrison, Travis; Paczian, Tobias; ...
2014-12-08
Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.
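Validating a record against a defined set of mandatory fields, as the conclusion describes, is straightforward to sketch. The field names below are hypothetical illustrations, not Metazen's actual template:

```python
# Hypothetical mandatory-field template in the spirit of Metazen's
# "defined set of mandatory fields plus optional extensions".
MANDATORY = {"project_name", "sample_id", "collection_date", "biome"}


def validate_metadata(record):
    """Return the mandatory fields that are missing or empty; an empty
    result means the record is complete in the sense described above."""
    return sorted(f for f in MANDATORY if not record.get(f))


incomplete = {"project_name": "Gut survey", "sample_id": "S-01"}
missing = validate_metadata(incomplete)
```

Optional extra fields simply pass through untouched, which is how a template can stay strict about the vital information while remaining flexible.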
Building a Digital Library for Multibeam Data, Images and Documents
NASA Astrophysics Data System (ADS)
Miller, S. P.; Staudigel, H.; Koppers, A.; Johnson, C.; Cande, S.; Sandwell, D.; Peckman, U.; Becker, J. J.; Helly, J.; Zaslavsky, I.; Schottlaender, B. E.; Starr, S.; Montoya, G.
2001-12-01
The Scripps Institution of Oceanography, the UCSD Libraries and the San Diego Supercomputing Center have joined forces to establish a digital library for accessing a wide range of multibeam and marine geophysical data, to a community that ranges from the MGG researcher to K-12 outreach clients. This digital library collection will include 233 multibeam cruises with grids, plots, photographs, station data, technical reports, planning documents and publications, drawn from the holdings of the Geological Data Center and the SIO Archives. Inquiries will be made through an Ocean Exploration Console, reminiscent of a cockpit display where a multitude of data may be displayed individually or in two or three-dimensional projections. These displays will provide access to cruise data as well as global databases such as Global Topography, crustal age, and sediment thickness, thus meeting the day-to-day needs of researchers as well as educators, students, and the public. The prototype contains a few selected expeditions, and a review of the initial approach will be solicited from the user community during the poster session. The search process can be focused by a variety of constraints: geospatial (lat-lon box), temporal (e.g., since 1996), keyword (e.g., cruise, place name, PI, etc.), or expert-level (e.g., K-6 or researcher). The Storage Resource Broker (SRB) software from the SDSC manages the evolving collection as a series of distributed but related archives in various media, from shipboard data through processing and final archiving. The latest version of MB-System provides for the systematic creation of standard metadata, and for the harvesting of metadata from multibeam files. Automated scripts will be used to load the metadata catalog to enable queries with an Oracle database management system. 
These new efforts to bridge the gap between libraries and data archives are supported by the NSF Information Technology and National Science Digital Library (NSDL) programs, augmented by UC funds, and closely coordinated with Digital Library for Earth System Education (DLESE) activities.
CERA - the technical basis for WDCC
NASA Astrophysics Data System (ADS)
Thiemann, Hannes; Lautenschlager, Michael
2010-05-01
The World Data Centre for Climate (WDCC) is hosted by the German Climate Computing Centre (DKRZ). It collects, stores and disseminates data for climate research in order to serve the scientific community. Emphasis is placed on climate modelling and related data products. CERA (Climate and Environmental Retrieval and Archive) is the infrastructure hosting data and metadata from the WDCC. Data originate from projects like IPCC (and the IPCC-DDC), ENSEMBLES, COPS and several others. Currently more than 400 terabytes of data are managed within CERA, and even more data are addressed through metadata. Data stored inline in CERA are currently archived in an Oracle database, which itself is transparently linked to a sophisticated HSM system. Within this HSM system, a wide range of storage devices, such as different RAID and tape systems, are used. HSM for CERA is configured in such a way that a single media failure cannot cause any data loss. Correct and complete metadata are an important ingredient of sound archiving procedures, and within CERA special care is taken to assure these goals. As a high-end service, DOIs can be assigned to data entities, as they make the management of intellectual property easier and more convenient. Any data format can be served from within CERA, although most of the stored data are in either GRIB or netCDF format. Out of the box, CERA provides several data reduction mechanisms such as time-slicing or regional selection. More elaborate functions, such as format conversion or other processing, are attached either in- or out-of-line. Fine-grained access control allows data to be distributed under diverse data policies. Currently up to 800 users are registered with CERA, and more than 600,000 downloads (255 terabytes) were served from CERA in 2009. At present CERA is being restructured; more specific details of the current implementation and the future development will be given in the presentation.
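Server-side data reduction such as CERA's time-slicing means only the requested slice of a dataset leaves the archive. A schematic illustration with invented records (real climate data would be sliced inside the database or over GRIB/netCDF files, not over Python dictionaries):

```python
from datetime import date

# Invented daily records standing in for an archived time series.
RECORDS = [
    {"day": date(2009, 1, 1), "value": 3.1},
    {"day": date(2009, 6, 1), "value": 4.2},
    {"day": date(2009, 12, 1), "value": 2.7},
]


def time_slice(records, start, end):
    """Return only the records whose timestamp falls in [start, end];
    regional selection works the same way over spatial coordinates."""
    return [r for r in records if start <= r["day"] <= end]


subset = time_slice(RECORDS, date(2009, 5, 1), date(2009, 7, 1))
```

Shipping the one matching record instead of the full series is exactly the bandwidth saving such mechanisms provide at terabyte scale.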
Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements
Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon; Ovchinnikova, Galina; Verezemska, Olena; Isbandi, Michelle; Thomas, Alex D.; Ali, Rida; Sharma, Kaushal; Kyrpides, Nikos C.; Reddy, T. B. K.
2017-01-01
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields from which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years. PMID:27794040
National Geothermal Data System State Contributions by Data Type (Appendix A1-b)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Love, Diane
Multi-page spreadsheet listing an inventory of data submissions from the state contributions to the National Geothermal Data System project, broken down by services, state, metadata compilations, metadata, and map count, and including a summary of the information.
Building Format-Agnostic Metadata Repositories
NASA Astrophysics Data System (ADS)
Cechini, M.; Pilone, D.
2010-12-01
This presentation will discuss the problems that surround persisting and discovering metadata in multiple formats; a set of tenets that must be addressed in a solution; and NASA’s Earth Observing System (EOS) ClearingHOuse’s (ECHO) proposed approach. In order to facilitate cross-discipline data analysis, Earth Scientists will potentially interact with more than one data source. The most common data discovery paradigm relies on services and/or applications facilitating the discovery and presentation of metadata. What may not be common is the format in which the metadata are encoded. As the number of sources and datasets utilized for research increases, it becomes more likely that a researcher will encounter conflicting metadata formats. Metadata repositories, such as the EOS ClearingHOuse (ECHO), along with data centers, must identify ways to address this issue. In order to define the solution to this problem, the following tenets are identified: - There exists a set of ‘core’ metadata fields recommended for data discovery. - There exists a set of users who will require the entire metadata record for advanced analysis. - There exists a set of users who will require a ‘core’ set of metadata fields for discovery only. - There will never be a cessation of new formats or a total retirement of all old formats. - Users should be presented metadata in a consistent format. ECHO has undertaken an effort to transform its metadata ingest and discovery services in order to support the growing set of metadata formats. In order to address the previously listed tenets, ECHO’s new metadata processing paradigm utilizes the following approach: - Identify a cross-format set of ‘core’ metadata fields necessary for discovery. - Implement format-specific indexers to extract the ‘core’ metadata fields into an optimized query capability. - Archive the original metadata in its entirety for presentation to users requiring the full record.
- Provide on-demand translation of ‘core’ metadata to any supported result format. With this approach, Earth Scientists are provided with a consistent data representation as they interact with a variety of datasets that utilize multiple metadata formats. They are then able to focus their efforts on the more critical research activities they are undertaking.
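The indexing approach above can be sketched as follows. This is a minimal illustration, not ECHO's actual implementation: the record formats, field names, and repository interface are hypothetical stand-ins showing format-specific indexers feeding a shared 'core' index while the native record is archived intact.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical 'core' fields used for discovery; the full record is archived as-is.
CORE_FIELDS = ("title", "start_date", "bounding_box")

def index_xml_record(xml_text):
    """Format-specific indexer for an ECHO10-like XML record (illustrative tags)."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("DataSetId"),
        "start_date": root.findtext("Temporal/BeginningDateTime"),
        "bounding_box": root.findtext("Spatial/BoundingRectangle"),
    }

def index_json_record(json_text):
    """Format-specific indexer for a JSON-based metadata format."""
    rec = json.loads(json_text)
    return {
        "title": rec.get("title"),
        "start_date": rec.get("temporal", {}).get("begin"),
        "bounding_box": rec.get("spatial"),
    }

class FormatAgnosticRepository:
    """Archive the native record in its entirety; index only cross-format core fields."""
    def __init__(self):
        self._archive = {}   # record id -> (format name, original text)
        self._index = {}     # record id -> dict of core fields

    def ingest(self, rec_id, fmt, text, indexer):
        self._archive[rec_id] = (fmt, text)
        core = indexer(text)
        self._index[rec_id] = {k: core.get(k) for k in CORE_FIELDS}

    def search(self, **criteria):
        """Discovery matches on core fields only, regardless of native format."""
        return [rid for rid, core in self._index.items()
                if all(core.get(k) == v for k, v in criteria.items())]

    def full_record(self, rec_id):
        """Users needing the entire record get the archived original."""
        return self._archive[rec_id]
```

A record ingested as XML and one ingested as JSON become searchable through the same core-field query, while the original text of either remains retrievable for advanced analysis.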
A metadata reporting framework (FRAMES) for synthesis of ecohydrological observations
Christianson, Danielle S.; Varadharajan, Charuleka; Christoffersen, Bradley; ...
2017-06-20
Metadata describe the ancillary information needed for data interpretation, comparison across heterogeneous datasets, and quality control and quality assessment (QA/QC). Metadata enable the synthesis of diverse ecohydrological and biogeochemical observations, an essential step in advancing a predictive understanding of earth systems. Environmental observations can be taken across a wide range of spatiotemporal scales in a variety of measurement settings and approaches, and saved in multiple formats. Thus, well-organized, consistent metadata are required to produce usable data products from diverse observations collected in disparate field sites. However, existing metadata reporting protocols do not support the complex data synthesis needs of interdisciplinary earth system research. We developed a metadata reporting framework (FRAMES) to enable predictive understanding of carbon cycling in tropical forests under global change. FRAMES adheres to best practices for data and metadata organization, enabling consistent data reporting and thus compatibility with a variety of standardized data protocols. We used an iterative scientist-centered design process to develop FRAMES. The resulting modular organization streamlines metadata reporting and can be expanded to incorporate additional data types. The flexible data reporting format incorporates existing field practices to maximize data-entry efficiency. With FRAMES’s multi-scale measurement position hierarchy, data can be reported at observed spatial resolutions and then easily aggregated and linked across measurement types to support model-data integration. FRAMES is in early use by both data providers and users. In this article, we describe FRAMES, identify lessons learned, and discuss areas of future development.
Metadata for data rescue and data at risk
Anderson, William L.; Faundeen, John L.; Greenberg, Jane; Taylor, Fraser
2011-01-01
Scientific data age, become stale, fall into disuse, and run tremendous risks of being forgotten and lost. These problems can be addressed by archiving and managing scientific data over time, and by establishing practices that facilitate data discovery and reuse. Metadata documentation is integral to this work and essential for measuring and assessing high-priority data preservation cases. The International Council for Science's Committee on Data for Science and Technology (CODATA) has a newly appointed Data-at-Risk Task Group (DARTG), participating in the general arena of rescuing data. The DARTG's primary objective is building an inventory of scientific data that are at risk of being lost forever. As part of this effort, the DARTG is testing an approach for documenting endangered datasets, and is developing a minimal, easy-to-use set of metadata properties for sufficiently describing endangered data, which will aid global data rescue missions. The DARTG metadata framework supports rapid capture and easy documentation across an array of scientific domains. This paper reports on the goals and principles supporting the DARTG metadata schema, and provides a description of the preliminary implementation.
Rich Support for Heterogeneous Polar Data in RAMADDA
NASA Astrophysics Data System (ADS)
McWhirter, J.; Crosby, C. J.; Griffith, P. C.; Khalsa, S.; Lazzara, M. A.; Weber, W. J.
2013-12-01
Difficult-to-navigate environments, tenuous logistics, strange forms, deeply rooted cultures - these are all experiences shared by Polar scientists in the field as well as by the developers of the underlying data management systems back in the office. Among the key data management challenges that Polar investigations present are the heterogeneity and complexity of the data that are generated. Polar regions are intensely studied across many science domains through a variety of techniques - satellite and aircraft remote sensing, in-situ observation networks, modeling, sociological investigations, and extensive PI-driven field project data collection. While many data management efforts focus on large homogeneous collections of data targeting specific science domains (e.g., satellite, GPS, modeling), multi-disciplinary efforts that focus on Polar data need to be able to address a wide range of data formats, science domains and user communities. There is growing use of the RAMADDA (Repository for Archiving, Managing and Accessing Diverse Data) system to manage and provide services for Polar data. RAMADDA is a freely available extensible data repository framework that supports a wide range of data types and services to allow the creation, management, discovery and use of data and metadata. The broad range of capabilities provided by RAMADDA and its extensibility make it well-suited as an archive solution for Polar data. RAMADDA can run in a number of diverse contexts - as a centralized archive, at local institutions, and even on an investigator's laptop in the field, providing in-situ metadata and data management services. We are actively developing archives and support for a number of Polar initiatives: - NASA Arctic-Boreal Vulnerability Experiment (ABoVE): ABoVE is a long-term multi-instrument field campaign that will make use of a wide range of data.
We have developed an extensive ontology of program, project and site metadata in RAMADDA, in support of the ABoVE Science Definition Team and Project Office. See: http://above.nasa.gov - UNAVCO Terrestrial Laser Scanning (TLS): UNAVCO's Polar program provides support for terrestrial laser scanning field projects. We are using RAMADDA to archive these field projects, with over 40 projects ingested to date. - NASA IceBridge: As part of the NASA LiDAR Access System (NLAS) project, RAMADDA supports numerous airborne and satellite LiDAR data sets - GLAS, LVIS, ATM, PARIS, MCoRDS, etc. - Antarctic Meteorological Research Center (AMRC): Satellite and surface observation network - Support for numerous other data from AON-ACADIS, Greenland GC-Net, NOAA-GMD, AmeriFlux, etc. In this talk we will discuss some of the challenges that Polar data brings to geoinformatics and describe the approaches we have taken to address these challenges in RAMADDA.
Automatic Extraction of Metadata from Scientific Publications for CRIS Systems
ERIC Educational Resources Information Center
Kovacevic, Aleksandar; Ivanovic, Dragan; Milosavljevic, Branko; Konjovic, Zora; Surla, Dusan
2011-01-01
Purpose: The aim of this paper is to develop a system for automatic extraction of metadata from scientific papers in PDF format for the information system for monitoring the scientific research activity of the University of Novi Sad (CRIS UNS). Design/methodology/approach: The system is based on machine learning and performs automatic extraction…
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Idaszak, R.; Horsburgh, J. S.; Ames, D.; Goodall, J. L.; Band, L. E.; Merwade, V.; Couch, A.; Arrigo, J.; Hooper, R. P.; Valentine, D. W.; Maidment, D. R.
2013-12-01
HydroShare is an online, collaborative system being developed for sharing hydrologic data and models. The goal of HydroShare is to enable scientists to easily discover and access data and models, retrieve them to their desktop or perform analyses in a distributed computing environment that may include grid, cloud or high performance computing model instances as necessary. Scientists may also publish outcomes (data, results or models) into HydroShare, using the system as a collaboration platform for sharing data, models and analyses. HydroShare is expanding the data sharing capability of the CUAHSI Hydrologic Information System by broadening the classes of data accommodated, creating new capability to share models and model components, and taking advantage of emerging social media functionality to enhance information about and collaboration around hydrologic data and models. One of the fundamental concepts in HydroShare is that of a Resource. All content is represented using a Resource Data Model that separates system and science metadata and has elements common to all resources as well as elements specific to the types of resources HydroShare will support. These will include different data types used in the hydrology community and models and workflows that require metadata on execution functionality. HydroShare will use the integrated Rule-Oriented Data System (iRODS) to manage federated data content and perform rule-based background actions on data and model resources, including parsing to generate metadata catalog information and the execution of models and workflows. This presentation will introduce the HydroShare functionality developed to date, describe key elements of the Resource Data Model and outline the roadmap for future development.
Publishing NASA Metadata as Linked Open Data for Semantic Mashups
NASA Astrophysics Data System (ADS)
Wilson, Brian; Manipon, Gerald; Hua, Hook
2014-05-01
Data providers are now publishing more metadata in more interoperable forms, e.g. Atom or RSS 'casts', as Linked Open Data (LOD), or as ISO Metadata records. A major effort on the part of NASA's Earth Science Data and Information System (ESDIS) project is the aggregation of metadata that enables greater data interoperability among scientific data sets regardless of source or application. Both the Earth Observing System (EOS) ClearingHOuse (ECHO) and the Global Change Master Directory (GCMD) repositories contain metadata records for NASA (and other) datasets and the services provided. These records contain typical fields for each dataset (or software service) such as the source, creation date, cognizant institution, related access URLs, and domain and variable keywords to enable discovery. Under a NASA ACCESS grant, we demonstrated how to publish the ECHO and GCMD dataset and services metadata as LOD in the RDF format. Both sets of metadata are now queryable at SPARQL endpoints and available for integration into "semantic mashups" in the browser. It is straightforward to reformat sets of XML metadata, including ISO, into simple RDF and then later refine and improve the RDF predicates by reusing known namespaces such as Dublin Core, georss, etc. All scientific metadata should be part of the LOD world. In addition, we developed an "instant" drill-down and browse interface that provides faceted navigation so that the user can discover and explore the 25,000 datasets and 3,000 services. The available facets and the free-text search box appear in the left panel, and the instantly updated results for the dataset search appear in the right panel. The user can constrain the value of a metadata facet simply by clicking on a word (or phrase) in the "word cloud" of values for each facet.
The display section for each dataset includes the important metadata fields, a full description of the dataset, potentially some related URLs, and a "search" button that points to an OpenSearch GUI that is pre-configured to search for granules within the dataset. We will present our experiences with converting NASA metadata into LOD, discuss the challenges, illustrate some of the enabled mashups, and demonstrate the latest version of the "instant browse" interface for navigating multiple metadata collections.
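The reformatting of XML-derived metadata into simple RDF that the abstract describes can be sketched roughly as below. The record fields and the subject URI are invented for illustration; only the Dublin Core namespace is real, and a production converter would emit richer predicates and typed literals.

```python
# Map flat metadata fields onto Dublin Core predicates and emit N-Triples.
# Field names and the example subject URI are hypothetical.
DC = "http://purl.org/dc/terms/"

FIELD_TO_PREDICATE = {
    "title": DC + "title",
    "created": DC + "created",
    "publisher": DC + "publisher",
}

def record_to_ntriples(subject_uri, record):
    """Emit one N-Triples line per mapped metadata field."""
    lines = []
    for field, value in sorted(record.items()):
        pred = FIELD_TO_PREDICATE.get(field)
        if pred is None:
            continue  # unmapped fields can be refined into better predicates later
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('<%s> <%s> "%s" .' % (subject_uri, pred, escaped))
    return "\n".join(lines)
```

Triples produced this way can be loaded into any SPARQL endpoint, which is what makes the browser-side "semantic mashups" possible.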
CHIME: A Metadata-Based Distributed Software Development Environment
2005-01-01
structures by using typography, graphics, and animation. The Software Immersion in our conceptual model for CHIME can be seen as a form of Software... Even small- to medium-sized development efforts may involve hundreds of artifacts -- design documents, change requests, test cases and results, code... for managing and organizing information from all phases of the software lifecycle. CHIME is designed around an XML-based metadata architecture, in
Parallel file system with metadata distributed across partitioned key-value store
Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron
2017-09-19
Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).
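The partitioned metadata store can be sketched as a hash-partitioned set of key-value maps. This is an illustrative stand-in, not PLFS or MDHIM code: in the real system each partition lives on a different compute node and is reached over a message passing interface, which is abstracted away here as in-process dictionaries.

```python
import hashlib

class PartitionedMetadataStore:
    """Sketch of a hash-partitioned metadata store in the spirit of MDHIM.
    Each partition stands in for a data store on one compute node."""
    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _owner(self, key):
        # Deterministic hash so every node agrees which partition owns a key.
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % len(self.partitions)

    def put(self, key, value):
        # e.g. key names a sub-file region of the shared file,
        # value records its placement metadata (illustrative).
        self.partitions[self._owner(key)][key] = value

    def get(self, key):
        return self.partitions[self._owner(key)].get(key)
```

Because ownership is computed from the key alone, a lookup touches exactly one partition, mirroring how per-sub-file metadata for a shared file can be resolved without a central metadata server.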
Automatic publishing ISO 19115 metadata with PanMetaDocs using SensorML information
NASA Astrophysics Data System (ADS)
Stender, Vivien; Ulbricht, Damian; Schroeder, Matthias; Klump, Jens
2014-05-01
Terrestrial Environmental Observatories (TERENO) is an interdisciplinary and long-term research project spanning an Earth observation network across Germany. It includes four test sites within Germany, from the North German lowlands to the Bavarian Alps, and is operated by six research centers of the Helmholtz Association. The contribution by the participating research centers is organized as regional observatories. A challenge for TERENO and its observatories is to integrate all aspects of data management, data workflows, data modeling and visualization into the design of a monitoring infrastructure. TERENO Northeast is one of the sub-observatories of TERENO and is operated by the German Research Centre for Geosciences (GFZ) in Potsdam. This observatory investigates geoecological processes in the northeastern lowland of Germany by collecting large amounts of environmentally relevant data. The success of long-term projects like TERENO depends on well-organized data management, on data exchange between the partners involved and on the availability of the captured data. Data discovery and dissemination are facilitated not only through the data portals of the regional TERENO observatories but also through a common spatial data infrastructure, TEODOOR (TEreno Online Data repOsitORry). TEODOOR bundles the data provided by the different web services of the single observatories and provides tools for data discovery, visualization and data access. The TERENO Northeast data infrastructure integrates data from more than 200 instruments and makes data available through standard web services. Geographic sensor information and services are described using the ISO 19115 metadata schema. TEODOOR accesses the OGC Sensor Web Enablement (SWE) interfaces offered by the regional observatories. In addition to the SWE interface, TERENO Northeast also publishes data through DataCite.
The necessary ISO 19115-compliant metadata are created in an automated process that extracts information from the SWE SensorML documents. The resulting metadata file is stored in the GFZ Potsdam data infrastructure. The publishing workflow for file-based research datasets at GFZ Potsdam is based on the eSciDoc infrastructure, using PanMetaDocs (PMD) as the graphical user interface. PMD is a collaborative, metadata-based data and information exchange platform [1]. Besides SWE, metadata are also syndicated by PMD through an OAI-PMH interface. In addition, metadata from other observatories, projects or sensors in TERENO can be accessed through the TERENO Northeast data portal. [1] http://meetingorganizer.copernicus.org/EGU2012/EGU2012-7058-2.pdf
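The extract-and-map step from SensorML to ISO 19115 can be illustrated with a toy example. Both documents here are drastically simplified and unnamespaced; real SensorML and ISO 19115 records are namespaced and far richer, and every element name below is a hypothetical placeholder for the actual schema paths.

```python
import xml.etree.ElementTree as ET

# Toy SensorML-like description (illustrative element names only).
sensorml = """<System>
  <identification><name>Soil moisture probe TE-NE-042</name></identification>
  <position><lat>53.1</lat><lon>13.0</lon></position>
</System>"""

def sensorml_to_iso(sml_text):
    """Extract sensor fields and re-emit them as a simplified ISO-19115-like record."""
    src = ET.fromstring(sml_text)
    iso = ET.Element("MD_Metadata")  # ISO 19115 root, drastically simplified
    ident = ET.SubElement(iso, "identificationInfo")
    ET.SubElement(ident, "title").text = src.findtext("identification/name")
    extent = ET.SubElement(iso, "geographicElement")
    ET.SubElement(extent, "latitude").text = src.findtext("position/lat")
    ET.SubElement(extent, "longitude").text = src.findtext("position/lon")
    return ET.tostring(iso, encoding="unicode")

iso_xml = sensorml_to_iso(sensorml)
```

The same pattern (parse, pull named fields, serialize into the target schema) is what an automated publishing pipeline repeats for every sensor description it syndicates.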
Registered File Support for Critical Operations Files at (Space Infrared Telescope Facility) SIRTF
NASA Technical Reports Server (NTRS)
Turek, G.; Handley, Tom; Jacobson, J.; Rector, J.
2001-01-01
The SIRTF Science Center's (SSC) Science Operations System (SOS) must manage nearly one hundred critical operations files via comprehensive file management services. The management is accomplished via the registered file system (otherwise known as TFS), which manages these files in a registered file repository composed of a virtual file system accessible via a TFS server and a file registration database. The TFS server provides controlled, reliable, and secure file transfer and storage by registering all file transactions and metadata in the file registration database. An API is provided for application programs to communicate with TFS servers and the repository. A command-line client implementing this API has been developed as a client tool. This paper describes the architecture and current implementation and, more importantly, the evolution of these services based on evolving community use cases and emerging information system technology.
Web Content Management Systems: An Analysis of Forensic Investigatory Challenges.
Horsman, Graeme
2018-02-26
With an increase in the creation and maintenance of personal websites, web content management systems are now frequently utilized. Such systems offer a low-cost and simple solution for those seeking to develop an online presence, and subsequently, a platform from which reported defamatory content, abuse, and copyright infringement has been witnessed. This article provides an introductory forensic analysis of the three most popular web content management systems currently available: WordPress, Drupal, and Joomla! Test platforms have been created, and their site structures have been examined to provide guidance for forensic practitioners facing investigations of this type. Results document the metadata available for establishing site ownership, user interactions, and stored content following analysis of artifacts including WordPress's wp_users and wp_comments tables, Drupal's "watchdog" records, and Joomla!'s _users and _content tables. Finally, investigatory limitations documenting the difficulties of investigating WCMS usage are noted, and analysis recommendations are offered. © 2018 American Academy of Forensic Sciences.
NASA Astrophysics Data System (ADS)
Wright, D. J.; Lassoued, Y.; Dwyer, N.; Haddad, T.; Bermudez, L. E.; Dunne, D.
2009-12-01
Coastal mapping plays an important role in informing marine spatial planning, resource management, maritime safety, hazard assessment and even national sovereignty. As such, there is now a plethora of data/metadata catalogs, pre-made maps, tabular and text information on resource availability and exploitation, and decision-making tools. A recent trend has been to encapsulate these in a special class of web-enabled geographic information systems called a coastal web atlas (CWA). While multiple benefits are derived from tailor-made atlases, there is great value added from the integration of disparate CWAs. CWAs linked to one another can be queried more successfully to optimize planning and decision-making. If a dataset is missing in one atlas, it may be immediately located in another. Similar datasets in two atlases may be combined to enhance study in either region. *But how best to achieve semantic interoperability to mitigate vague data queries, concepts or natural language semantics when retrieving and integrating data and information?* We report on the development of a new prototype seeking to interoperate between two initial CWAs: the Marine Irish Digital Atlas (MIDA) and the Oregon Coastal Atlas (OCA). These two mature atlases are used as a testbed for more regional connections, with the intent for the OCA to use lessons learned to develop a regional network of CWAs along the west coast, and for MIDA to do the same in building and strengthening atlas networks with the UK, Belgium, and other parts of Europe. Our prototype uses semantic interoperability via services harmonization and ontology mediation, allowing local atlases to use their own data structures and vocabularies (ontologies). We use standard technologies such as OGC Web Map Services (WMS) for delivering maps, and OGC Catalogue Service for the Web (CSW) for delivering and querying ISO-19139 metadata. The metadata records of a given CWA use a given ontology of terms called the local ontology.
Human or machine users formulate their requests using a common ontology of metadata terms, called global ontology. A CSW mediator rewrites the user’s request into CSW requests over local CSWs using their own (local) ontologies, collects the results and sends them back to the user. To extend the system, we have recently added global maritime boundaries and are also considering nearshore ocean observing system data. Ongoing work includes adding WFS, error management, and exception handling, enabling Smart Searches, and writing full documentation. This prototype is a central research project of the new International Coastal Atlas Network (ICAN), a group of 30+ organizations from 14 nations (and growing) dedicated to seeking interoperability approaches to CWAs in support of coastal zone management and the translation of coastal science to coastal decision-making.
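The mediator's ontology rewriting can be sketched as a term-mapping step before the query is fanned out to the local catalogs. The vocabularies and atlas interfaces below are invented for illustration; a real CSW mediator rewrites full CSW requests over ISO-19139 metadata rather than keyword lists.

```python
# Hypothetical global-to-local term mappings for two atlases.
LOCAL_ONTOLOGIES = {
    "MIDA": {"coastline": "shoreline", "bathymetry": "seabed_depth"},
    "OCA": {"coastline": "coastal_boundary", "bathymetry": "bathymetry"},
}

def rewrite_query(global_terms, atlas):
    """Translate global-ontology terms into one atlas's local vocabulary."""
    mapping = LOCAL_ONTOLOGIES[atlas]
    return [mapping.get(term, term) for term in global_terms]

def mediate(global_terms, local_search_fns):
    """Rewrite the request per atlas, dispatch to each local CSW-like
    endpoint, and merge the collected results for the user."""
    results = []
    for atlas, search in local_search_fns.items():
        results.extend(search(rewrite_query(global_terms, atlas)))
    return results
```

The key design point is that each atlas keeps its own vocabulary untouched; only the mediator holds the mappings, so adding a new atlas means adding one mapping rather than changing every catalog.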
Technical Challenges of Enterprise Imaging: HIMSS-SIIM Collaborative White Paper.
Clunie, David A; Dennison, Don K; Cram, Dawn; Persons, Kenneth R; Bronkalla, Mark D; Primo, Henri Rik
2016-10-01
This white paper explores the technical challenges and solutions for acquiring (capturing) and managing enterprise images, particularly those involving visible light applications. The types of acquisition devices used for various general-purpose photography and specialized applications including dermatology, endoscopy, and anatomic pathology are reviewed. The formats and standards used, and the associated metadata requirements and communication protocols for transfer and workflow are considered. Particular emphasis is placed on the importance of metadata capture in both order- and encounter-based workflow. The benefits of using DICOM to provide a standard means of recording and accessing both metadata and image and video data are considered, as is the role of IHE and FHIR.
NASA Astrophysics Data System (ADS)
Longo, S.; Nativi, S.; Leone, C.; Migliorini, S.; Mazari Villanova, L.
2012-04-01
Italian Polar Metadata System. The Italian Antarctic Research Programme (PNRA) is a government initiative funding and coordinating scientific research activities in polar regions. PNRA manages two scientific stations in Antarctica - Concordia (Dome C), jointly operated with the French Polar Institute "Paul Emile Victor", and Mario Zucchelli (Terra Nova Bay, Southern Victoria Land). In addition, the National Research Council of Italy (CNR) manages one scientific station in the Arctic Circle (Ny-Alesund, Svalbard Islands), named Dirigibile Italia. PNRA started in 1985 with the first Italian expedition to Antarctica. Since then each research group has collected data regarding biology and medicine, geodetic observation, geophysics, geology, glaciology, physics and atmospheric chemistry, earth-sun relationships and astrophysics, oceanography and the marine environment, chemical contamination, law and geographic science, technology, and multi- and interdisciplinary research, autonomously and in different formats. In 2010 the Italian Ministry of Research assigned the scientific coordination of the Programme to CNR, which is in charge of the management and sharing of the scientific results carried out in the framework of the PNRA. Therefore, CNR is establishing a new distributed cyber(e)-infrastructure to collect, manage, publish and share polar research results. This is a service-based infrastructure building on Web technologies to implement resource (i.e. data, service and document) discovery, access and visualization; in addition, semantic-enabled functionalities will be provided. The architecture applies "System of Systems" principles to build incrementally on the existing systems by supplementing, but not supplanting, their mandates and governance arrangements. This allows the existing capacities to be kept as autonomous as possible.
This cyber(e)-infrastructure implements multi-disciplinary interoperability following a Brokering approach and supporting the relevant standards recognized by European and international bodies, including GEO/GEOSS, INSPIRE and SCAR. The Brokering approach is empowered by a technology developed by CNR, advanced by the FP7 EuroGEOSS project, and recently adopted by the GEOSS Common Infrastructure (GCI).
Metadata for WIS and WIGOS: GAW Profile of ISO19115 and Draft WIGOS Core Metadata Standard
NASA Astrophysics Data System (ADS)
Klausen, Jörg; Howe, Brian
2014-05-01
The World Meteorological Organization (WMO) Integrated Global Observing System (WIGOS) is a key WMO priority to underpin all WMO Programs and new initiatives such as the Global Framework for Climate Services (GFCS). The development of the WIGOS Operational Information Resource (WIR) is central to the WIGOS Framework Implementation Plan (WIGOS-IP). The WIR shall provide information on WIGOS and its observing components, as well as requirements of WMO application areas. An important aspect is the description of the observational capabilities by way of structured metadata. The Global Atmosphere Watch (GAW) is the WMO program addressing the chemical composition and selected physical properties of the atmosphere. Observational data are collected and archived by GAW World Data Centres (WDCs) and related data centres. The Task Team on GAW WDCs (ET-WDC) has developed a profile of the ISO 19115 metadata standard that is compliant with the WMO Information System (WIS) specification for the WMO Core Metadata Profile v1.3. This profile is intended to harmonize certain aspects of the documentation of observations as well as the interoperability of the WDCs. The Inter-Commission Group on WIGOS (ICG-WIGOS) has established the Task Team on WIGOS Metadata (TT-WMD), with representation from all WMO Technical Commissions and the objective of defining the WIGOS Core Metadata. The result of this effort is a draft semantic standard comprising a set of metadata classes that are considered to be of critical importance for the interpretation of observations relevant to WIGOS. The purpose of the presentation is to acquaint the audience with the standard and to solicit informal feedback from experts in the various disciplines of meteorology and climatology. This feedback will help ET-WDC and TT-WMD refine the GAW metadata profile and the draft WIGOS metadata standard, thereby increasing their utility and acceptance.
SIPSMetGen: It's Not Just For Aircraft Data and ECS Anymore.
NASA Astrophysics Data System (ADS)
Schwab, M.
2015-12-01
The SIPSMetGen utility, developed for the NASA EOSDIS project under the EED contract, simplifies the creation of file-level metadata for the ECS System. The utility has been enhanced for ease of use, efficiency, speed, and increased flexibility. SIPSMetGen was originally created as a means of generating file-level spatial metadata for Operation IceBridge. The first version created only ODL metadata, specific to ingest into ECS. The core strength of the utility was, and continues to be, its ability to take complex shapes and patterns of data-collection point clouds from aircraft flights and simplify them to a relatively simple concave-hull geo-polygon. It has proven a useful and easy-to-use tool for creating file-level metadata for many other missions, both aircraft and satellite. While the original version was useful, it had its limitations. In 2014, Raytheon was tasked with enhancing SIPSMetGen, resulting in a new version that can create ISO-compliant XML metadata; optimizes and streamlines the algorithm for creating the spatial metadata; runs more quickly with more consistent results; and can be configured to run multi-threaded on systems with multiple processors. The utility comes with a Java-based graphical user interface to aid in configuring and running it. The enhanced SIPSMetGen allows more diverse data sets to be archived with file-level metadata. The advantage of archiving data with file-level metadata is that it makes it easier for data users and scientists to find relevant data. File-level metadata unlocks the power of existing archives and metadata repositories such as ECS and CMR, and of search and discovery utilities like Reverb and Earthdata Search. Missions now using SIPSMetGen include Aquarius, MEaSUREs, ARISE, and Nimbus.
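The hull-reduction step described above can be illustrated with a simplified, hypothetical sketch. SIPSMetGen derives a concave hull; the monotone-chain convex hull below is a stand-in for that computation and is not the utility's actual algorithm, but it shows how a dense flight-track point cloud collapses to a compact geo-polygon:

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order.
    (A simplified stand-in: SIPSMetGen derives a *concave* hull.)"""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a-o) x (b-o); <= 0 means no left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# A dense synthetic "flight track" reduces to a handful of polygon vertices:
track = [(x * 0.1, (x % 7) * 0.1) for x in range(100)]
hull = convex_hull(track)
```

A few hull vertices then describe the spatial coverage of thousands of collection points, which is what makes the resulting file-level metadata cheap to store and search.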
Integration experiences and performance studies of a COTS parallel archive system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Hsing-bung; Scott, Cody; Grider, Gary
2010-01-01
Current and future archive storage systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting the changing needs of very large data sets, (e) support standard interfaces, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing, but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, adopting features such as more caching and less robust semantics, especially for metadata searching speed. Currently, the number of extremely scalable parallel archive solutions is very small, especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products, including (a) parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high-volume (non-single-parallel-file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux, such as copy, move, ls, tar, etc.
We have successfully applied our working COTS parallel archive system to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address requirements of future archival storage systems.
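The headline capability here, moving a single large striped file to many tapes in parallel, can be sketched under stated assumptions: threads and in-memory buffers stand in for tape drives, and the round-robin stripe layout is illustrative rather than the authors' implementation.

```python
import concurrent.futures

def archive_parallel(data: bytes, n_drives: int = 4, stripe: int = 1 << 16):
    """Split one large 'parallel file' into stripes and copy them to
    n_drives destinations concurrently (buffers stand in for tapes)."""
    stripes = [data[i:i + stripe] for i in range(0, len(data), stripe)]

    def copy_to_drive(d):
        # Each worker drains its own round-robin share of the stripes,
        # so no two workers ever touch the same destination buffer.
        buf = bytearray()
        for idx in range(d, len(stripes), n_drives):
            buf.extend(stripes[idx])
        return bytes(buf)

    with concurrent.futures.ThreadPoolExecutor(max_workers=n_drives) as pool:
        return list(pool.map(copy_to_drive, range(n_drives)))

drives = archive_parallel(b"x" * 1_000_000)
# No byte is lost or duplicated across the four "tapes".
assert sum(len(d) for d in drives) == 1_000_000
```

The point of the sketch is the partitioning discipline: because each worker owns a disjoint set of stripes, the copies can run fully in parallel with no locking, which is the property a striped-file-to-many-tapes mover needs.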
Inheritance rules for Hierarchical Metadata Based on ISO 19115
NASA Astrophysics Data System (ADS)
Zabala, A.; Masó, J.; Pons, X.
2012-04-01
ISO 19115 has mainly been used to describe metadata for datasets and services. Furthermore, the ISO 19115 standard (as well as the new draft ISO 19115-1) includes a conceptual model that allows metadata to be described at different levels of granularity, structured in hierarchical levels: in aggregated resources such as series and datasets, and also in more disaggregated resources such as types of entities (feature type), types of attributes (attribute type), entities (feature instances), and attributes (attribute instances). In theory it is possible to apply a complete metadata structure to all hierarchical levels, from the whole series down to an individual feature attribute, but storing all metadata at all levels is completely impractical. An inheritance mechanism is needed to store each metadata and quality element at the optimum hierarchical level and to allow easy and efficient documentation of metadata, both in an Earth observation scenario such as multi-satellite, multiband imagery, and in a complex vector topographic map that includes several feature types separated into layers (e.g., administrative boundaries, contour lines, building polygons, road lines, etc.). Moreover, because maps are traditionally split into tiles for handling at detailed scales, or because of satellite characteristics, each of the previous thematic layers (e.g., 1:5000 roads for a country) or bands (a Landsat-5 TM cover of the Earth) is split into several parts (sheets or scenes, respectively). According to the hierarchy in ISO 19115, the definition of general metadata can be supplemented by spatially specific metadata that, when required, either inherits from or overrides the general case (G.1.3). Annex H of the standard states that only metadata exceptions are defined at lower levels, so it is not necessary to generate the full registry of metadata for each level, but only to link particular values to the general values that they inherit.
Conceptually, the metadata registry is complete for each hierarchical level, but at the implementation level most metadata elements are not stored at both levels, only at the more generic one. This communication defines a metadata system that covers four levels, describes which metadata elements have to support series-layer inheritance and in which way, and explains how hierarchical levels are defined and stored. Metadata elements are classified according to the type of inheritance between products, series, tiles, and datasets. It explains the metadata element classification and exemplifies it using core metadata elements. The communication also presents a metadata viewer and editing tool that uses the described model to propagate metadata elements and to show the user a complete set of metadata for each level in a transparent way. This tool is integrated in the MiraMon GIS software.
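The exception-only storage described above (Annex H) amounts to a fallback lookup: a level stores only what it overrides and inherits everything else from a more generic level. A minimal sketch, with class, level, and element names that are illustrative rather than part of the standard:

```python
class MetadataNode:
    """One hierarchical metadata level (e.g. series, dataset, tile).
    Stores only its own exceptions; everything else is inherited."""

    def __init__(self, own, parent=None):
        self.own = own        # metadata elements stored at this level only
        self.parent = parent  # the more generic level, or None at the top

    def get(self, element):
        # Return the element at this level, or inherit it from a parent.
        if element in self.own:
            return self.own[element]
        if self.parent is not None:
            return self.parent.get(element)
        raise KeyError(element)

# The series holds the shared elements; a scene records only its exceptions.
series = MetadataNode({"spatial_ref": "EPSG:4326", "lineage": "Landsat-5 TM"})
scene = MetadataNode({"cloud_cover": "12%"}, parent=series)
```

A lookup on `scene` then yields a conceptually complete metadata record without the scene ever storing the inherited values, which is exactly the saving the inheritance mechanism is meant to provide.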
NASA Astrophysics Data System (ADS)
Jeffery, Keith; Bailo, Daniele
2014-05-01
The European Plate Observing System (EPOS) is integrating geoscientific information concerning earth movements in Europe. We are approaching the end of the PP (Preparatory Project) phase and, in October 2014, expect to continue with the full project within ESFRI (European Strategic Framework for Research Infrastructures). The key aspects of EPOS concern providing services that allow homogeneous access by end users over heterogeneous data, software, facilities, equipment, and services. The e-infrastructure of EPOS is the heart of the project, since it integrates the work on organisational, legal, economic, and scientific aspects. Following the creation of an inventory of relevant organisations, persons, facilities, equipment, services, datasets, and software (RIDE), the scale of integration required became apparent. The EPOS e-infrastructure architecture has been developed systematically, based on recorded primary (user) requirements and secondary (interoperation with other systems) requirements, through Strawman, Woodman, and Ironman phases, with the specification - and the confirmatory prototypes developed alongside it - becoming more precise and progressively moving from paper to implemented system. The EPOS architecture is based on global core services (Integrated Core Services - ICS) which access thematic nodes (domain-specific Europe-wide collections, called Thematic Core Services - TCS), national nodes, and specific institutional nodes. The key aspect is the metadata catalogue.
In one dimension this is described at three levels: (1) discovery metadata, using well-known and commonly used standards such as DC (Dublin Core), to enable users (via an intelligent user interface) to search for objects within the EPOS environment relevant to their needs; (2) contextual metadata, providing the context of the object described in the catalogue to enable a user or the system to determine the relevance of the discovered object(s) to their requirement - the context includes projects, funding, organisations involved, persons involved, related publications, facilities, equipment, and others, and utilises the CERIF (Common European Research Information Format) standard (see www.eurocris.org); (3) detailed metadata, which is specific to a domain or a particular object and includes the schema describing the object to processing software. The other dimension of the metadata concerns the objects described. These are classified into users, services (including software), data, and resources (computing, data storage, instruments, and scientific equipment). An alternative architecture has been considered: brokering. This technique has been used especially in North American geoscience projects to interoperate datasets. It involves writing software to interconvert between any two node datasets; given n nodes, this implies writing n*(n-1) convertors. EPOS Working Group 7 (e-infrastructures and virtual community), which deals with the design and implementation of a prototype of the EPOS services, chose an approach that endows the system with extreme flexibility and sustainability: the Metadata Catalogue approach. With the catalogue, the EPOS system can: 1. interoperate with software, services, users, organisations, facilities, equipment, etc., as well as datasets; 2. avoid writing n*(n-1) software convertors and instead, as far as possible through the information contained in the catalogue, write only n convertors.
This is a huge saving, especially in maintenance as the datasets (or other node resources) evolve; we are working on (semi-)automation of convertor generation by metadata mapping, which is leading-edge computer science research; and 3. make extensive use of contextual metadata, which enables a user or a machine to: (i) improve discovery of resources at nodes; (ii) improve precision and recall in search; (iii) drive the systems for identification, authentication, authorisation, security, and privacy, recording the relevant attributes of the node resources and of the user; and (iv) manage provenance and long-term digital preservation. The linkage between the Integrated Core Services, which provide the integration of data and services, and the diverse Thematic Core Services nodes is provided by means of a compatibility layer, which includes the aforementioned metadata catalogue. This layer provides 'connectors' to make local data, software, and services available through the EPOS Integrated Core Services layer. In conclusion, we believe the EPOS e-infrastructure architecture is fit for purpose, including long-term sustainability and pan-European access to data and services.
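The converter-count argument above is easy to make concrete. The node count used below is an assumed figure for illustration, not an EPOS statistic:

```python
# Pairwise brokering between n heterogeneous nodes needs a directed
# converter for every ordered pair: n*(n-1). Converting through one shared
# canonical form (the metadata catalogue) needs only n, one per node.

def pairwise_converters(n: int) -> int:
    return n * (n - 1)

def catalogue_converters(n: int) -> int:
    return n

for n in (5, 10, 20):
    print(n, pairwise_converters(n), catalogue_converters(n))
# With an assumed 20 nodes: 380 pairwise converters versus 20 via the catalogue.
```

The gap widens quadratically, which is why the saving matters most for maintenance as node formats evolve: a format change touches one converter in the catalogue model but up to 2(n-1) in the pairwise model.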
Karst database development in Minnesota: Design and data assembly
Gao, Y.; Alexander, E.C.; Tipping, R.G.
2005-01-01
The Karst Feature Database (KFD) of Minnesota is a relational GIS-based database management system (DBMS). Previous karst feature datasets used inconsistent attributes to describe karst features in different areas of Minnesota. Existing metadata were modified and standardized to provide comprehensive metadata for all the karst features in Minnesota. Microsoft Access 2000 and ArcView 3.2 were used to develop this working database. Existing county and sub-county karst feature datasets have been assembled into the KFD, which is capable of visualizing and analyzing the entire data set. By November 17, 2002, 11,682 karst features were stored in the KFD of Minnesota. Data tables are stored in a Microsoft Access 2000 DBMS and linked to corresponding ArcView applications. The current KFD of Minnesota has been moved from a Windows NT server to a Windows 2000 Citrix server accessible to researchers and planners through networked interfaces. © Springer-Verlag 2005.
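The standardized-attribute design can be sketched with an in-memory SQLite table. All field names and records below are illustrative, not the KFD's actual schema:

```python
import sqlite3

# One table with consistent attributes lets karst features from different
# counties be queried together, which the pre-KFD datasets could not do.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE karst_feature (
    feature_id   INTEGER PRIMARY KEY,
    feature_type TEXT NOT NULL,   -- e.g. sinkhole, spring, stream sink
    county       TEXT NOT NULL,
    utm_easting  REAL,
    utm_northing REAL,
    depth_m      REAL)""")
conn.executemany(
    "INSERT INTO karst_feature VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "sinkhole", "Winona",   600123.0, 4870456.0, 3.2),
     (2, "spring",   "Fillmore", 589900.0, 4843210.0, None)])

# A cross-county summary over the whole assembled data set:
counts = dict(conn.execute(
    "SELECT county, COUNT(*) FROM karst_feature GROUP BY county"))
```

The same single-table discipline is what allows the Access/ArcView pairing described above: attribute tables drive the DBMS queries while the coordinates drive the GIS display.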
A Reference Architecture for Space Information Management
NASA Technical Reports Server (NTRS)
Mattmann, Chris A.; Crichton, Daniel J.; Hughes, J. Steven; Ramirez, Paul M.; Berrios, Daniel C.
2006-01-01
We describe a reference architecture for space information management systems that elegantly overcomes the rigid design of common information systems in many domains. The reference architecture consists of a set of flexible, reusable, independent models and software components that function in unison but remain separately managed entities. The main guiding principle of the reference architecture is to separate the various models of information (e.g., data, metadata, etc.) from implemented system code, allowing each to evolve independently. System modularity, systems interoperability, and dynamic evolution of information system components are the primary benefits of the design. The architecture requires the use of information models that are substantially more advanced than those used by the vast majority of information systems. These models are more expressive and can be more easily modularized, distributed, and maintained than simpler models, e.g., configuration files and data dictionaries. Our current work focuses on formalizing the architecture within a CCSDS Green Book and evaluating it within the context of the C3I initiative.
Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data.
Bhandary, Priyanka; Seetharam, Arun S; Arendsee, Zebulun W; Hur, Manhoi; Wurtele, Eve Syrkin
2018-02-01
More than 15 petabases of raw RNA-seq data are now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data reuse provides a tremendous opportunity to extract new knowledge from existing experiments, and enables robust, multi-'omics analyses that merge metadata (information about experimental design, biological samples, and protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases are often confusing; a test case with Zea mays mRNA-seq data reveals a high proportion of missing, misleading, or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality through greater use of controlled vocabularies and through metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system. Copyright © 2017 Elsevier B.V. All rights reserved.
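The metadata review the authors recommend can be sketched as a simple validator. The required fields and the controlled vocabulary below are invented for illustration and are not the authors' actual checklist:

```python
# Flag records whose fields are missing or fall outside a controlled
# vocabulary: the two failure modes the Zea mays test case highlights.
CONTROLLED_TISSUE = {"leaf", "root", "seed", "silk", "tassel"}  # illustrative
REQUIRED = ("run_id", "organism", "tissue")                     # illustrative

def review(record):
    problems = []
    for field in REQUIRED:
        if not record.get(field):
            problems.append(f"missing {field}")
    tissue = record.get("tissue")
    if tissue and tissue.lower() not in CONTROLLED_TISSUE:
        problems.append(f"uncontrolled tissue term: {tissue!r}")
    return problems

# A complete, controlled record passes cleanly:
assert review({"run_id": "SRR001", "organism": "Zea mays",
               "tissue": "leaf"}) == []
```

Run at submission time, a check like this converts the "metadata morass" into an actionable error list before the record ever reaches the repository.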
The Climate-G testbed: towards a large scale data sharing environment for climate change
NASA Astrophysics Data System (ADS)
Aloisio, G.; Fiore, S.; Denvil, S.; Petitdidier, M.; Fox, P.; Schwichtenberg, H.; Blower, J.; Barbera, R.
2009-04-01
The Climate-G testbed provides an experimental large-scale data environment for climate change, addressing challenging data and metadata management issues. The main scope of Climate-G is to allow scientists to carry out geographical and cross-institutional climate data discovery, access, visualization, and sharing. Climate-G is a multidisciplinary collaboration involving both climate and computer scientists, and it currently involves several partners: Centro Euro-Mediterraneo per i Cambiamenti Climatici (CMCC), Institut Pierre-Simon Laplace (IPSL), Fraunhofer Institut für Algorithmen und Wissenschaftliches Rechnen (SCAI), National Center for Atmospheric Research (NCAR), University of Reading, University of Catania, and University of Salento. To perform distributed metadata search and discovery, we adopted a CMCC metadata solution (which provides a high level of scalability, transparency, fault tolerance, and autonomy) leveraging both P2P and grid technologies (the GRelC Data Access and Integration Service). Moreover, data are available through OPeNDAP/THREDDS services, the Live Access Server, and the OGC-compliant Web Map Service, and they can be downloaded, visualized, and accessed within the proposed environment through the Climate-G Data Distribution Centre (DDC), the web gateway to the Climate-G digital library. The DDC is a data-grid portal allowing users to easily, securely, and transparently perform search/discovery, metadata management, data access, data visualization, etc. Godiva2 (integrated into the DDC) displays 2D maps (and animations) and also exports maps for display on the Google Earth virtual globe. Presently, Climate-G publishes (through the DDC) about 2 TB of data related to the ENSEMBLES project (including distributed replicas of data) as well as to the IPCC AR4.
The main results of the proposed work are: a wide data access/sharing environment for climate change; a P2P/grid metadata approach; a production-level Climate-G DDC; high-quality tools for data visualization; metadata search/discovery across several countries and institutions; and an open environment for climate change data sharing.
NASA Astrophysics Data System (ADS)
Lanckman, Jean-Pierre; Elger, Kirsten; Karlsson, Ævar Karl; Johannsson, Halldór; Lantuit, Hugues
2013-04-01
Permafrost is a direct indicator of climate change and has been identified as an Essential Climate Variable (ECV) by the global observing community. The monitoring of permafrost temperatures, active-layer thicknesses, and other parameters has been performed for several decades already, but it was brought together within the Global Terrestrial Network for Permafrost (GTN-P) only in the 1990s, including the development of measurement protocols to provide standardized data. GTN-P is the primary international observing network for permafrost, sponsored by the Global Climate Observing System (GCOS) and the Global Terrestrial Observing System (GTOS) and managed by the International Permafrost Association (IPA). All GTN-P data are covered by an open data policy, with free data access via the World Wide Web. The existing data, however, are far from homogeneous: they are not yet optimized for databases, there is no framework for data reporting or archival, and data documentation is incomplete. As a result, and despite the utmost relevance of permafrost in the Earth's climate system, the data have not been used by as many researchers as intended by the initiators of the programs. While the monitoring of many other ECVs has been tackled by organized international networks (e.g., FLUXNET), there is still no central database for all permafrost-related parameters. The European Union project PAGE21 created opportunities to develop this central database for GTN-P permafrost monitoring parameters during the project and beyond. The database aims to be the one location where a researcher can find data, metadata, and information on all relevant parameters for a specific site. Each component of the Data Management System (DMS), including parameters, data levels, and metadata formats, was developed in cooperation with the GTN-P and the IPA.
The general framework of the GTN-P DMS is based on an object-oriented model (OOM), open to as many parameters as possible, and implemented in a spatial database. To ensure interoperability and enable potential inter-database search, field names follow international metadata standards and are based on a controlled vocabulary registry. Tools are being developed to provide data processing, analysis capability, and quality control. Our system aims to be a reference model, improvable and reusable. It allows a maximum of top-down and bottom-up data flow, giving scientists one globally searchable data and metadata repository, the public full access to scientific data, and the policy maker a powerful cartographic and statistical tool. To engage the international community in GTN-P, it was essential to develop an online interface for data upload, with the aim that it be easy to use and allow data input with a minimum of technical and personal effort. In addition, substantial effort will be required to query, visualize, and retrieve information across many platforms and types of measurements. Ultimately, it is not the layers in themselves that matter, but rather the relationships that these information layers maintain with each other.
The Arctic Observing Network (AON)Cooperative Arctic Data and Information Service (CADIS)
NASA Astrophysics Data System (ADS)
Moore, J.; Fetterer, F.; Middleton, D.; Ramamurthy, M.; Barry, R.
2007-12-01
The Arctic Observing Network (AON) is intended to be a federation of 34 land, atmosphere, and ocean observation sites, some already operating and some newly funded by the U.S. National Science Foundation. This International Polar Year (IPY) initiative will acquire a major portion of the data coming from the interagency Study of Environmental Arctic Change (SEARCH). AON will succeed in supporting the science envisioned by its planners only if it functions as a system and not as a collection of independent observation programs. Development and implementation of a comprehensive data management strategy will be key to the success of this effort. AON planners envision an ideal data management system that includes a portal through which scientists can submit metadata and datasets at a single location; in which they can search the complete archive and find all data relevant to a location or process; in which all data have browse imagery and complete documentation; in which time series or fields can be plotted online; and in which all data are in a relational database so that multiple data sets and sources can be queried and retrieved. The Cooperative Arctic Data and Information Service (CADIS) will provide near-real-time data delivery, a long-term repository for data, a portal for data discovery, and tools to manipulate data, building on existing tools like the Unidata Integrated Data Viewer (IDV). Our approach to the data integration challenge is to start by asking investigators to provide metadata via a general-purpose user interface. An entry tool assists PIs in writing metadata and submitting data. Data can be submitted to the archive in NetCDF with Climate and Forecast conventions or, where possible, in one of several other standard formats. CADIS is a joint effort of the University Corporation for Atmospheric Research (UCAR), the National Snow and Ice Data Center (NSIDC), and the National Center for Atmospheric Research (NCAR).
In the first year, we are concentrating on establishing metadata protocols that are compatible with international standards, and on demonstrating data submission, search and visualization tools with a subset of AON data. These capabilities will be expanded in years 2 and 3. By working with AON investigators and by using evolving conventions for in situ data formats as they mature, we hope to bring CADIS to the full level of data integration imagined by AON planners. The CADIS development will be described in terms of challenges, implementation strategies and progress to date. The developers are making a conscious effort to integrate this system and its data holdings with the complementary efforts in the SEARCH and IPY programs. The interdisciplinary content of the data, the variations in format and documentation, as well as its geographic coverage across the Arctic Basin all impact the form and effectiveness of the CADIS system architecture. The clever solutions to the complexity of implementing a comprehensive data management strategy implied in this diversity will be a focus of the presentation.
JAMSTEC DARWIN Database Assimilates GANSEKI and COEDO
NASA Astrophysics Data System (ADS)
Tomiyama, T.; Toyoda, Y.; Horikawa, H.; Sasaki, T.; Fukuda, K.; Hase, H.; Saito, H.
2017-12-01
Introduction: The Japan Agency for Marine-Earth Science and Technology (JAMSTEC) archives data and samples obtained by JAMSTEC research vessels and submersibles. As a common asset of human society, the JAMSTEC archive is open to public users with scientific or educational purposes [1]. To publicize its data and samples online, JAMSTEC operates the NUUNKUI data sites [2], a group of several databases for various data and sample types. For years, data and metadata on JAMSTEC rock samples, sediment core samples, and cruise/dive observations were publicized through databases named GANSEKI, COEDO, and DARWIN, respectively. However, because they had different user interfaces and data structures, these services were somewhat confusing for unfamiliar users. The maintenance costs of multiple hardware and software systems were also problematic for sustaining services and continuous improvement. Database Integration: In 2017, GANSEKI, COEDO, and DARWIN were integrated into DARWIN+ [3]. The update also included the implementation of a map-search function as a substitute for the closed portal site. Major functions of the previous systems were incorporated into the new system; users can perform complex searches by thumbnail browsing, map area, keyword filtering, and metadata constraints. As for data handling, the new system is more flexible, allowing the entry of a variety of additional data types. Data Management: After the DARWIN major update, the JAMSTEC data & sample team has been dealing with minor issues in individual sample data/metadata, which sometimes need manual modification to be transferred to the new system. Some new data sets, such as onboard sample photos and surface close-up photos of rock samples, are becoming available online. Geochemical data for sediment core samples are expected to be added in the near future. References: [1] http://www.jamstec.go.jp/e/database/data_policy.html [2] http://www.godac.jamstec.go.jp/jmedia/portal/e/ [3] http://www.godac.jamstec.go.jp/darwin/e/
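The complex search described for DARWIN+ can be sketched as composable filters over sample records. The records, field names, and bounding-box convention below are assumptions for illustration, not the actual DARWIN schema:

```python
# Toy sample records standing in for rock/core metadata entries.
samples = [
    {"id": "R001", "kind": "rock", "lat": 12.5, "lon": 143.1,
     "keywords": ["basalt"]},
    {"id": "C001", "kind": "core", "lat": 35.0, "lon": 139.7,
     "keywords": ["sediment"]},
]

def search(samples, bbox=None, keyword=None, **constraints):
    """Combine a map-area filter (bbox = min_lat, min_lon, max_lat,
    max_lon), keyword filtering, and arbitrary metadata constraints."""
    def ok(s):
        if bbox and not (bbox[0] <= s["lat"] <= bbox[2]
                         and bbox[1] <= s["lon"] <= bbox[3]):
            return False
        if keyword and keyword not in s["keywords"]:
            return False
        return all(s.get(k) == v for k, v in constraints.items())
    return [s["id"] for s in samples if ok(s)]
```

Because each criterion is optional and they combine with logical AND, one query path serves map-area search, keyword filtering, and metadata constraints alike, which is the unification DARWIN+ delivered over the three predecessor databases.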
Design and Application of an Ontology for Component-Based Modeling of Water Systems
NASA Astrophysics Data System (ADS)
Elag, M.; Goodall, J. L.
2012-12-01
Many Earth system modeling frameworks have adopted an approach of componentizing models so that a large model can be assembled by linking a set of smaller model components. These model components can then be more easily reused, extended, and maintained by a large group of model developers and end users. While there has been a notable increase in component-based model frameworks in the Earth sciences in recent years, there has been less work on creating framework-agnostic metadata and ontologies for model components. Well-defined model component metadata is needed, however, to facilitate sharing, reuse, and interoperability both within and across Earth system modeling frameworks. To address this need, we have designed an ontology for the water resources community, named the Water Resources Component (WRC) ontology, to advance the application of component-based modeling frameworks across water-related disciplines. Here we present the design of the WRC ontology and demonstrate its application for the integration of model components used in watershed management. First we show how the watershed modeling system Soil and Water Assessment Tool (SWAT) can be decomposed into a set of hydrological and ecological components that adopt the Open Modeling Interface (OpenMI) standard. Then we show how the components can be used to estimate nitrogen losses from land to surface water for the Baltimore Ecosystem Study area. The results of this work are (i) a demonstration of how the WRC ontology advances conceptual integration between components of water-related disciplines by handling the semantic and syntactic heterogeneity present when describing components from different disciplines, and (ii) an investigation of a methodology by which large models can be decomposed into a set of model components that can be well described by populating metadata according to the WRC ontology.
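Component linking in the OpenMI spirit can be sketched minimally: each component exposes values that a downstream component pulls at run time. The classes and toy compute functions below are illustrative and are not the OpenMI API:

```python
class Component:
    """A model component that pulls its inputs from linked providers."""

    def __init__(self, name, compute):
        self.name = name
        self.compute = compute   # compute(t, upstream_values) -> value
        self.inputs = {}

    def link(self, key, provider):
        # Wire an upstream component as the source for one input quantity.
        self.inputs[key] = provider

    def get_values(self, t):
        # Pull-driven coupling: evaluate upstream components on demand.
        upstream = {k: c.get_values(t) for k, c in self.inputs.items()}
        return self.compute(t, upstream)

# A hydrology component supplies runoff; an ecology component turns it
# into nitrogen export (both formulas are toys, not SWAT equations).
hydro = Component("runoff", lambda t, up: 2.0 * t)
nitro = Component("n_export", lambda t, up: 0.1 * up["runoff"])
nitro.link("runoff", hydro)
```

The ontology's role in such a framework is to describe what each component's inputs and outputs mean, so that a link like `nitro.link("runoff", hydro)` can be validated semantically, not just syntactically.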
NASA Astrophysics Data System (ADS)
Schweitzer, R. H.
2001-05-01
The Climate Diagnostics Center maintains a collection of gridded climate data, primarily for use by local researchers. Because these data are available on fast digital storage and have been converted to netCDF using a standard metadata convention (called COARDS), we recognize that this data collection is also useful to the community at large. At CDC we try to use technology and metadata standards to reduce the costs associated with making these data available to the public. The World Wide Web has been an excellent technology platform for meeting that goal. Specifically, we have developed web-based user interfaces that allow users to search, plot, and download subsets from the data collection. We have also been exploring use of the Pacific Marine Environmental Laboratory's Live Access Server (LAS) as an engine for this task. This would result in further savings by allowing us to concentrate on customizing the LAS where needed, rather than developing and maintaining our own system. One such customization currently under development is the use of Java Servlets and JavaServer Pages in conjunction with a metadata database to produce a hierarchical user interface to LAS. In addition to these web-based user interfaces, all of our data are available via the Distributed Oceanographic Data System (DODS). This allows other sites using LAS, and individuals using DODS-enabled clients, to use our data as if they were local files. All of these technology systems are driven by metadata. When we began to create netCDF files, we collaborated with several other agencies to develop a netCDF convention (COARDS) for metadata. At CDC we have extended that convention to incorporate additional metadata elements to make the netCDF files as self-describing as possible. Part of the local metadata is a set of controlled names for the variable, the level in the atmosphere or ocean, the statistic, and the data set for each netCDF file.
To allow searching and easy reorganization of these metadata, we loaded the metadata from the netCDF files into a MySQL database. The combination of the MySQL database and the controlled names makes it possible to automate the construction of user interfaces and standard-format metadata descriptions, such as Federal Geographic Data Committee (FGDC) and Directory Interchange Format (DIF). These standard descriptions also include an association between our controlled names and standard keywords such as those developed by the Global Change Master Directory (GCMD). This talk will give an overview of each of these technology and metadata standards as it applies to work at the Climate Diagnostics Center. The talk will also discuss the pros and cons of each approach and areas for future development.
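The controlled-names-in-a-database pattern described above can be sketched briefly. The file paths and controlled values below are invented, and SQLite stands in for MySQL so the example is self-contained; the idea is that one `SELECT DISTINCT` per facet is enough to auto-generate each level of a hierarchical user interface.

```python
import sqlite3

# Hypothetical controlled-name metadata extracted from netCDF global
# attributes (the abstract's variable / level / statistic / data set names).
FILES = [
    {"path": "air.mon.mean.nc", "variable": "air temperature",
     "level": "surface", "statistic": "monthly mean", "dataset": "reanalysis"},
    {"path": "uwnd.day.nc", "variable": "zonal wind",
     "level": "500 hPa", "statistic": "daily mean", "dataset": "reanalysis"},
]

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute("""CREATE TABLE netcdf_meta
                (path TEXT, variable TEXT, level TEXT, statistic TEXT, dataset TEXT)""")
conn.executemany(
    "INSERT INTO netcdf_meta VALUES (:path, :variable, :level, :statistic, :dataset)",
    FILES)

def facet_values(column):
    """Distinct controlled names for one facet -- the options to present
    at the corresponding menu level of the generated interface."""
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM netcdf_meta ORDER BY {column}")
    return [r[0] for r in rows]

print(facet_values("variable"))
```

Because the names are controlled rather than free text, the same query layer can also drive the automated export of FGDC- or DIF-style descriptions.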
NASA Astrophysics Data System (ADS)
Seamon, E.; Gessler, P. E.; Flathers, E.; Sheneman, L.; Gollberg, G.
2013-12-01
The Regional Approaches to Climate Change for Pacific Northwest Agriculture (REACCH PNA) is a five-year USDA/NIFA-funded coordinated agriculture project to examine the sustainability of cereal crop production systems in the Pacific Northwest in relation to ongoing climate change. As part of this effort, an extensive data management system has been developed to enable researchers, students, and the public to upload, manage, and analyze various data. The REACCH PNA data management team has developed three core systems to encompass cyberinfrastructure and data management needs: 1) the reacchpna.org portal (https://www.reacchpna.org) is the entry point for all public and secure information, with secure access by REACCH PNA members for data analysis, uploading, and informational review; 2) the REACCH PNA Data Repository is a replicated, redundant database server environment that allows for file and database storage and access to all core data; and 3) the REACCH PNA Libraries are functional groupings of data for REACCH PNA members and the public, based on their access level. These libraries are accessible through our https://www.reacchpna.org portal. The developed system is structured in a virtual server environment (data, applications, web) that includes a geospatial database/geospatial web server for web mapping services (ArcGIS Server), use of ESRI's Geoportal Server for data discovery and metadata management (under the ISO 19115-2 standard), Thematic Realtime Environmental Distributed Data Services (THREDDS) for data cataloging, and Interactive Python notebook server (IPython) technology for data analysis. REACCH systems are housed and maintained by the Northwest Knowledge Network project (www.northwestknowledge.net), which provides data management services to support research.
Initial project data harvesting and meta-tagging efforts have resulted in the interrogation and loading of over 10 terabytes of climate model output, regional entomological data, agricultural and atmospheric information, as well as imagery, publications, videos, and other soft content. In addition, the outlined data management approach has focused on the integration and interconnection of hard data (raw data output) with associated publications, presentations, or other narrative documentation - through metadata lineage associations. This harvest-and-consume data management methodology could additionally be applied to other research team environments that involve large and divergent data.
The French initiative for scientific cores virtual curating : a user-oriented integrated approach
NASA Astrophysics Data System (ADS)
Pignol, Cécile; Godinho, Elodie; Galabertier, Bruno; Caillo, Arnaud; Bernardet, Karim; Augustin, Laurent; Crouzet, Christian; Billy, Isabelle; Teste, Gregory; Moreno, Eva; Tosello, Vanessa; Crosta, Xavier; Chappellaz, Jérome; Calzas, Michel; Rousseau, Denis-Didier; Arnaud, Fabien
2016-04-01
Managing scientific data is probably one of the most challenging issues in modern science. The question is made even more sensitive by the need to preserve and manage high-value, fragile geological samples: cores. Large international scientific programs, such as IODP or ICDP, are leading an intense effort to solve this problem and propose detailed, high-standard workflows and dataflows for core handling and curating. However, most results derive from rather small-scale research programs in which data and sample management is generally handled only locally, when it is handled at all. The national excellence equipment program (Equipex) CLIMCOR aims at developing French facilities for coring and drilling investigations. It concerns ice, marine, and continental samples alike. As part of this initiative, we began a reflection on core curating and associated coring-data management. The aim of the project is to preserve all metadata from fieldwork in an integrated cyber-environment, which will evolve toward laboratory-acquired data storage in the near future. To that end, we worked in close collaboration with field operators as well as laboratory core curators in order to propose user-oriented solutions. The national core curating initiative currently offers a single web portal in which all scientific teams can store their field data. For legacy samples, this requires the establishment of dedicated core lists with associated metadata. For forthcoming samples, we propose a mobile application, running on Android, to capture technical and scientific metadata in the field. This application is linked to a unique library of coring tools and is adapted to most coring devices (gravity, drilling, percussion, etc.), including multi-section and multi-hole coring operations.
Those field data can be uploaded automatically to the national portal, referenced through international standards and persistent identifiers (IGSN, ORCID, and INSPIRE), and displayed in international portals (currently, NOAA's IMLGS). In this paper, we present the architecture of the integrated system, future perspectives, and the approach we adopted to reach our goals. Alongside our poster, we will also demonstrate one of the three mobile applications, dedicated in particular to continental drilling operations.
Ensuring Credibility of NASA's Earth Science Data (Invited)
NASA Astrophysics Data System (ADS)
Maiden, M. E.; Ramapriyan, H. K.; Mitchell, A. E.; Berrick, S. W.; Walter, J.; Murphy, K. J.
2013-12-01
The summary description of the Fall 2013 AGU session on 'Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-Source Science' identifies four attributes needed to ensure credibility in Earth science data records: transparency, completeness, permanence, and ease of access and use. NASA's Earth Science Data Systems Program has been working to improve its practice of all four over many years. With regard to transparency and openness, NASA has been at the forefront of free and open sharing of data and associated information for Earth observations. US data policy requires such openness but allows recouping the marginal cost of distributing government data and information; making the data available at no charge, however, greatly increases their usage in scientific studies, and the resulting analyses hasten our collective understanding of the Earth system. NASA's currently available Earth observations comprise primarily those obtained from satellite-borne instruments, suborbital campaigns, and field investigations. These data are complex and must be accompanied by rich metadata and documentation to be understandable. To enable completeness, NASA applies standards for data format, metadata content, and required documentation to any data ingested into its distributed Earth Observing System Data and Information System (EOSDIS). NASA is moving to a new metadata paradigm, primarily to enable a fuller description of data quality and fit-for-purpose attributes. This paradigm offers structured approaches for storing quality measures in metadata, including elements such as Positional Accuracy, Lineage, and Cloud Cover. NASA exercises validation processes for the Earth Science Data Systems Program to ensure that users of EOSDIS have a predictable level of confidence in the data, and to assess the data's viability for usage and application.
The Earth Science Data Systems Program has been improving its data management practices for over twenty years to assure the permanence of data utility through reliable preservation of bits, readability, understandability, usability, and reproducibility of results. While NASA has focused on the Earth system science research community as the primary data user community, broad interest in the data due to climate change and its effects on people everywhere (e.g., sea level rise), from environmental managers, public policymakers, and citizen scientists, has led the Program to respond with new tools and ways to improve ease of access to and use of the data. NASA's standard Earth observation data will soon be buttressed by the long tail of federally funded research data created or analyzed by grantees, in response to John Holdren's OSTP memorandum to federal departments and agencies entitled 'Increasing Access to the Results of Federally-Funded Scientific Research'. We fully expect that NASA's Earth Science Data Systems Program will be able to work with our grantees to comply early, and to turn the openness of this source of scientific data into a best practice for NASA and the grantees.
NASA Astrophysics Data System (ADS)
Wood, Chris
2016-04-01
Under the Marine Strategy Framework Directive (MSFD), EU Member States are mandated to achieve or maintain 'Good Environmental Status' (GES) in their marine areas by 2020, through a series of Programmes of Measures (PoMs). The Celtic Seas Partnership (CSP), an EU LIFE+ project, aims to support policy makers, special-interest groups, users of the marine environment, and other interested stakeholders on MSFD implementation in the Celtic Seas geographical area. As part of this support, a metadata portal has been built to provide a signposting service to datasets that are relevant to MSFD within the Celtic Seas. To ensure that the metadata has the widest possible reach, a linked data approach was employed to construct the database. Although the metadata are stored in a traditional RDBMS, the metadata are exposed as linked data via the D2RQ platform, allowing virtual RDF graphs to be generated. SPARQL queries can be executed against the endpoint, allowing any user to manipulate the metadata. D2RQ's mapping language, based on Turtle, was used to map a wide range of relevant ontologies to the metadata (e.g. the Provenance Ontology (prov-o), Ocean Data Ontology (odo), Dublin Core Elements and Terms (dc & dcterms), Friend of a Friend (foaf), and geospatial ontologies (geo)), allowing users to browse the metadata, either via SPARQL queries or by using D2RQ's HTML interface. The metadata were further enhanced by mapping relevant parameters to the NERC Vocabulary Server, itself built on a SPARQL endpoint. Additionally, a custom web front-end was built to enable users to browse the metadata and express queries through an intuitive graphical user interface that requires no prior knowledge of SPARQL.
As well as providing means to browse the data via MSFD-related parameters (Descriptor, Criteria, and Indicator), the metadata records include the dataset's country of origin, the list of organisations involved in the management of the data, and links to any relevant INSPIRE-compliant services relating to the dataset. The web front-end therefore enables users to effectively filter, sort, or search the metadata. As the MSFD timeline requires Member States to review their progress on achieving or maintaining GES every six years, the timely development of this metadata portal will not only aid interested stakeholders in understanding how member states are meeting their targets, but also shows how linked data can be used effectively to support policy makers and associated legislative bodies.
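The relational-rows-to-RDF exposure the abstract describes can be sketched in miniature. This is not D2RQ's actual mapping language; it only illustrates the idea that each database row yields a set of triples under well-known vocabularies. The base URI, field names, and sample record are invented.

```python
# Map a relational metadata row to RDF triples, in the spirit of a D2RQ
# mapping; the predicate namespaces are the Dublin Core Terms and FOAF
# vocabularies named in the abstract.
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

def row_to_triples(record_id, row):
    subj = f"http://example.org/metadata/{record_id}"  # hypothetical base URI
    yield (subj, DCT + "title", row["title"])
    yield (subj, DCT + "spatial", row["area"])
    yield (subj, FOAF + "maker", row["organisation"])

triples = list(row_to_triples(
    42, {"title": "Herring spawning grounds",
         "area": "Celtic Seas",
         "organisation": "Marine Institute"}))

# Serialize as N-Triples-style lines for inspection.
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')
```

A virtual graph built this way is what a SPARQL endpoint (or the custom front-end) then queries, without the underlying store ever leaving the RDBMS.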
Multimedia content management in MPEG-21 framework
NASA Astrophysics Data System (ADS)
Smith, John R.
2002-07-01
MPEG-21 is an emerging standard from MPEG that specifies a framework for transactions of multimedia content. MPEG-21 defines the fundamental concept known as a digital item, which is the unit of transaction in the multimedia framework. A digital item can be used to package content for such as a digital photograph, a video clip or movie, a musical recording with graphics and liner notes, a photo album, and so on. The packaging of the media resources, corresponding identifiers, and associated metadata is provided in the declaration of the digital item. The digital item declaration allows for more effective transaction, distribution, and management of multimedia content and corresponding metadata, rights expressions, variations of media resources. In this paper, we describe various challenges for multimedia content management in the MPEG-21 framework.
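The packaging idea, a digital item declaration that binds resources, identifiers, and metadata, can be sketched as a small XML structure. The element names below follow the declaration vocabulary only loosely and the document is schematic, not schema-valid MPEG-21 DIDL; the file name and descriptive text are invented.

```python
import xml.etree.ElementTree as ET

# Schematic digital item: one Item packaging a media resource together
# with a descriptive statement, as in the photo-album example above.
item = ET.Element("Item")
descriptor = ET.SubElement(item, "Descriptor")
ET.SubElement(descriptor, "Statement").text = "Holiday photo album, 2002"
component = ET.SubElement(item, "Component")
ET.SubElement(component, "Resource",
              {"mimeType": "image/jpeg", "ref": "photo-001.jpg"})

xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

Rights expressions and resource variations would, in the same spirit, be further descriptors and components attached to the item.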
OntoSoft: A Software Registry for Geosciences
NASA Astrophysics Data System (ADS)
Garijo, D.; Gil, Y.
2017-12-01
The goal of the EarthCube OntoSoft project is to enable the creation of an ecosystem for software stewardship in geosciences that will empower scientists to manage their software as valuable scientific assets. By sharing software metadata in OntoSoft, scientists enable broader access to that software by other scientists, software professionals, students, and decision makers. Our work to date includes: 1) an ontology for describing scientific software metadata, 2) a distributed scientific software repository that contains more than 750 entries that can be searched and compared across metadata fields, 3) an intelligent user interface that guides scientists to publish software and allows them to crowdsource its corresponding metadata. We have also developed a training program where scientists learn to describe and cite software in their papers in addition to data and provenance, and we are using OntoSoft to show them the benefits of publishing their software metadata. This training program is part of a Geoscience Papers of the Future Initiative, where scientists are reflecting on their current practices, benefits and effort for sharing software and data. This journal paper can be submitted to a Special Section of the AGU Earth and Space Science Journal.
Digital Curation of Earth Science Samples Starts in the Field
NASA Astrophysics Data System (ADS)
Lehnert, K. A.; Hsu, L.; Song, L.; Carter, M. R.
2014-12-01
Collection of physical samples in the field is an essential part of research in the Earth Sciences. Samples provide a basis for progress across many disciplines, from the study of global climate change now and over the Earth's history, to present and past biogeochemical cycles, to magmatic processes and mantle dynamics. The types of samples, methods of collection, and scope and scale of sampling campaigns are highly diverse, ranging from large-scale programs to drill rock and sediment cores on land, in lakes, and in the ocean, to environmental observation networks with continuous sampling, to single investigator or small team expeditions to remote areas around the globe or trips to local outcrops. Cyberinfrastructure for sample-related fieldwork needs to cater to the different needs of these diverse sampling activities, aligning with specific workflows, regional constraints such as connectivity or climate, and processing of samples. In general, digital tools should assist with capture and management of metadata about the sampling process (location, time, method) and the sample itself (type, dimension, context, images, etc.), management of the physical objects (e.g., sample labels with QR codes), and the seamless transfer of sample metadata to data systems and software relevant to the post-sampling data acquisition, data processing, and sample curation. In order to optimize CI capabilities for samples, tools and workflows need to adopt community-based standards and best practices for sample metadata, classification, identification and registration. 
This presentation will provide an overview and updates of several ongoing efforts that are relevant to the development of standards for digital sample management: the ODM2 project that has generated an information model for spatially-discrete, feature-based earth observations resulting from in-situ sensors and environmental samples, aligned with OGC's Observation & Measurements model (Horsburgh et al, AGU FM 2014); implementation of the IGSN (International Geo Sample Number) as a globally unique sample identifier via a distributed system of allocating agents and a central registry; and the EarthCube Research Coordination Network iSamplES (Internet of Samples in the Earth Sciences) that aims to improve sharing and curation of samples through the use of CI.
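The field-capture step, recording location, time, method, and type, and attaching a globally unique identifier, can be sketched as follows. The record fields and the identifier suffix are illustrative only: real IGSNs are assigned through an allocating agent and the central registry, not derived locally as done here for self-containment.

```python
import datetime
import hashlib

def make_sample_record(owner_prefix, lat, lon, method, sample_type):
    """Sketch of capturing the sample metadata the abstract lists
    (location, time, method, type) plus an IGSN-like identifier."""
    collected = datetime.datetime(2014, 8, 1, 12, 0)  # fixed for the example
    seed = f"{owner_prefix}{lat}{lon}{collected.isoformat()}"
    # Illustrative 9-character suffix; NOT how real IGSNs are minted.
    suffix = hashlib.sha1(seed.encode()).hexdigest()[:9].upper()
    return {
        "igsn": f"{owner_prefix}{suffix}",
        "latitude": lat, "longitude": lon,
        "collected": collected.isoformat(),
        "method": method, "sample_type": sample_type,
    }

rec = make_sample_record("IE", 44.63, -110.50, "push core", "lake sediment")
print(rec["igsn"])
```

Capturing such a record at the moment of sampling is what allows its seamless hand-off to post-sampling data systems, per the workflow described above.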
Image BOSS: a biomedical object storage system
NASA Astrophysics Data System (ADS)
Stacy, Mahlon C.; Augustine, Kurt E.; Robb, Richard A.
1997-05-01
Researchers using biomedical images have data management needs which are oriented perpendicular to clinical PACS. The image BOSS system is designed to permit researchers to organize and select images based on research topic, image metadata, and a thumbnail of the image. Image information is captured from existing images in a Unix based filesystem, stored in an object oriented database, and presented to the user in a familiar laboratory notebook metaphor. In addition, the ImageBOSS is designed to provide an extensible infrastructure for future content-based queries directly on the images.
NASA Astrophysics Data System (ADS)
Baynes, K.; Gilman, J.; Pilone, D.; Mitchell, A. E.
2015-12-01
The NASA EOSDIS (Earth Observing System Data and Information System) Common Metadata Repository (CMR) is a continuously evolving metadata system that merges all existing capabilities and metadata from the EOS ClearingHouse (ECHO) and Global Change Master Directory (GCMD) systems. This flagship catalog has been developed against several key requirements: fast search and ingest performance; the ability to integrate heterogeneous external inputs and outputs; high availability and resiliency; scalability; and evolvability and expandability. This talk will focus on the advantages and potential challenges of tackling these requirements using a microservices architecture, which decomposes system functionality into smaller, loosely-coupled, individually-scalable elements that communicate via well-defined APIs. In addition, time will be spent examining specific elements of the CMR architecture and identifying opportunities for future integrations.
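The microservices decomposition described above can be sketched in miniature: each capability is a separately deployable handler behind a well-defined request/response contract, with a router standing in for HTTP dispatch. All names and routes are illustrative, not the CMR's actual API.

```python
# In-memory stand-in for the shared catalog store.
CATALOG = []

def ingest_service(request):
    """Validates and stores one metadata record (the 'ingest' capability)."""
    record = request["record"]
    if "concept_id" not in record:
        return {"status": 400, "error": "concept_id required"}
    CATALOG.append(record)
    return {"status": 201}

def search_service(request):
    """Keyword search over the catalog (the 'search' capability)."""
    kw = request["keyword"].lower()
    hits = [r for r in CATALOG if kw in r.get("title", "").lower()]
    return {"status": 200, "hits": hits}

# The well-defined API boundary: services share nothing but the contract.
ROUTES = {"/ingest": ingest_service, "/search": search_service}

def dispatch(path, request):
    return ROUTES[path](request)

dispatch("/ingest", {"record": {"concept_id": "C1", "title": "MODIS Snow Cover"}})
resp = dispatch("/search", {"keyword": "snow"})
print(resp["status"], len(resp["hits"]))
```

Because each handler only sees its request dict, either service could be scaled or replaced independently, which is the property the requirements list is after.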
FITS and PDS4: Planetary Surface Data Interoperability Made Easier
NASA Astrophysics Data System (ADS)
Marmo, C.; Hare, T. M.; Erard, S.; Cecconi, B.; Minin, M.; Rossi, A. P.; Costard, F.; Schmidt, F.
2018-04-01
This abstract describes how Flexible Image Transport System (FITS) can be used in planetary surface investigations, and how its metadata can easily be inserted in the PDS4 metadata distribution model.
Pathogen metadata platform: software for accessing and analyzing pathogen strain information.
Chang, Wenling E; Peterson, Matthew W; Garay, Christopher D; Korves, Tonia
2016-09-15
Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but for biosurveillance and public health. However, current freely available tools for analyzing this data are geared towards bioinformaticians and/or do not provide the summaries and visualizations needed to readily interpret results. We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers. This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution.
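The query layer described above, a local metadata database, a serovar query, and the counts behind a 2D histogram over two user-selected metadata types, can be sketched briefly. SQLite stands in for PostgreSQL to keep the example self-contained, and the table, fields, and accessions are invented, not the platform's actual schema.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.execute("""CREATE TABLE biosample
                (accession TEXT, serovar TEXT, host TEXT, country TEXT)""")
conn.executemany("INSERT INTO biosample VALUES (?,?,?,?)", [
    ("SAMN01", "Typhimurium", "bovine", "USA"),
    ("SAMN02", "Typhimurium", "human", "USA"),
    ("SAMN03", "Enteritidis", "human", "UK"),
])

# Query for a pathogen serovar, as in the abstract's demonstration.
rows = conn.execute(
    "SELECT accession FROM biosample WHERE serovar = ?",
    ("Typhimurium",)).fetchall()

# Counts for a 2D histogram over two metadata types: (serovar, host) -> count.
hist2d = Counter(conn.execute("SELECT serovar, host FROM biosample"))

print([r[0] for r in rows], hist2d[("Typhimurium", "human")])
```

Exporting `rows` as tab-delimited text, and binning `hist2d` onto a grid, would correspond to the interface features the abstract lists.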
The open research system: a web-based metadata and data repository for collaborative research
Charles M. Schweik; Alexander Stepanov; J. Morgan Grove
2005-01-01
Beginning in 1999, a web-based metadata and data repository we call the "open research system" (ORS) was designed and built to assist geographically distributed scientific research teams. The purpose of this innovation was to promote the open sharing of data within and across organizational lines and across geographic distances. As the use of the system...
NASA Astrophysics Data System (ADS)
McInnes, B.; Brown, A.; Liffers, M.
2015-12-01
Publicly funded laboratories have a responsibility to generate, archive, and disseminate analytical data to the research community. Laboratory managers know, however, that a long tail of analytical effort never escapes researchers' thumb drives once they leave the lab. This work reports on a research data management project (Digital Mineralogy Library) where integrated hardware and software systems automatically archive and deliver analytical data and metadata to institutional and community data portals. The scientific objective of the DML project was to quantify the modal abundance of heavy minerals extracted from key lithological units in Western Australia. The selected analytical platform was a TESCAN Integrated Mineral Analyser (TIMA) that uses EDS-based mineral classification software to image and quantify mineral abundance and grain size at micron-scale resolution. The analytical workflow used a bespoke laboratory information management system (LIMS) to orchestrate: (1) the preparation of grain mounts with embedded QR codes that serve as enduring links between physical samples and analytical data, (2) the assignment of an International Geo Sample Number (IGSN) and Digital Object Identifier (DOI) to each grain mount via the System for Earth Sample Registry (SESAR), (3) the assignment of a DOI to instrument metadata via Research Data Australia, (4) the delivery of TIMA analytical outputs, including spatially registered mineralogy images and mineral abundance data, to an institutionally-based data management server, and (5) the downstream delivery of a final data product via a Google Maps interface such as the AuScope Discovery Portal. The modular design of the system permits the networking of multiple instruments within a single site or across multiple collaborating research institutions.
Although sharing analytical data does provide new opportunities for the geochemistry community, the creation of an open data network requires: (1) the adoption of open data reporting standards and conventions, (2) instrument manufacturers and software developers delivering and processing data in formats compatible with open standards, and (3) public funding agencies incentivising researchers, laboratories, and institutions to make their data open and accessible to consumers.
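The enduring link between a physical grain mount and its analytical data, step (1) of the workflow above, can be sketched as the payload such a QR code might encode. The payload shape, identifier values, and resolver URL are all illustrative assumptions, not the DML project's actual format.

```python
import json

def qr_payload(mount_id, igsn, doi):
    """Small JSON payload embedded in the grain mount's QR code, tying the
    physical object to its registered identifiers. All values illustrative."""
    return json.dumps({
        "mount": mount_id,
        "igsn": igsn,                  # as registered via SESAR
        "doi": doi,                    # as minted for the data package
        "landing": f"https://example.org/samples/{igsn}",  # hypothetical resolver
    }, sort_keys=True)

payload = qr_payload("TIMA-2015-0042", "CSRWA0042", "10.4225/example")
decoded = json.loads(payload)
print(decoded["igsn"])
```

Scanning the code anywhere in the workflow then recovers the same identifiers that key the LIMS, the registry entries, and the delivered data products.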
Ridge 2000 Data Management System
NASA Astrophysics Data System (ADS)
Goodwillie, A. M.; Carbotte, S. M.; Arko, R. A.; Haxby, W. F.; Ryan, W. B.; Chayes, D. N.; Lehnert, K. A.; Shank, T. M.
2005-12-01
Hosted at Lamont by the marine geoscience Data Management group, mgDMS, the NSF-funded Ridge 2000 electronic database, http://www.marine-geo.org/ridge2000/, is a key component of the Ridge 2000 multi-disciplinary program. The database covers each of the three Ridge 2000 Integrated Study Sites: Endeavour Segment, Lau Basin, and 8-11N Segment. It promotes the sharing of information to the broader community, facilitates integration of the suite of information collected at each study site, and enables comparisons between sites. The Ridge 2000 data system provides easy web access to a relational database that is built around a catalogue of cruise metadata. Any web browser can be used to perform a versatile text-based search which returns basic cruise and submersible dive information, sample and data inventories, navigation, and other relevant metadata such as shipboard personnel and links to NSF program awards. In addition, non-proprietary data files, images, and derived products which are hosted locally or in national repositories, as well as science and technical reports, can be freely downloaded. On the Ridge 2000 database page, our Data Link allows users to search the database using a broad range of parameters including data type, cruise ID, chief scientist, and geographical location. The first Ridge 2000 field programs sailed in 2004 and, in addition to numerous data sets collected prior to the Ridge 2000 program, the database currently contains information on fifteen Ridge 2000-funded cruises and almost sixty Alvin dives. Track lines can be viewed using a recently-implemented Web Map Service button labelled Map View. The Ridge 2000 database is fully integrated with databases hosted by the mgDMS group for MARGINS and the Antarctic multibeam and seismic reflection data initiatives. Links are provided to partner databases including PetDB, SIOExplorer, and the ODP Janus system.
Interoperability with existing and new partner repositories continues to be strengthened. One major effort involves the gradual unification of the metadata across these partner databases. Standardised electronic metadata forms that can be filled in at sea are available from our web site. Interactive map-based exploration and visualisation of the Ridge 2000 database is provided by GeoMapApp, a freely-available Java application being developed within the mgDMS group. GeoMapApp includes high-resolution bathymetric grids for the 8-11N EPR segment and allows customised maps and grids for any of the Ridge 2000 ISS to be created. Vent and instrument locations can be plotted and saved as images, and Alvin dive photos are also available.
NASA Astrophysics Data System (ADS)
Hills, S. J.; Richard, S. M.; Doniger, A.; Danko, D. M.; Derenthal, L.; Energistics Metadata Work Group
2011-12-01
A diverse group of organizations representative of the international community involved in disciplines relevant to the upstream petroleum industry, - energy companies, - suppliers and publishers of information to the energy industry, - vendors of software applications used by the industry, - partner government and academic organizations, has engaged in the Energy Industry Metadata Standards Initiative. This Initiative envisions the use of standard metadata within the community to enable significant improvements in the efficiency with which users discover, evaluate, and access distributed information resources. The metadata standard needed to realize this vision is the initiative's primary deliverable. In addition to developing the metadata standard, the initiative is promoting its adoption to accelerate realization of the vision, and publishing metadata exemplars conformant with the standard. Implementation of the standard by community members, in the form of published metadata which document the information resources each organization manages, will allow use of tools requiring consistent metadata for efficient discovery and evaluation of, and access to, information resources. While metadata are expected to be widely accessible, access to associated information resources may be more constrained. The initiative is being conducted by Energistics' Metadata Work Group, in collaboration with the USGIN Project. Energistics is a global standards group in the oil and natural gas industry. The Work Group determined early in the initiative, based on input solicited from 40+ organizations and on an assessment of existing metadata standards, to develop the target metadata standard as a profile of a revised version of ISO 19115, formally the "Energy Industry Profile of ISO/DIS 19115-1 v1.0" (EIP). The Work Group is participating on the ISO/TC 211 project team responsible for the revision of ISO 19115, now ready for "Draft International Standard" (DIS) status.
With ISO 19115 an established, capability-rich, open standard for geographic metadata, EIP v1 is expected to be widely acceptable within the community and readily sustainable over the long term. The EIP design, also per community requirements, will enable discovery, evaluation, and access to the types of information resources considered important to the community, including structured and unstructured digital resources, and physical assets such as hardcopy documents and material samples. This presentation will briefly review the development of this initiative as well as the current and planned Work Group activities. More time will be spent providing an overview of EIP v1, including the requirements it prescribes, design efforts made to enable automated metadata capture and processing, and the structure and content of its documentation, which was written to minimize ambiguity and facilitate implementation. The Work Group considers EIP v1 a solid initial design for interoperable metadata, and a first step toward the vision of the Initiative.
Semantic Entity Pairing for Improved Data Validation and Discovery
NASA Astrophysics Data System (ADS)
Shepherd, Adam; Chandler, Cyndy; Arko, Robert; Chen, Yanning; Krisnadhi, Adila; Hitzler, Pascal; Narock, Tom; Groman, Robert; Rauch, Shannon
2014-05-01
One of the central incentives for linked data implementations is the opportunity to leverage the rich logic inherent in structured data. The logic embedded in semantic models can strengthen capabilities for data discovery and data validation when pairing entities from distinct, contextually-related datasets. The creation of links between the two datasets broadens data discovery by using the semantic logic to help machines compare similar entities and properties that exist at different levels of granularity. This semantic capability enables appropriate entity pairing without making inaccurate assertions as to the nature of the relationship. Entity pairing also provides a context in which to validate the correctness of an entity's property values - an exercise highly valued by data management practitioners who seek to ensure the quality and correctness of their data. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) semantically models metadata surrounding oceanographic research cruises, but other sources outside of BCO-DMO exist that also model metadata about these same cruises. For BCO-DMO, the process of successfully pairing its entities to these sources begins by selecting sources that are decidedly trustworthy and authoritative for the modeled concepts. In this case, the Rolling Deck to Repository (R2R) program has a well-respected reputation among the oceanographic research community, presents a data context that is uniquely different and valuable, and semantically models its cruise metadata. Where BCO-DMO exposes the processed, analyzed data products generated by researchers, R2R exposes the raw shipboard data that was collected on the same research cruises. Interlinking these cruise entities expands data discovery capabilities but also allows for validating the contextual correctness of both BCO-DMO's and R2R's cruise metadata.
Assessing the potential for a link between two datasets for a similar entity consists of aligning like properties and deciding on the appropriate semantic markup to describe the link. This highlights the desire for research organizations like BCO-DMO and R2R to ensure the complete accuracy of their exposed metadata, as it directly reflects on their reputations as successful and trustworthy sources of research data. Data validation therefore reaches beyond the simple syntax of property values to contextual correctness. As a human process, this is a time-intensive task that does not scale well for finite human and funding resources. To assess contextual correctness across datasets at different levels of granularity, BCO-DMO is developing a system that employs semantic technologies to aid the human process by organizing potential links and calculating a confidence coefficient for the correctness of each potential pairing, based on the distance between certain entity property values. The system allows humans to quickly scan potential links and their confidence coefficients, assert persistent links, and investigate and correct misaligned entity property values.
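A confidence coefficient of this kind can be sketched as a weighted property-similarity score. The field names, weights, and string-similarity measure below are illustrative assumptions, not BCO-DMO's actual implementation:

```python
from difflib import SequenceMatcher

def property_confidence(a: str, b: str) -> float:
    """Similarity of two property values, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def pairing_confidence(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted confidence that two records describe the same entity."""
    total = sum(weights.values())
    score = sum(w * property_confidence(str(rec_a.get(k, "")), str(rec_b.get(k, "")))
                for k, w in weights.items())
    return score / total

# hypothetical cruise records from two catalogs
bcodmo = {"cruise_id": "AT26-10", "ship": "Atlantis", "start": "2014-04-02"}
r2r = {"cruise_id": "AT26-10", "ship": "R/V Atlantis", "start": "2014-04-02"}
conf = pairing_confidence(bcodmo, r2r, {"cruise_id": 0.5, "ship": 0.2, "start": 0.3})
```

A human reviewer would then scan pairings sorted by `conf`, asserting links above some threshold and investigating the rest.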
CoMetaR: A Collaborative Metadata Repository for Biomedical Research Networks.
Stöhr, Mark R; Helm, Gudrun; Majeed, Raphael W; Günther, Andreas
2017-01-01
The German Center for Lung Research (DZL) is a research network with the aim of researching respiratory diseases. To perform consortium-wide queries through one single interface, it requires a uniform conceptual structure. No single terminology covers all of our concepts. To achieve a broadly accepted and complete ontology, we developed CoMetaR, a platform for collaborative metadata management. Anyone can browse and discuss the ontology, while editing can be performed by authenticated users.
ERIC Educational Resources Information Center
Blanchi, Christophe; Petrone, Jason; Pinfield, Stephen; Suleman, Hussein; Fox, Edward A.; Bauer, Charly; Roddy, Carol Lynn
2001-01-01
Includes four articles that discuss a distributed architecture for managing metadata that promotes interoperability between digital libraries; the use of electronic print (e-print) by physicists; the development of digital libraries; and a collaborative project between two library consortia in Ohio to provide digital versions of Sanborn Fire…
Constructing compact and effective graphs for recommender systems via node and edge aggregations
Lee, Sangkeun; Kahng, Minsuk; Lee, Sang-goo
2014-12-10
Exploiting graphs for recommender systems has great potential to flexibly incorporate heterogeneous information for producing better recommendation results. As our baseline approach, we first introduce a naive graph-based recommendation method, which operates on a heterogeneous log-metadata graph constructed from user log and content metadata databases. Although the naive graph-based recommendation method is simple, it allows us to take advantage of heterogeneous information and shows promising flexibility and recommendation accuracy. However, it often leads to extensive processing time due to the sheer size of the graphs constructed from entire user log and content metadata databases. In this paper, we propose node and edge aggregation approaches to constructing compact and effective graphs, called Factor-Item bipartite graphs, by aggregating the nodes and edges of a log-metadata graph. Experimental results using real-world datasets indicate that our approach can significantly reduce the size of graphs exploited for recommender systems without sacrificing recommendation quality.
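The aggregation idea can be sketched as follows, assuming a simple edge-multiplicity weighting; the node labels and inputs are illustrative, not the paper's exact construction:

```python
from collections import defaultdict

def build_factor_item_graph(log_edges, metadata_edges):
    """Aggregate a heterogeneous log-metadata graph into a Factor-Item
    bipartite graph: every user and every metadata attribute becomes a
    'factor' node linked to items, with parallel edges collapsed into
    integer weights."""
    weights = defaultdict(int)
    for user, item in log_edges:       # user interacted with item
        weights[(("user", user), item)] += 1
    for item, attr in metadata_edges:  # item is described by attribute
        weights[(("attr", attr), item)] += 1
    return dict(weights)

g = build_factor_item_graph(
    log_edges=[("u1", "song1"), ("u1", "song1"), ("u1", "song2")],
    metadata_edges=[("song1", "rock"), ("song2", "rock")],
)
```

Recommendation scores can then be computed over this far smaller bipartite graph instead of the full log-metadata graph.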
Automated software system for checking the structure and format of ACM SIG documents
NASA Astrophysics Data System (ADS)
Mirza, Arsalan Rahman; Sah, Melike
2017-04-01
Microsoft (MS) Office Word is one of the most commonly used software tools for creating documents. MS Word 2007 and above uses XML to represent the structure of MS Word documents. Metadata about the documents are automatically created using Office Open XML (OOXML) syntax. We develop a new framework, called ADFCS (Automated Document Format Checking System), that takes advantage of the OOXML metadata in order to extract semantic information from MS Office Word documents. In particular, we develop a new ontology for Association for Computing Machinery (ACM) Special Interest Group (SIG) documents, representing the structure and format of these documents using OWL (Web Ontology Language). The metadata is then extracted automatically in RDF (Resource Description Framework) according to this ontology using the developed software. Finally, we generate extensive rules in order to infer whether the documents are formatted according to ACM SIG standards. This paper introduces the ACM SIG ontology, the metadata extraction process, the inference engine, the ADFCS online user interface, the system evaluation, and user study evaluations.
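Since an OOXML package is a ZIP archive whose docProps/core.xml part holds document metadata, the first extraction step can be sketched as follows. A minimal stand-in package is built here in place of a real .docx; ADFCS itself goes further and maps such metadata to RDF:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core fields as they appear in docProps/core.xml of an OOXML package
CORE_XML = (
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>ACM SIG Paper</dc:title>'
    '<dc:creator>A. Author</dc:creator>'
    '</cp:coreProperties>'
)

def read_core_properties(fileobj):
    """Read metadata fields from an OOXML package's docProps/core.xml."""
    with zipfile.ZipFile(fileobj) as z:
        root = ET.fromstring(z.read("docProps/core.xml"))
    # strip XML namespaces from tag names for a simple dict
    return {el.tag.split("}")[1]: el.text for el in root}

# build a minimal stand-in package (a real .docx carries many more parts)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("docProps/core.xml", CORE_XML)
props = read_core_properties(buf)
```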
DataUp: Helping manage and archive data within the researcher's workflow
NASA Astrophysics Data System (ADS)
Strasser, C.
2012-12-01
There are many barriers to data management and sharing among earth and environmental scientists; among the most significant are a lack of knowledge about best practices for data management, metadata standards, and appropriate data repositories for archiving and sharing data. We have developed an open-source add-in for Excel and an open-source web application intended to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. The researcher does not need a prior relationship with a data repository to use DataUp; the newly implemented ONEShare repository, a DataONE member node, is available for any researcher to archive and share their data. By meeting researchers where they already work, in spreadsheets, DataUp becomes part of the researcher's workflow, and data management and sharing become easier. Future enhancement of DataUp will rely on members of the community adopting and adapting the DataUp tools to meet their unique needs, including connecting to analytical tools, adding new metadata schemas, and expanding the list of connected data repositories. DataUp is a collaborative project between Microsoft Research Connections, the University of California's California Digital Library, the Gordon and Betty Moore Foundation, and DataONE.
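A CSV-compatibility check of the kind step (1) describes could be sketched as follows. The specific rules here (consistent column counts, no embedded spreadsheet formulas) are illustrative assumptions, not DataUp's actual rule set:

```python
import csv
import io

def check_csv_compatible(text):
    """Return a list of problems that would prevent clean CSV export."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    width = len(rows[0])
    for i, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {i} has {len(row)} fields, expected {width}")
    for i, row in enumerate(rows, start=1):
        if any(cell.startswith("=") for cell in row):
            problems.append(f"row {i} contains a spreadsheet formula")
    return problems

ok = check_csv_compatible("site,temp\nA,12.5\nB,13.1\n")
bad = check_csv_compatible("site,temp\nA,12.5,extra\nB,=SUM(A1:A2)\n")
```

An empty problem list means the file can be archived as plain CSV without loss.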
Operational Support for Instrument Stability through ODI-PPA Metadata Visualization and Analysis
NASA Astrophysics Data System (ADS)
Young, M. D.; Hayashi, S.; Gopu, A.; Kotulla, R.; Harbeck, D.; Liu, W.
2015-09-01
Over long time scales, quality assurance metrics taken from calibration and calibrated data products can aid observatory operations in quantifying the performance and stability of the instrument, and can identify potential areas of concern or guide troubleshooting and engineering efforts. Such methods traditionally require manual SQL entries, assuming the requisite metadata have even been ingested into a database. With the ODI-PPA system, QA metadata have been harvested and indexed for all data products produced over the life of the instrument. In this paper we describe how, utilizing the industry-standard Highcharts JavaScript charting package with a customized AngularJS-driven user interface, we have made the process of visualizing the long-term behavior of these QA metadata simple and easily replicated. Operators can craft a custom query using the powerful and flexible ODI-PPA search interface and visualize the associated metadata in a variety of ways. These customized visualizations can be bookmarked, shared, or embedded externally, and will be dynamically updated as new data products enter the system, enabling operators to monitor the long-term health of their instrument with ease.
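One way such long-term monitoring could flag instrument instability is by comparing recent QA metrics against a historical baseline. This sketch, with made-up values, is an assumption rather than ODI-PPA's actual analysis:

```python
from statistics import mean, stdev

def flag_drift(values, window=5, threshold=3.0):
    """Flag if the mean of the most recent `window` QA measurements
    deviates from the long-term baseline by more than `threshold` sigma."""
    baseline, recent = values[:-window], values[-window:]
    mu, sigma = mean(baseline), stdev(baseline)
    z = abs(mean(recent) - mu) / sigma
    return z > threshold

# hypothetical per-night image-quality metric: stable, then degraded
metric = [0.9, 1.0, 1.1, 0.95, 1.05, 1.0, 0.9, 1.1, 2.4, 2.5, 2.6, 2.5, 2.4]
drift = flag_drift(metric)
```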
A Spatial Data Infrastructure to Share Earth and Space Science Data
NASA Astrophysics Data System (ADS)
Nativi, S.; Mazzetti, P.; Bigagli, L.; Cuomo, V.
2006-05-01
Spatial Data Infrastructure (SDI, also known as Geospatial Data Infrastructure) is fundamentally a mechanism to facilitate the sharing and exchange of geospatial data. SDI is a scheme necessary for the effective collection, management, access, delivery and utilization of geospatial data; it is important for objective decision making and sound land-based policy, supports economic development, and encourages socially and environmentally sustainable development. As far as data model and semantics are concerned, a valuable and effective SDI should be able to cross the boundaries between the Geographic Information System/Science (GIS) and Earth and Space Science (ESS) communities. Hence, an SDI should be able to discover, access and share information and data produced and managed by both the GIS and ESS communities in an integrated way. In other terms, an SDI must be built on a conceptual and technological framework which abstracts the nature and structure of shared datasets: feature-based data or Imagery, Gridded and Coverage Data (IGCD). ISO TC211 and the Open Geospatial Consortium provided important artifacts to build up this framework. In particular, the OGC Web Services (OWS) initiatives and several Interoperability Experiments (e.g. the GALEON IE) are extremely useful for this purpose. We present an SDI solution which is able to manage both GIS and ESS datasets. It is based on OWS and other well-accepted or promising technologies, such as UNIDATA netCDF and CDM, ncML and ncML-GML. Moreover, it uses a specific technology to implement a distributed and federated system of catalogues: the GI-Cat. This technology performs data model mediation and protocol adaptation tasks. It is used to work out a metadata clearinghouse service, implementing a common (federal) catalogue model which is based on the ISO 19115 core metadata for geo-datasets. Nevertheless, other well-accepted or standard catalogue data models can be easily implemented as the common view (e.g.
OGC CS-W, the forthcoming INSPIRE discovery metadata model, etc.). The proposed solution has been conceived and developed for building up the "Lucan SDI", the SDI of the Italian Basilicata Region. It aims to connect the following data providers and users: the National River Basin Authority of Basilicata, the Regional Environmental Agency, the Land Management & Cadastre Regional Authorities, the Prefecture, the Regional Civil Protection Centers, the National Research Council Institutes in Basilicata, academia, and several SMEs.
Oceans 2.0: a Data Management Infrastructure as a Platform
NASA Astrophysics Data System (ADS)
Pirenne, B.; Guillemot, E.
2012-04-01
The Data Management and Archiving System (DMAS), serving the needs of a number of undersea observing networks such as VENUS and NEPTUNE Canada, was conceived from the beginning as a Service-Oriented Infrastructure. Its core functional elements (data acquisition, transport, archiving, retrieval and processing) can interact with the outside world using Web Services. Those Web Services can be exploited by a variety of higher-level applications. Over the years, DMAS has developed Oceans 2.0: an environment where these techniques are implemented. The environment thereby becomes a platform in that it allows for easy addition of new and advanced features that build upon the tools at the core of the system. The applications that have been developed include: data search and retrieval, with options such as data product generation, data decimation or averaging; dynamic infrastructure description (search of all observatory metadata) and visualization; and data visualization, including dynamic scalar data plots and integrated fast video segment search and viewing. Building upon these basic applications are new concepts, coming from the Web 2.0 world, that DMAS has added. They allow people equipped only with a web browser to collaborate and contribute their findings or work results to the wider community.
Examples include: the addition of metadata tags to any part of the infrastructure or to any data item (annotations); the ability to edit, execute, share and distribute Matlab code on-line from a simple web browser, with specific calls within the code to access data; the ability to interactively and graphically build pipeline processing jobs that can be executed on the cloud; web-based, interactive instrument control tools that allow users to truly share the use of the instruments and communicate with each other; and, last but not least, a public tool in the form of a game that crowd-sources the inventory of the underwater video archive content, thereby adding tremendous amounts of metadata. Beyond those tools, which represent the functionality presently available to users, a number of the Web Services dedicated to data access are being exposed for anyone to use. This allows not only for ad hoc data access by individuals who need non-interactive access, but will also foster the development of new applications in a variety of areas.
Globus Online: Climate Data Management for Small Teams
NASA Astrophysics Data System (ADS)
Ananthakrishnan, R.; Foster, I.
2013-12-01
Large and highly distributed climate data demands new approaches to data organization and lifecycle management. We need, in particular, catalogs that allow researchers to track the location and properties of large numbers of data files, and management tools that allow researchers to update data properties and organization during their research, move data among different locations, and invoke analysis computations on data--all as easily as if they were working with small numbers of files on their desktop computer. Both catalogs and management tools often need to scale to extremely large quantities of data. When developing solutions to these problems, it is important to distinguish between the needs of (a) large communities, for whom the ability to organize published data is crucial (e.g., by implementing formal data publication processes, assigning DOIs, recording definitive metadata, providing for versioning), and (b) individual researchers and small teams, who are more frequently concerned with tracking the diverse data and computations involved in their highly dynamic and iterative research processes. Key requirements in the latter case include automated data registration and metadata extraction; ease of update; close-to-zero management overhead (e.g., no local software installation); and flexible, user-managed sharing support, allowing read and write privileges within small groups.
We describe here how new capabilities provided by the Globus Online system address the needs of the latter group of climate scientists, providing for the rapid creation and establishment of lightweight individual- or team-specific catalogs; the definition of logical groupings of data elements, called datasets; the evolution of catalogs, dataset definitions, and associated metadata over time, to track changes in data properties and organization as a result of research processes; and the manipulation of data referenced by catalog entries (e.g., replication of a dataset to a remote location for analysis, or sharing of a dataset). Its software-as-a-service ('SaaS') architecture means that these capabilities are provided to users over the network, without a need for local software installation. In addition, Globus Online provides well-defined APIs, thus providing a platform that can be leveraged to integrate these capabilities with other portals and applications. We describe early applications of these new Globus Online capabilities to climate science. We focus in particular on applications that demonstrate how Globus Online capabilities complement those of the Earth System Grid Federation (ESGF), the premier system for publication and discovery of large community datasets. ESGF already uses Globus Online mechanisms for data download. We demonstrate methods by which the two systems can be further integrated and harmonized, so that, for example, data collections produced within a small team can be easily published from Globus Online to ESGF for archival storage and broader access--and a Globus Online catalog can be used to organize an individual view of a subset of data held in ESGF.
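The catalog concepts described above (registration, datasets as logical groupings, metadata that evolves over time, queryable properties) can be sketched in miniature. This is an illustrative model, not the Globus Online API:

```python
class Catalog:
    """Minimal sketch of a team catalog: files registered with metadata,
    datasets as named groupings of files, metadata updatable in place."""

    def __init__(self):
        self.files = {}      # path -> metadata dict
        self.datasets = {}   # dataset name -> set of file paths

    def register(self, path, **metadata):
        self.files[path] = dict(metadata)

    def annotate(self, path, **metadata):
        self.files[path].update(metadata)   # metadata evolves over time

    def define_dataset(self, name, paths):
        self.datasets[name] = set(paths)

    def search(self, **query):
        return [p for p, m in self.files.items()
                if all(m.get(k) == v for k, v in query.items())]

cat = Catalog()
cat.register("/runs/tas_2010.nc", variable="tas", model="CESM")
cat.register("/runs/pr_2010.nc", variable="pr", model="CESM")
cat.define_dataset("cesm-2010", ["/runs/tas_2010.nc", "/runs/pr_2010.nc"])
cat.annotate("/runs/tas_2010.nc", status="published")
hits = cat.search(variable="tas")
```

In a SaaS setting the same operations would be exposed as network API calls rather than in-process method calls.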
A Transparently-Scalable Metadata Service for the Ursa Minor Storage System
2010-06-25
…provide application-level guarantees. For example, many document editing programs implement atomic updates by writing the new document version into a… …operations that could involve multiple servers, how close existing systems come to transparent scalability, how systems that handle multi-server…
NASA Astrophysics Data System (ADS)
Richard, S. M.
2011-12-01
The USGIN project has drafted and is using a specification for the use of ISO 19115/19119/19139 metadata, recommendations for simple metadata content, and a proposal for a URI scheme to identify resources using resolvable HTTP URIs (see http://lab.usgin.org/usgin-profiles). The principal target use case is a catalog in which resources can be registered and described by data providers for discovery by users. We are currently using the ESRI Geoportal (Open Source), with configuration files for the USGIN profile. The metadata offered by the catalog must provide sufficient content to guide search engines to locate requested resources; to describe the resource content, provenance, and quality so users can determine if the resource will serve for the intended usage; and finally to enable human users and software clients to obtain or access the resource. In order to achieve an operational federated catalog system, provisions in the ISO specification must be restricted and usage clarified to reduce the heterogeneity of 'standard' metadata and service implementations, such that a single client can search against different catalogs and the metadata returned by catalogs can be parsed reliably to locate required information. Usage of the complex ISO 19139 XML schema allows for a great deal of structured metadata content, but the heterogeneity in approaches to content encoding has hampered development of sophisticated client software that can take advantage of the rich metadata; the lack of such clients in turn reduces the motivation for metadata producers to produce content-rich metadata. If the only significant use of the detailed, structured metadata is to format it into text for people to read, then the detailed information could be put in free-text elements and be just as useful.
In order for complex metadata encoding and content to be useful, there must be clear and unambiguous conventions on the encoding that are utilized by the community that wishes to take advantage of advanced metadata content. The use cases for the detailed content must be well understood, and the degree of metadata complexity should be determined by requirements for those use cases. The ISO standard provides sufficient flexibility that relatively simple metadata records can be created that will serve for text-indexed search/discovery, resource evaluation by a user reading text content from the metadata, and access to the resource via http, ftp, or well-known service protocols (e.g. Thredds; OGC WMS, WFS, WCS).
NASA Astrophysics Data System (ADS)
Meertens, Charles; Boler, Fran; Miller, M. Meghan
2015-04-01
UNAVCO community investigators are actively engaged in using space and terrestrial geodetic techniques to study earthquake processes, mantle properties, active magmatic systems, plate tectonics, plate boundary zone deformation, intraplate deformation, glacial isostatic adjustment, and hydrologic and atmospheric processes. The first GPS field projects were conducted over thirty years ago, and from the beginning these science investigations and the UNAVCO constituency as a whole have been international and collaborative in scope and participation. Collaborations were driven by the nature of the scientific problems being addressed, the capability of the technology to make precise measurements over global scales, and the inherent technical necessity of sharing GPS tracking data across national boundaries. The International GNSS Service (IGS) was formed twenty years ago as a voluntary federation to share GPS data from what are now hundreds of locations around the globe, to facilitate realization of global reference frames, ties to regional surveys, and precise orbits, and to establish and improve best practices in analysis and infrastructure. Recently, however, the number of regional stations has grown into the tens of thousands, often with data that are difficult to access. UNAVCO has been working to help remove technical barriers by providing open-source tools such as the Geodetic Seamless Archive Centers software, to facilitate cross-project data sharing and discovery, and by developing Dataworks software to manage network data. Data web services also provide the framework for UNAVCO contributions to multi-technique, inter-disciplinary, and integrative activities such as CoopEUS, GEO Supersites, EarthScope, and EarthCube. Within the geodetic community, metadata standards and data exchange formats have been developed and evolved collaboratively through the efforts of global organizations such as the IGS.
A new generation of metadata and data exchange formats, as well as the software tools that utilize these formats and that support more efficient exchange of the highest quality data and metadata, are currently being developed and deployed through multiple international efforts.
NASA Astrophysics Data System (ADS)
Albeke, S. E.; Perkins, D. G.; Ewers, S. L.; Ewers, B. E.; Holbrook, W. S.; Miller, S. N.
2015-12-01
The sharing of data and results is paramount for advancing scientific research. The Wyoming Center for Environmental Hydrology and Geophysics (WyCEHG) is a multidisciplinary group that is driving scientific breakthroughs to help manage water resources in the Western United States. WyCEHG is mandated by the National Science Foundation (NSF) to share its data. However, the infrastructure from which to share such diverse, complex and massive amounts of data did not exist within the University of Wyoming. We developed an innovative framework to meet the data organization, sharing, and discovery requirements of WyCEHG by integrating both open and closed source software, embedded metadata tags, semantic web technologies, and a web-mapping application. The infrastructure uses a Relational Database Management System as the foundation, providing a versatile platform to store, organize, and query myriad datasets, taking advantage of both structured and unstructured formats. Detailed metadata are fundamental to the utility of datasets. We tag data with Uniform Resource Identifiers (URIs) to specify concepts with formal descriptions (i.e., semantic ontologies), thus allowing users the ability to search metadata based on the intended context rather than conventional keyword searches. Additionally, WyCEHG data are geographically referenced. Using the ArcGIS API for JavaScript, we developed a web mapping application leveraging database-linked spatial data services, providing a means to visualize and spatially query available data in an intuitive map environment. Using server-side scripting (PHP), the mapping application, in conjunction with semantic search modules, dynamically communicates with the database and file system, providing access to available datasets. Our approach provides a flexible, comprehensive infrastructure from which to store and serve WyCEHG's highly diverse research-based data.
This framework has not only allowed WyCEHG to meet its data stewardship requirements, but can provide a template for others to follow.
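Concept-tagged search of the kind described can be sketched as follows; the ontology URIs, hierarchy, and file names are hypothetical:

```python
# hypothetical concept URIs with a tiny narrower-concept hierarchy
NARROWER = {
    "http://example.org/ont#Precipitation": [
        "http://example.org/ont#Rainfall",
        "http://example.org/ont#Snowfall",
    ],
}

# datasets tagged with concept URIs rather than free-text keywords
TAGS = {
    "gauge_2014.csv": ["http://example.org/ont#Rainfall"],
    "snotel_2014.csv": ["http://example.org/ont#Snowfall"],
    "soil_moisture.csv": ["http://example.org/ont#SoilMoisture"],
}

def semantic_search(concept):
    """Match datasets tagged with the concept or any narrower concept,
    so a query by intended meaning finds more than a keyword match would."""
    wanted = {concept, *NARROWER.get(concept, [])}
    return sorted(f for f, uris in TAGS.items() if wanted & set(uris))

hits = semantic_search("http://example.org/ont#Precipitation")
```

A keyword search for "precipitation" would miss both files here; the concept hierarchy is what connects them.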
Evolution of the use of relational and NoSQL databases in the ATLAS experiment
NASA Astrophysics Data System (ADS)
Barberis, D.
2016-09-01
The ATLAS experiment used for many years a large database infrastructure based on Oracle to store several different types of non-event data: time-dependent detector configuration and conditions data, calibrations and alignments, configurations of Grid sites, catalogues for data management tools, job records for distributed workload management tools, run and event metadata. The rapid development of "NoSQL" databases (structured storage services) in the last five years allowed an extended and complementary usage of traditional relational databases and new structured storage tools in order to improve the performance of existing applications and to extend their functionalities using the possibilities offered by the modern storage systems. The trend is towards using the best tool for each kind of data, separating for example the intrinsically relational metadata from payload storage, and records that are frequently updated and benefit from transactions from archived information. Access to all components has to be orchestrated by specialised services that run on front-end machines and shield the user from the complexity of data storage infrastructure. This paper describes this technology evolution in the ATLAS database infrastructure and presents a few examples of large database applications that benefit from it.
Web Monitoring of EOS Front-End Ground Operations, Science Downlinks and Level 0 Processing
NASA Technical Reports Server (NTRS)
Cordier, Guy R.; Wilkinson, Chris; McLemore, Bruce
2008-01-01
This paper addresses the efforts undertaken and the technology deployed to aggregate and distribute the metadata characterizing the real-time operations associated with NASA Earth Observing Systems (EOS) high-rate front-end systems and the science data collected at multiple ground stations and forwarded to the Goddard Space Flight Center for level 0 processing. Station operators, mission project management personnel, spacecraft flight operations personnel and data end-users for various EOS missions can retrieve the information at any time from any location having access to the internet. The users are distributed and the EOS systems are distributed but the centralized metadata accessed via an external web server provide an effective global and detailed view of the enterprise-wide events as they are happening. The data-driven architecture and the implementation of applied middleware technology, open source database, open source monitoring tools, and external web server converge nicely to fulfill the various needs of the enterprise. The timeliness and content of the information provided are key to making timely and correct decisions which reduce project risk and enhance overall customer satisfaction. The authors discuss security measures employed to limit access of data to authorized users only.
Park, Yu Rang; Yoon, Young Jo; Kim, Hye Hyeon; Kim, Ju Han
2013-01-01
Achieving semantic interoperability is critical for biomedical data sharing between individuals, organizations and systems. The ISO/IEC 11179 MetaData Registry (MDR) standard has been recognized as one of the solutions for this purpose. The standard model, however, is limited: concepts consisting of two or more values, such as blood pressure with its systolic and diastolic components, cannot be represented. We addressed the structural limitations of ISO/IEC 11179 with an integrated metadata object model in our previous research. In the present study, we introduce semantic extensions for the model by defining three new types of semantic relationships: dependency, composite and variable relationships. To evaluate our extensions in a real-world setting, we measured the efficiency of metadata reduction by mapping new metadata onto existing elements. We extracted metadata from the College of American Pathologists Cancer Protocols and then evaluated our extensions. With no semantic loss, one third of the extracted metadata could be successfully eliminated, suggesting a better strategy for implementing clinical MDRs with improved efficiency and utility.
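The composite relationship can be sketched as a data element that aggregates component elements, covering the blood-pressure case the base standard cannot express. This toy model is an illustration, not the authors' actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class DataElement:
    """Simplified ISO/IEC 11179-style data element."""
    name: str
    datatype: str

@dataclass
class CompositeElement:
    """Sketch of the 'composite' semantic relationship: one concept
    represented by two or more component data elements."""
    name: str
    components: list = field(default_factory=list)

systolic = DataElement("systolic_bp", "integer")
diastolic = DataElement("diastolic_bp", "integer")
blood_pressure = CompositeElement("blood_pressure", [systolic, diastolic])
names = [c.name for c in blood_pressure.components]
```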
Bruland, Philipp; Doods, Justin; Storck, Michael; Dugas, Martin
2017-01-01
Data dictionaries provide structural meta-information about data definitions in health information technology (HIT) systems. In this regard, reusing healthcare data for secondary purposes offers several advantages (e.g. reduced documentation time or increased data quality). Prerequisites for data reuse are its quality, availability and identical meaning of data. In diverse projects, research data warehouses serve as core components between heterogeneous clinical databases and various research applications. Given the complexity (high number of data elements) and dynamics (regular updates) of electronic health record (EHR) data structures, we propose a clinical metadata warehouse (CMDW) based on a metadata registry standard. Metadata of two large hospitals were automatically inserted into two CMDWs containing 16,230 forms and 310,519 data elements. Automatic updates of metadata are possible, as are semantic annotations. A CMDW allows metadata discovery, data quality assessment and similarity analyses. Common data models for distributed research networks can be established based on similarity analyses.
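A similarity analysis over element names might be sketched with a token-set (Jaccard) measure. The measure, threshold, and element names are illustrative assumptions, not the CMDW's actual method:

```python
def jaccard(a, b):
    """Token-set similarity between two data element names."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)

def similar_elements(elements_a, elements_b, threshold=0.5):
    """Pair elements from two hospitals' metadata whose names overlap
    enough to be candidates for a common data model."""
    return [(a, b) for a in elements_a for b in elements_b
            if jaccard(a, b) >= threshold]

pairs = similar_elements(
    ["patient_birth_date", "admission_date"],
    ["birth_date", "discharge_date"],
)
```

Candidate pairs would then be reviewed by a human before entering a shared model.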
MyGeoHub: A Collaborative Geospatial Research and Education Platform
NASA Astrophysics Data System (ADS)
Kalyanam, R.; Zhao, L.; Biehl, L. L.; Song, C. X.; Merwade, V.; Villoria, N.
2017-12-01
Scientific research is increasingly collaborative and globally distributed; research groups now rely on web-based scientific tools and data management systems to simplify their day-to-day collaborative workflows. However, such tools often lack seamless interfaces, requiring researchers to contend with manual data transfers, annotation and sharing. MyGeoHub is a web platform that supports out-of-the-box, seamless workflows involving data ingestion, metadata extraction, analysis, sharing and publication. MyGeoHub is built on the HUBzero cyberinfrastructure platform and adds general-purpose software building blocks (GABBs) for geospatial data management, visualization and analysis. A data management building block, iData, processes geospatial files, extracting metadata for keyword and map-based search while enabling quick previews. iData is pervasive, allowing access through a web interface, scientific tools on MyGeoHub or even mobile field devices via a data service API. GABBs includes a Python map library as well as map widgets that, in a few lines of code, generate complete geospatial visualization web interfaces for scientific tools. GABBs also includes powerful tools that can be used with no programming effort. The GeoBuilder tool provides an intuitive wizard for importing multi-variable, geo-located time series data (typical of sensor readings and GPS trackers) to build visualizations supporting data filtering and plotting. MyGeoHub has been used in tutorials at scientific conferences and in educational activities for K-12 students. MyGeoHub is also constantly evolving; the recent addition of Jupyter and R Shiny notebook environments enables reproducible, richly interactive geospatial analyses and applications ranging from simple pre-processing to published tools. MyGeoHub is not a monolithic geospatial science gateway; instead, it supports diverse needs ranging from a feature-rich data management system to complex scientific tools and workflows.
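Keyword search over extracted file metadata, of the kind iData provides, can be sketched with an inverted index; the file names and metadata fields are hypothetical:

```python
from collections import defaultdict

def index_metadata(records):
    """Build an inverted keyword index over extracted file metadata,
    mapping each lowercased token to the files it describes."""
    index = defaultdict(set)
    for path, meta in records.items():
        for value in meta.values():
            for token in str(value).lower().split():
                index[token].add(path)
    return index

# metadata as an extraction step might produce it for uploaded files
records = {
    "flood.tif": {"title": "Flood extent raster", "crs": "EPSG:4326"},
    "gauges.shp": {"title": "Stream gauge locations", "crs": "EPSG:4326"},
}
index = index_metadata(records)
hits = sorted(index["flood"])
```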
Comprehensive Optimal Manpower and Personnel Analytic Simulation System (COMPASS)
2009-10-01
The EDB consists of 4 major components (some of which are re-usable): 1. Metadata Editor (MDE): also considered a leaf node, the metadata… …end-user queries via the QB. The EDB supports multiple instances of the MDE, although currently only a single instance is recommended. 2. Query… …the MSB is a central collection of web services, responsible for the authentication and authorization of users and maintenance of the EDB metadata
2009-06-01
capabilities: web-based, relational/multi-dimensional, client/server, and metadata (data about data) inclusion (pp. 39-40). Text mining, on the other...and Organizational Systems (CASOS) (Carley, 2005). Although AutoMap can be used to conduct text mining, it was utilized only for its visualization...provides insight into how the GMCOI is using the terms, and where there might be redundant terms and a need for de-confliction and standardization
Designing for Peta-Scale in the LSST Database
NASA Astrophysics Data System (ADS)
Kantor, J.; Axelrod, T.; Becla, J.; Cook, K.; Nikolaev, S.; Gray, J.; Plante, R.; Nieto-Santisteban, M.; Szalay, A.; Thakar, A.
2007-10-01
The Large Synoptic Survey Telescope (LSST), a proposed ground-based 8.4 m telescope with a 10 deg^2 field of view, will generate 15 TB of raw images every observing night. When calibration and processed data are added, the image archive, catalogs, and meta-data will grow by 15 PB yr^{-1} on average. The LSST Data Management System (DMS) must capture, process, store, index, replicate, and provide open access to this data. Alerts must be triggered within 30 s of data acquisition. To do this in real-time at these data volumes will require advances in data management, database, and file system techniques. This paper describes the design of the LSST DMS and emphasizes features for peta-scale data. The LSST DMS will employ a combination of distributed database and file systems, with schema, partitioning, and indexing oriented for parallel operations. Image files are stored in a distributed file system with references to, and meta-data from, each file stored in the databases. The schema design supports pipeline processing, rapid ingest, and efficient query. Vertical partitioning reduces disk input/output requirements, while horizontal partitioning allows parallel data access using arrays of servers and disks. Indexing is extensive, utilizing both conventional RAM-resident indexes and column-narrow, row-deep tag tables/covering indices that are extracted from tables that contain many more attributes. The DMS Data Access Framework is encapsulated in a middleware framework to provide a uniform service interface to all framework capabilities. This framework will provide the automated work-flow, replication, and data analysis capabilities necessary to make data processing and data quality analysis feasible at this scale.
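The "column-narrow, row-deep tag table" idea above (project only the frequently queried columns out of a very wide catalog so that common queries scan far less data) can be sketched in a few lines. This is an illustrative toy, not the LSST schema; all table and column names are hypothetical.

```python
# Sketch of a "tag table"/covering-index projection: common spatial queries
# are answered entirely from a narrow projection of the wide source catalog.
WIDE_CATALOG = [
    # each source row carries many attributes; only a few are "hot"
    {"id": 1, "ra": 10.1, "dec": -5.0, "flux": 9.8, "shape": "...", "psf": "..."},
    {"id": 2, "ra": 10.2, "dec": -5.1, "flux": 3.2, "shape": "...", "psf": "..."},
    {"id": 3, "ra": 55.0, "dec": 20.0, "flux": 7.7, "shape": "...", "psf": "..."},
]

def build_tag_table(catalog, hot_columns):
    """Project only the frequently queried columns (a covering index)."""
    return [{c: row[c] for c in hot_columns} for row in catalog]

def cone_search(tag_table, ra, dec, radius):
    """Answer a simple spatial query from the narrow tag table alone."""
    return [r["id"] for r in tag_table
            if (r["ra"] - ra) ** 2 + (r["dec"] - dec) ** 2 <= radius ** 2]

tags = build_tag_table(WIDE_CATALOG, ["id", "ra", "dec"])
print(cone_search(tags, 10.0, -5.0, 0.5))  # [1, 2]
```

In a real system the projection lives on disk as its own table, so the win is reduced I/O rather than reduced Python work; the principle is the same.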
NASA Astrophysics Data System (ADS)
Schaap, Dick M. A.; Fichaut, Michele
2013-04-01
The second phase of the project SeaDataNet started in October 2011 for another 4 years with the aim to upgrade the SeaDataNet infrastructure built during previous years. The numbers of the project are quite impressive: 59 institutions from 35 different countries are involved. In particular, 45 data centers are sharing human and financial resources in a common effort to sustain an operationally robust and state-of-the-art Pan-European infrastructure for providing up-to-date and high quality access to ocean and marine metadata, data and data products. The main objective of SeaDataNet II is to improve operations and to progress towards an efficient data management infrastructure able to handle the diversity and large volume of data collected via the Pan-European oceanographic fleet and the new observation systems, both in real-time and delayed mode. The infrastructure is based on a semi-distributed system that incorporates and enhances the existing NODC network. SeaDataNet aims at serving users from science, environmental management, policy making, and economic sectors. Better integrated data systems are vital for these users to achieve improved scientific research and results, to support marine environmental and integrated coastal zone management, to establish indicators of Good Environmental Status for sea basins, and to support offshore industry developments, shipping, fisheries, and other economic activities. The recent EU communication "MARINE KNOWLEDGE 2020 - marine data and observation for smart and sustainable growth" states that the creation of marine knowledge begins with observation of the seas and oceans. In addition, directives, policies and science programmes require reporting of the state of the seas and oceans in an integrated pan-European manner: of particular note are INSPIRE, MSFD, WISE-Marine and the GMES Marine Core Service. These underpin the importance of a well-functioning marine and ocean data management infrastructure.
SeaDataNet is now one of the major players in informatics in oceanography, and collaborative relationships have been created with other EU and non-EU projects. In particular, SeaDataNet has recognised roles in the continuous serving of common vocabularies, the provision of tools for data management, and the provision of access to metadata, data sets and data products of importance for society. The SeaDataNet infrastructure comprises a network of interconnected data centres and a central SeaDataNet portal. The portal provides users not only with background information about SeaDataNet and the various SeaDataNet standards and tools, but also with a unified and transparent overview of the metadata and controlled access to the large collections of data sets managed by the interconnected data centres. The presentation will give information on the present services of the SeaDataNet infrastructure and highlight a number of key achievements in SeaDataNet II so far.
Dinov, Ivo D.; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H. V.; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D. Stott; Toga, Arthur W.
2008-01-01
The advancement of the computational biology field hinges on progress in three fundamental directions – the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources–data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existing computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management.
We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu. PMID:18509477
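The core function described above (a queryable registry of typed, keyword-annotated resource metadata) can be sketched briefly. The resource names, types and keywords below are illustrative, not actual iTools content.

```python
# Minimal sketch of a resource metadata registry in the spirit of iTools:
# resources of three types (data, software tools, web services) described
# by metadata and searchable by keyword and type.
RESOURCES = [
    {"name": "AlignerX", "type": "software", "keywords": {"sequence", "alignment"}},
    {"name": "GenomeDB", "type": "data", "keywords": {"genome", "annotation"}},
    {"name": "BlastWS", "type": "web-service", "keywords": {"sequence", "search"}},
]

def find(resources, keyword=None, rtype=None):
    """Filter the registry by keyword and/or resource type."""
    hits = resources
    if rtype is not None:
        hits = [r for r in hits if r["type"] == rtype]
    if keyword is not None:
        hits = [r for r in hits if keyword in r["keywords"]]
    return [r["name"] for r in hits]

print(find(RESOURCES, keyword="sequence"))                    # ['AlignerX', 'BlastWS']
print(find(RESOURCES, keyword="sequence", rtype="software"))  # ['AlignerX']
```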
Framework for Managing Metadata Security Tags as the Basis for Making Security Decisions.
2002-12-01
and Performance," D.H. Associates, Inc., Sep 2001. [3] Deitel, H. M., and Deitel, P. J., Java How to Program, 3rd Edition, Prentice Hall Inc...1999. [4] Deitel, H. M., Deitel, P. J., and Nieto, T. R., Internet and The World Wide Web: How to Program, 2nd Edition, 2002. [5] Grohn, M. J., A...words) This thesis presents an analysis of a capability to employ CAPCO (Controlled Access Program Coordination Office) compliant Metadata security
Facilitating Stewardship of scientific data through standards based workflows
NASA Astrophysics Data System (ADS)
Bastrakova, I.; Kemp, C.; Potter, A. K.
2013-12-01
There are three main suites of standards that can be used to define the fundamental scientific methodology of data, methods and results. These are, firstly, metadata standards to enable discovery of the data (ISO 19115); secondly, the Sensor Web Enablement (SWE) suite of standards, which includes the O&M and SensorML standards; and thirdly, ontologies that provide vocabularies to define the scientific concepts and the relationships between these concepts. All three types of standards have to be utilised by the practicing scientist to ensure that those who ultimately have to steward the data can preserve, curate, reuse and repurpose it. Additional benefits of this approach include transparency of scientific processes from the data acquisition to creation of scientific concepts and models, and provision of context to inform data use. Collecting and recording metadata is the first step in scientific data flow. The primary role of metadata is to provide details of geographic extent, availability and a high-level description of data suitable for its initial discovery through common search engines. The SWE suite provides standardised patterns to describe observations and measurements taken for these data, capture detailed information about observation or analytical methods and the instruments used, and define quality determinations. This information standardises browsing capability over discrete data types. The standardised patterns of the SWE standards simplify aggregation of observation and measurement data, enabling scientists to move from disaggregated data to scientific concepts. The first two steps provide a necessary basis for reasoning about concepts of 'pure' science, building relationships between concepts of different domains (linked data), and identifying domain classifications and vocabularies.
Geoscience Australia is re-examining its marine data flows, including metadata requirements and business processes, to achieve a clearer link between scientific data acquisition and analysis requirements and effective interoperable data management and delivery. This includes participating in national and international dialogue on development of standards, embedding data management activities in business processes, and developing scientific staff as effective data stewards. A similar approach is applied to geophysical data. By ensuring the geophysical datasets at GA strictly follow metadata and industry standards, we are able to implement a provenance-based workflow where the data is easily discoverable, geophysical processing can be applied to it, and results can be stored. The provenance-based workflow enables metadata records for the results to be produced automatically from the input dataset metadata.
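The first step named above, a discovery-metadata record carrying title, description and geographic extent, can be sketched as a tiny XML document. The element layout below is an illustrative shape only, not the full ISO 19115 schema, and the dataset values are invented.

```python
# Build a minimal discovery-metadata record (title, abstract, bounding box),
# the kind of information a catalogue needs for map-based search.
import xml.etree.ElementTree as ET

def make_record(title, abstract, west, east, south, north):
    rec = ET.Element("metadata")
    ET.SubElement(rec, "title").text = title
    ET.SubElement(rec, "abstract").text = abstract
    bbox = ET.SubElement(rec, "geographicExtent")
    for tag, val in (("west", west), ("east", east), ("south", south), ("north", north)):
        ET.SubElement(bbox, tag).text = str(val)
    return ET.tostring(rec, encoding="unicode")

xml_text = make_record("Marine survey 2013", "CTD casts, Coral Sea",
                       145.0, 155.0, -25.0, -15.0)
print(xml_text)
```

A production catalogue would validate such records against the published ISO 19115 schema rather than hand-rolling elements, but the record's role in the data flow is the same.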
Applications of the LBA-ECO Metadata Warehouse
NASA Astrophysics Data System (ADS)
Wilcox, L.; Morrell, A.; Griffith, P. C.
2006-05-01
The LBA-ECO Project Office has developed a system to harvest and warehouse metadata resulting from the Large-Scale Biosphere Atmosphere Experiment in Amazonia. The harvested metadata is used to create dynamically generated reports, available at www.lbaeco.org, which facilitate access to LBA-ECO datasets. The reports are generated for specific controlled vocabulary terms (such as an investigation team or a geospatial region), and are cross-linked with one another via these terms. This approach creates a rich contextual framework enabling researchers to find datasets relevant to their research. It maximizes data discovery by association and provides a greater understanding of the scientific and social context of each dataset. For example, our website provides a profile (e.g. participants, abstract(s), study sites, and publications) for each LBA-ECO investigation. Linked from each profile is a list of associated registered dataset titles, each of which link to a dataset profile that describes the metadata in a user-friendly way. The dataset profiles are generated from the harvested metadata, and are cross-linked with associated reports via controlled vocabulary terms such as geospatial region. The region name appears on the dataset profile as a hyperlinked term. When researchers click on this link, they find a list of reports relevant to that region, including a list of dataset titles associated with that region. Each dataset title in this list is hyperlinked to its corresponding dataset profile. Moreover, each dataset profile contains hyperlinks to each associated data file at its home data repository and to publications that have used the dataset. We also use the harvested metadata in administrative applications to assist quality assurance efforts. 
These include processes to check for broken hyperlinks to data files, automated emails that inform our administrators when critical metadata fields are updated, dynamically generated reports of metadata records that link to datasets with questionable file formats, and dynamically generated region/site coordinate quality assurance reports. These applications are as important as those that facilitate access to information because they help ensure a high standard of quality for the information. This presentation will discuss reports currently in use, provide a technical overview of the system, and discuss plans to extend this system to harvest metadata resulting from the North American Carbon Program by drawing on datasets in many different formats, residing in many thematic data centers and also distributed among hundreds of investigators.
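The cross-linking mechanism described above, grouping harvested dataset metadata by controlled-vocabulary terms so each term's report can list its associated datasets, reduces to a simple inverted index. The dataset titles and region names below are invented for illustration.

```python
# Sketch of report generation from harvested metadata: index dataset
# records by a controlled-vocabulary term (here, region) so each region's
# report can list and hyperlink its associated dataset profiles.
from collections import defaultdict

DATASETS = [
    {"title": "Canopy flux 1999", "region": "Manaus"},
    {"title": "Soil carbon survey", "region": "Santarem"},
    {"title": "Tower met data", "region": "Manaus"},
]

def reports_by_term(datasets, term="region"):
    index = defaultdict(list)
    for ds in datasets:
        index[ds[term]].append(ds["title"])
    return dict(index)

print(reports_by_term(DATASETS))
# {'Manaus': ['Canopy flux 1999', 'Tower met data'], 'Santarem': ['Soil carbon survey']}
```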
The Kiel data management infrastructure - arising from a generic data model
NASA Astrophysics Data System (ADS)
Fleischer, D.; Mehrtens, H.; Schirnick, C.; Springer, P.
2010-12-01
The Kiel Data Management Infrastructure (KDMI) started from a cooperation of three large-scale projects (SFB574, SFB754 and Cluster of Excellence The Future Ocean) and the Leibniz Institute of Marine Sciences (IFM-GEOMAR). The common strategy for project data management is a single person collecting and transforming data according to the requirements of the targeted data center(s). The intention of the KDMI cooperation is to avoid redundant and potentially incompatible data management efforts for scientists and data managers and to create a single sustainable infrastructure. An increased level of complexity in the conceptual planning arose from the diversity of marine disciplines and the approximately 1000 scientists involved. KDMI key features focus on data provenance, which we consider to comprise the entire workflow from field sampling through lab work to data calculation and evaluation. Managing the data of each individual project participant in this way yields the data management for the entire project and guarantees the reusability of (meta)data. Accordingly, scientists provide a workflow definition of their data creation procedures resulting in their target variables. The central idea in the development of the KDMI presented here is based on the object-oriented programming concept, which allows one object definition (workflow) to have an unlimited number of object instances (data). Each definition is created by a graphical user interface and produces XML output stored in a database using a generic data model. On creation of a data instance, the KDMI translates the definition into web forms for the scientist; the generic data model then accepts all information input following the given data provenance definition.
An important aspect of the implementation phase is the possibility of a successive transition from daily measurement routines resulting in single spreadsheet files, with well-known points of failure and limited reusability, to a central infrastructure as a single point of truth. The data provenance approach has the following positive side effects: (1) scientists themselves design the extent and timing of data and metadata prompts through their workflow definitions, while (2) consistency and completeness (mandatory information) of metadata in the resulting XML document can be checked by XML validation. (3) Storage of the entire data creation process (including raw data and processing steps) provides a multidimensional quality history accessible by all researchers, in addition to the commonly applied one-dimensional quality flag system. (4) The KDMI can be extended to other scientific disciplines by adding new workflows and domain-specific outputs, assisted by the KDMI team. The KDMI is a social-network-inspired system, but instead of private content, it is a platform for sharing daily scientific work, data and their provenance.
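The "one workflow definition, unlimited data instances" idea, plus the mandatory-field check that XML validation performs, can be sketched as follows. The workflow name and field names are hypothetical, and a plain dictionary stands in for the XML definition.

```python
# Sketch of the KDMI pattern: a workflow definition lists the prompts
# (fields) once; every data instance is checked for completeness against
# it, analogous to validating the instance's XML document.
WORKFLOW = {  # one definition, created once per measurement procedure
    "name": "CTD cast",
    "fields": ["station", "depth_m", "temperature_c", "salinity_psu"],
}

def validate_instance(workflow, instance):
    """Return the mandatory fields missing from this data instance."""
    return [f for f in workflow["fields"] if f not in instance]

good = {"station": "S1", "depth_m": 120, "temperature_c": 8.4, "salinity_psu": 35.1}
bad = {"station": "S1", "depth_m": 120}
print(validate_instance(WORKFLOW, good))  # []
print(validate_instance(WORKFLOW, bad))   # ['temperature_c', 'salinity_psu']
```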
ABSTRACT: User’s Guide & Metadata to Coastal Biodiversity Risk Analysis Tool (CBRAT): Framework for the Systemization of Life History and Biogeographic Information (EPA/601/B-15/001, 2015, 123 pages). Henry Lee II, U.S. EPA, Western Ecology Division; Katharine Marko, U.S. EPA,...
Kuchinke, W; Wiegelmann, S; Verplancke, P; Ohmann, C
2006-01-01
Our objectives were to analyze the possibility of an exchange of an entire clinical study between two different and independent study software solutions. The question addressed was whether a software-independent transfer of study metadata can be performed without programming effort and with software routinely used for clinical research. Study metadata was transferred with the ODM standard (CDISC). The study software systems employed were MACRO (InferMed) and XTrial (XClinical). For the Proof of Concept, a test study was created with MACRO and exported as ODM. For modification and validation of the ODM export file, XML-Spy (Altova) and ODM-Checker (XML4Pharma) were used. Through exchange of a complete clinical study between two different study software solutions, a Proof of Concept of the technical feasibility of a system-independent metadata exchange was conducted successfully. The interchange of study metadata between two different systems at different centers was performed with minimal expenditure. A small number of mistakes had to be corrected in order to generate a syntactically correct ODM file, and a "vendor extension" had to be inserted. After these modifications, XTrial displayed the study, including all data fields, correctly. However, the visual appearance of the two CRFs (case report forms) was different. ODM can be used as an exchange format for clinical studies between different study software. Thus, new forms of cooperation through exchange of metadata seem possible, for example the joint creation of electronic study protocols or CRFs at different research centers. Although the ODM standard represents a clinical study completely, it contains no information about the representation of data fields in CRFs.
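The kind of structural inspection performed on the ODM export can be sketched with a simplified element layout. Real CDISC ODM documents use a namespace and are validated against the published XML Schema (as the ODM-Checker tool does); the OIDs and form name below are invented.

```python
# Parse a simplified ODM-style export and list its form definitions,
# the study metadata that a receiving system must reconstruct.
import xml.etree.ElementTree as ET

odm_text = """<ODM>
  <Study OID="ST.001">
    <MetaDataVersion OID="MD.1">
      <FormDef OID="F.DEMOG" Name="Demographics"/>
    </MetaDataVersion>
  </Study>
</ODM>"""

root = ET.fromstring(odm_text)
forms = [f.get("Name") for f in root.iter("FormDef")]
print(forms)  # ['Demographics']
```

Note that, as the abstract observes, nothing in this structural metadata pins down how a receiving system renders the form on screen.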
Serving Fisheries and Ocean Metadata to Communities Around the World
NASA Technical Reports Server (NTRS)
Meaux, Melanie
2006-01-01
NASA's Global Change Master Directory (GCMD) assists the oceanographic community in the discovery, access, and sharing of scientific data by serving on-line fisheries and ocean metadata to users around the globe. As of January 2006, the directory holds more than 16,300 Earth Science data descriptions and over 1,300 services descriptions. Of these, nearly 4,000 unique ocean-related metadata records are available to the public, with many having direct links to the data. In 2005, the GCMD averaged over 5 million hits a month, with nearly a half million unique hosts for the year. Through the GCMD portal (http://gcmd.nasa.gov/), users can search vast and growing quantities of data and services using controlled keywords, free-text searches or a combination of both. Users may now refine a search based on topic, location, instrument, platform, project, data center, spatial and temporal coverage. The directory also offers data holders a means to post and search their data through customized portals, i.e. online customized subset metadata directories. The discovery metadata standard used is the Directory Interchange Format (DIF), adopted in 1994. This format has evolved to accommodate other national and international standards such as FGDC and ISO 19115. Users can submit metadata through easy-to-use online and offline authoring tools. The directory, which also serves as a coordinating node of the International Directory Network (IDN), has been active at the international, regional and national level for many years through its involvement with the Committee on Earth Observation Satellites (CEOS), federal agencies (such as NASA, NOAA, and USGS), international agencies (such as IOC/IODE, UN, and JAXA) and partnerships (such as ESIP, IOOS/DMAC, GOSIC, GLOBEC, OBIS, and GoMODP), sharing experience and knowledge related to metadata and/or data management and interoperability.
NASA Astrophysics Data System (ADS)
Gries, C.; Winslow, L.; Shin, P.; Hanson, P. C.; Barseghian, D.
2010-12-01
At the North Temperate Lakes Long Term Ecological Research (NTL LTER) site, six buoys and one met station are maintained, each equipped with up to 20 sensors producing up to 45 separate data streams at a 1 or 10 minute frequency. Traditionally, this data volume has been managed in many matrix-type tables, each described in the Ecological Metadata Language (EML) and accessed online by a query system based on the provided metadata. To develop a more flexible information system, several technologies are currently being experimented with. We will review, compare and evaluate these technologies and discuss constraints and advantages of network memberships and implementation of standards. A Data Turbine server is employed to stream data from data logger files into a database, with the Real-time Data Viewer being used for monitoring sensor health. The Kepler workflow processor is being explored to introduce quality control routines into this data stream, taking advantage of the Data Turbine actor. Kepler could replace traditional database triggers while adding visualization and advanced data access functionality for downstream modeling or other analytical applications. The data are currently streamed into the traditional matrix-type tables and into an Observation Data Model (ODM) following the CUAHSI ODM 1.1 specifications. In parallel, these sensor data are managed within the Global Lake Ecological Observatory Network (GLEON), where the software package Ziggy streams the data into a database of the VEGA data model. Contributing data to a network implies compliance with established standards for data delivery and data documentation. ODM or VEGA type data models are not easily described in EML, the metadata exchange standard for LTER sites, but are providing many advantages from an archival standpoint.
Both GLEON and CUAHSI have developed advanced data access capabilities based on their respective data models and data exchange standards while LTER is currently in a phase of intense technology developments which will eventually provide standardized data access that includes ecological data set types currently not covered by either ODM or VEGA.
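A quality-control routine of the kind a Kepler workflow could apply to a buoy stream before ingestion can be sketched as a simple range check that flags, rather than discards, suspect observations. The variable and thresholds below are illustrative.

```python
# Range-check QC for a sensor stream: each (time, value) observation gets
# a quality flag; out-of-range values are retained but marked "suspect".
def qc_range(stream, lo, hi):
    return [(t, v, "good" if lo <= v <= hi else "suspect") for t, v in stream]

water_temp = [(0, 14.2), (1, 14.3), (2, 99.9), (3, 14.1)]  # (minute, deg C)
flagged = qc_range(water_temp, -2.0, 40.0)
print(flagged)  # the 99.9 reading at minute 2 is flagged 'suspect'
```

Keeping flagged values instead of deleting them preserves the raw stream for the kind of quality history discussed elsewhere in this collection.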
SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.
Hitz, Benjamin C; Rowe, Laurence D; Podduturi, Nikhil R; Glick, David I; Baymuradov, Ulugbek K; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Gabdank, Idan; Narayana, Aditi K; Onate, Kathrina C; Hilton, Jason; Ho, Marcus C; Lee, Brian T; Miyasato, Stuart R; Dreszer, Timothy R; Sloan, Cricket A; Strattan, J Seth; Tanaka, Forrest Y; Hong, Eurie L; Cherry, J Michael
2017-01-01
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general-purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source; code and installation instructions can be found at http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data), has been released as a separate Python package.
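An object-based metadata store of the kind SnoVault provides (schemaless JSON-like objects keyed by unique identifier, queryable by property) can be sketched in a few lines. This toy omits the schemas, versioning and REST API of the real system, and the example objects are invented.

```python
# Minimal object store sketch: POST an object, get back an identifier;
# search by matching metadata properties.
import uuid

class ObjectStore:
    def __init__(self):
        self._objects = {}

    def post(self, obj):
        """Store a metadata object under a freshly minted unique id."""
        oid = str(uuid.uuid4())
        self._objects[oid] = dict(obj)
        return oid

    def get(self, oid):
        return self._objects[oid]

    def search(self, **criteria):
        """Return objects whose properties match all given criteria."""
        return [o for o in self._objects.values()
                if all(o.get(k) == v for k, v in criteria.items())]

store = ObjectStore()
store.post({"type": "experiment", "assay": "ChIP-seq", "organism": "H. sapiens"})
store.post({"type": "experiment", "assay": "RNA-seq", "organism": "M. musculus"})
print(len(store.search(assay="ChIP-seq")))  # 1
```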
The National Extreme Events Data and Research Center (NEED)
NASA Astrophysics Data System (ADS)
Gulledge, J.; Kaiser, D. P.; Wilbanks, T. J.; Boden, T.; Devarakonda, R.
2014-12-01
The Climate Change Science Institute at Oak Ridge National Laboratory (ORNL) is establishing the National Extreme Events Data and Research Center (NEED), with the goal of transforming how the United States studies and prepares for extreme weather events in the context of a changing climate. NEED will encourage the myriad, distributed extreme events research communities to move toward the adoption of common practices and will develop a new database compiling global historical data on weather- and climate-related extreme events (e.g., heat waves, droughts, hurricanes, etc.) and related information about impacts, costs, recovery, and available research. Currently, extreme event information is not easy to access and is largely incompatible and inconsistent across web sites. NEED's database development will take into account differences in time frames, spatial scales, treatments of uncertainty, and other parameters and variables, and leverage informatics tools developed at ORNL (i.e., the Metadata Editor [1] and Mercury [2]) to generate standardized, robust documentation for each database along with a web-searchable catalog. In addition, NEED will facilitate convergence on commonly accepted definitions and standards for extreme events data and will enable integrated analyses of coupled threats, such as hurricanes/sea-level rise/flooding and droughts/wildfires. Our goal and vision is that NEED will become the premier integrated resource for the general study of extreme events. References: [1] Devarakonda, Ranjeet, et al. "OME: Tool for generating and managing metadata to handle BigData." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014. [2] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94.
ATLAS Metadata Infrastructure Evolution for Run 2 and Beyond
NASA Astrophysics Data System (ADS)
van Gemmeren, P.; Cranshaw, J.; Malon, D.; Vaniachine, A.
2015-12-01
For Run 1 of the Large Hadron Collider, ATLAS developed and employed a sophisticated infrastructure for metadata handling in event processing jobs. This infrastructure profits from a rich feature set provided by the ATLAS execution control framework, including standardized interfaces and invocation mechanisms for tools and services, segregation of transient data stores with concomitant object lifetime management, and mechanisms for handling occurrences asynchronous to the control framework's state machine transitions. This metadata infrastructure is evolving and being extended for Run 2 to allow its use and reuse in downstream physics analyses, analyses that may or may not utilize the ATLAS control framework. At the same time, multiprocessing versions of the control framework and the requirements of future multithreaded frameworks are leading to redesign of components that use an incident-handling approach to asynchrony. The increased use of scatter-gather architectures, both local and distributed, requires further enhancement of metadata infrastructure in order to ensure semantic coherence and robust bookkeeping. This paper describes the evolution of ATLAS metadata infrastructure for Run 2 and beyond, including the transition to dual-use tools—tools that can operate inside or outside the ATLAS control framework—and the implications thereof. It further examines how the design of this infrastructure is changing to accommodate the requirements of future frameworks and emerging event processing architectures.
The IGSN Experience: Successes and Challenges of Implementing Persistent Identifiers for Samples
NASA Astrophysics Data System (ADS)
Lehnert, Kerstin; Arko, Robert
2016-04-01
Physical samples collected and studied in the Earth sciences represent both a research resource and a research product. As such, they need to be properly managed, curated, documented, and cited to ensure re-usability and utility for future science, reproducibility of the data generated by their study, and credit for funding agencies and researchers who invested substantial resources and intellectual effort into their collection and curation. Use of persistent and unique identifiers and deposition of metadata in a persistent registry are therefore as important for physical samples as they are for digital data. The International Geo Sample Number (IGSN) is a persistent, globally unique identifier. Its adoption by individual investigators, repository curators, publishers, and data managers is rapidly growing world-wide. This presentation will provide an analysis of the development and implementation path of the IGSN and relevant insights and experiences gained along the way. Development of the IGSN started in 2004 as part of a US NSF-funded project to establish a registry for sample metadata, the System for Earth Sample Registration (SESAR). The initial system provided a centralized solution for users to submit information about their samples and obtain IGSNs and bar codes. Challenges encountered during this initial phase related to defining the scope of the registry, the granularity of registered objects, the responsibilities of relevant actors and workflows, and designing the registry's metadata schema, its user interfaces, and the identifier itself, including its syntax. The most challenging task, though, was to make the IGSN an integral part of personal and institutional sample management, digital management of sample-based data, and data publication on a global scale.
Besides convincing individual researchers, curators, editors and publishers, as well as data managers in US and non-US academia and state and federal agencies, the PIs of the SESAR project needed to identify ways to organize, operate, and govern the global registry in both the short and the long term. A major breakthrough was achieved at an international workshop in February 2011, at which participants designed a new distributed and scalable architecture for the IGSN with international governance by a membership organization modeled after the DataCite consortium. The founding of the international governing body and implementation organization for the IGSN, the IGSN e.V., took place at the AGU Fall Meeting 2011. Recent progress came at a workshop in September 2015, where stakeholders from both geoscience and life science disciplines drafted a standard IGSN metadata schema for describing samples, to complement the existing schema for registering samples. Consensus was achieved on an essential set of properties to describe a sample's origin and classification, creating a "birth certificate" for the sample. Further consensus was achieved in clarifying that an IGSN may represent exactly one physical sample, sampling feature, or collection of samples; and in aligning the IGSN schema with the existing Observations Data Model (ODM-2). The resulting schema was published online at schema.igsn.org and presented at the AGU Fall Meeting 2015.
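The abstract's two key technical artifacts, an identifier with a defined syntax and a minimal "birth certificate" metadata record, can be sketched as follows. The regex is a deliberate simplification for illustration (a short uppercase namespace prefix plus an alphanumeric sample code), not the normative IGSN syntax, and the sample record's fields and values are hypothetical.

```python
import re

# Illustrative check for an IGSN-like identifier: an uppercase namespace
# prefix followed by an alphanumeric sample code. This pattern is a
# simplification for demonstration, not the normative IGSN syntax.
IGSN_LIKE = re.compile(r"^[A-Z]{2,5}[A-Z0-9]{1,20}$")

def looks_like_igsn(identifier):
    return bool(IGSN_LIKE.match(identifier))

# A registry entry pairs the identifier with a minimal "birth certificate":
# origin and classification, echoing the core metadata consensus above.
# All values below are hypothetical examples.
sample = {
    "igsn": "IEABC0001",
    "origin": "Mid-Atlantic Ridge dredge",
    "classification": "Rock > Igneous > Basalt",
}

print(looks_like_igsn(sample["igsn"]))  # True
```

Registering metadata separately from describing it, as the two schemas in the abstract do, keeps the registration payload small while still allowing a richer descriptive record to be attached later.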
Data publication activities in the Natural Environment Research Council
NASA Astrophysics Data System (ADS)
Leadbetter, A.; Callaghan, S.; Lowry, R.; Moncoiffé, G.; Donnegan, S.; Pepler, S.; Cunningham, N.; Kirsch, P.; Ault, L.; Bell, P.; Bowie, R.; Harrison, K.; Smith-Haddon, B.; Wetherby, A.; Wright, D.; Thorley, M.
2012-04-01
The Natural Environment Research Council (NERC) is implementing its Science Information Strategy in order to provide a world-class service delivering integrated data for earth system science. One project within this strategy is Data Citation and Publication, which aims to put the promotion and recognition stages of the data lifecycle in place alongside the traditional data management activities of NERC's Environmental Data Centres (EDCs). The NERC EDCs distinguish between the serving of data and its publication. Data serving is defined here as the day-to-day data management tasks of:
• acquiring data and metadata from the originating scientists;
• harmonising metadata and formats prior to database ingestion;
• ensuring the metadata are adequate and accurate and that the data are available in appropriate file formats;
• making the data available to interested parties.
Publication, by contrast:
• requires the assignment of a digital object identifier (DOI) to a dataset, which guarantees that an EDC has assessed the quality of the metadata and the file format and will maintain an unchanged version of the data for the foreseeable future;
• requires peer review of the scientific quality of the data by a scientist with knowledge of the domain in which the data were collected, using a framework for peer review of datasets such as that developed by the CLADDIER project;
• requires collaboration with journal publishers, who have access to a well-established peer-review system.
The first of these requirements can be managed in-house by the EDCs, while the remainder require collaboration with the wider scientific and publishing communities.
It is anticipated that a scientist may receive a lower level of academic credit for a dataset which is assigned a DOI but does not follow through to the scientific peer-review stage, similar to publication in a report or other non-peer-reviewed publication normally described as grey literature, or in conference proceedings. At the time of writing, the project has successfully assigned DOIs to more than ten legacy datasets held by EDCs through the British Library acting on behalf of the DataCite network. The project is in the process of developing guidelines for which datasets are suitable for submission to an EDC by a scientist wishing to receive a DOI for their data. While maintaining a United Kingdom focus, this project is not operating in isolation, as its members are working alongside international groups such as the CODATA-ICSTI Task Group on Data Citations, the DataCite Working Group on Criteria for Datacentres, and the joint Scientific Commission for Oceanography / International Oceanographic Data and Information Exchange / Marine Biological Laboratory, Woods Hole Oceanographic Institution Library working group on data publication.
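The DOI-assignment step the abstract describes reduces, mechanically, to a registrant minting an identifier of the form "prefix/suffix" and committing to keep the resolved dataset unchanged. The sketch below illustrates that shape; the prefix "10.5285" is an assumption shown only as an example of a registrant prefix, and the UUID suffix is one of several possible opaque-suffix conventions.

```python
import re
import uuid

# Sketch of minting a DataCite-style dataset DOI string. The prefix
# "10.5285" and the UUID suffix convention are illustrative assumptions,
# not a definitive description of NERC/DataCite practice.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def mint_dataset_doi(prefix="10.5285"):
    """Form a DOI from a registrant prefix and an opaque, unique suffix."""
    suffix = str(uuid.uuid4())
    return f"{prefix}/{suffix}"

doi = mint_dataset_doi()
print(DOI_PATTERN.match(doi) is not None)  # True
```

An opaque suffix avoids embedding mutable metadata (titles, versions, owners) in the identifier itself, which matters because the DOI must remain stable even as the descriptive metadata around the frozen dataset evolves.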
Data management routines for reproducible research using the G-Node Python Client library
Sobolev, Andrey; Stoewer, Adrian; Pereira, Michael; Kellner, Christian J.; Garbers, Christian; Rautenberg, Philipp L.; Wachtler, Thomas
2014-01-01
Structured, efficient, and secure storage of experimental data and associated meta-information constitutes one of the most pressing technical challenges in modern neuroscience, particularly in electrophysiology. The German INCF Node aims to provide open-source solutions for this domain that support the scientific data management and analysis workflow, and thus facilitate future data access and reproducible research. G-Node provides a data management system, accessible through an application interface, that is based on a combination of standardized data representation and flexible data annotation to account for the variety of experimental paradigms in electrophysiology. The G-Node Python Library exposes these services to the Python environment, enabling researchers to organize and access their experimental data using their familiar tools while gaining the advantages that centralized storage entails. The library provides powerful query features, including data slicing and selection by metadata, as well as fine-grained permission control for collaboration and data sharing. Here we demonstrate key actions in working with experimental neuroscience data, such as building a metadata structure, organizing recorded data in datasets, annotating data, or selecting data regions of interest, which can be automated to a large degree using the library. Compliant with existing de-facto standards, the G-Node Python Library is compatible with many Python tools in the field of neurophysiology and thus enables seamless integration of data organization into the scientific data workflow. PMID:24634654
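The workflow the abstract demonstrates, building a metadata structure, organizing recordings into datasets, selecting by metadata, and slicing regions of interest, can be sketched with plain Python structures. The real G-Node Python Library exposes these operations through a richer client API; the function names and dataset fields below are illustrative only.

```python
# Illustrative sketch of metadata-driven organization and selection of
# recordings, using plain Python structures. The actual G-Node Python
# Library API differs; names and fields here are hypothetical.

datasets = [
    {"name": "session-01",
     "metadata": {"subject": "m01", "stimulus": "grating"},
     "signal": [0.1, 0.4, 0.3]},
    {"name": "session-02",
     "metadata": {"subject": "m02", "stimulus": "noise"},
     "signal": [0.2, 0.2, 0.5]},
]

def select_by_metadata(items, **criteria):
    """Return datasets whose metadata matches all given key/value pairs."""
    return [d for d in items
            if all(d["metadata"].get(k) == v for k, v in criteria.items())]

def slice_signal(dataset, start, stop):
    """Select a region of interest from a recorded signal."""
    return dataset["signal"][start:stop]

grating = select_by_metadata(datasets, stimulus="grating")
print([d["name"] for d in grating])   # ['session-01']
print(slice_signal(grating[0], 0, 2))  # [0.1, 0.4]
```

Keeping metadata as structured key/value annotations alongside the recordings is what makes this kind of declarative selection possible, in contrast to encoding experimental conditions in file names or directory layouts.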